Multiple linear regression
From Wikiversity
Completion status: this resource is ~50% complete.
Educational level: this is a tertiary (university) resource.
Resource type: this resource consists of notes.
The purpose of this multiple linear regression (MLR) learning project is to:
- explain the concepts and principles of MLR and
- provide practical data analysis exercises.
Assumed knowledge
Before undertaking this section, it is recommended that you understand:
What is MLR?
- MLR is a multivariate statistical technique for examining the linear correlations between two or more independent variables (IVs) and a single dependent variable (DV).
- A research question of the form "To what extent do X1, X2, and X3 (IVs) predict Y (DV)?" lends itself to MLR. For example: "To what extent do people's age, gender, and average amount of red meat eaten per week (IVs) predict their levels of blood cholesterol (DV)?"
- MLR analyses can be visualised as path diagrams and/or Venn diagrams.
Multiple linear regression (MLR) is used to statistically 'distill' the relative contribution of two or more independent variables to a single dependent variable.
Assumptions
- Level of measurement
  - Type of DV
    - continuous
  - Types of IVs
    - continuous, or
    - dichotomous (may require recoding into dummy variables)
- Linear relations between the IVs and the DV
- Multivariate outliers (Mahalanobis' distance, Cook's D)
- Sample size
  - At least 20 cases per IV is recommended; approximately 5 cases per IV is the minimum (basically, you need enough data to provide reliable correlation estimates)
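The multivariate outlier check listed above can be sketched in Python. This is a minimal illustration, assuming NumPy is available and that the IVs are held in a two-dimensional array with one row per case; the function name `mahalanobis_d2` is hypothetical. It computes each case's squared Mahalanobis distance from the centroid of the IVs. A commonly used convention flags cases whose distance exceeds the chi-square critical value at p < .001 with df equal to the number of IVs (13.82 for two IVs).

```python
import numpy as np

def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each case (row of X) from the IV centroid."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    # Sample covariance matrix of the IVs (one column per IV);
    # atleast_2d keeps the single-IV case invertible as a 1x1 matrix
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    # For each row i: d2_i = (x_i - mean)^T S^-1 (x_i - mean)
    return np.einsum("ij,jk,ik->i", centred, inv_cov, centred)

# Usage with simulated (not real) data: two IVs, 100 cases, one planted outlier
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X[0] = [6.0, -6.0]
d2 = mahalanobis_d2(X)
CRITICAL = 13.82  # chi-square critical value, df = 2, p = .001
print("Cases flagged:", np.flatnonzero(d2 > CRITICAL))
```

Cook's D, the other index listed, also takes the DV into account and is usually read from regression software output.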
Statistics
- MLR analyses produce several diagnostic and outcome statistics, summarised below, which are important to understand.
- Make sure you can also find and interpret these statistics in statistical software output.
Correlations
Examine the linear correlations (usually as a correlation matrix, but also view the scatterplots) between:
- the IVs
- each IV and the DV
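A correlation matrix like the one described above can be produced with NumPy's `np.corrcoef`. The sketch below assumes the IVs are the columns of an array `X` and the DV is a vector `y`; the helper name `correlation_summary` and the data values are made up for illustration. Scatterplots still need to be inspected separately, since r only captures linear association.

```python
import numpy as np

def correlation_summary(X, y):
    """Split the full correlation matrix into IV-IV and IV-DV parts."""
    data = np.column_stack([X, y])
    r = np.corrcoef(data, rowvar=False)  # Pearson r for every pair of columns
    k = data.shape[1] - 1                # number of IVs
    iv_iv = r[:k, :k]                    # correlations among the IVs
    iv_dv = r[:k, k]                     # correlation of each IV with the DV
    return iv_iv, iv_dv

# Hypothetical scores for 6 cases on two IVs and one DV
X = np.array([[1, 4], [2, 3], [3, 5], [4, 4], [5, 6], [6, 5]], dtype=float)
y = np.array([2.0, 2.5, 3.9, 4.1, 5.8, 6.2])
iv_iv, iv_dv = correlation_summary(X, y)
print(np.round(iv_iv, 2))
print(np.round(iv_dv, 2))
```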
R
- (Big) R is the multiple correlation coefficient. Its interpretation is similar to that of little r, the linear correlation between two variables, which ranges from -1 (perfect negative relationship) to 1 (perfect positive relationship), with 0 indicating no relationship. However, R ranges only from 0 to 1, with 0 indicating that linear relationships between the IVs and the DV explain none of the variance in the DV. Larger values of R indicate more variance explained in the DV.
- R can be squared and interpreted as for r², with a rough rule of thumb being .1 (small), .3 (medium), and .5 (large). These R² values indicate 10%, 30%, and 50% of the variance in the DV explained, respectively.
- When generalising findings to the population, the R² for a sample tends to overestimate the R² of the population. Thus, adjusted R² is recommended when generalising from a sample. Adjusted R² is revised downward based on the sample size relative to the number of IVs; the smaller the sample size, the greater the reduction.
- Finally, the statistical significance of R can be examined using an F test.
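The relationships between R, R², adjusted R², and the F test can be written out directly. This sketch assumes the standard formulas adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) and F = (R²/k) / ((1 − R²)/(n − k − 1)) with df = (k, n − k − 1), where n is the number of cases and k the number of IVs; the function name `model_fit` is hypothetical. The p value for F would still come from an F table or statistical software.

```python
import math

def model_fit(r2, n, k):
    """Return R, adjusted R^2 and the overall F statistic for k IVs and n cases."""
    df_resid = n - k - 1
    if df_resid <= 0:
        raise ValueError("need more cases than IVs + 1")
    r = math.sqrt(r2)  # multiple correlation R
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f = (r2 / k) / ((1 - r2) / df_resid)  # F with df = (k, n - k - 1)
    return r, adj_r2, f

# Same R^2 = .30 with 3 IVs: the smaller sample is adjusted down more
print(model_fit(0.30, n=50, k=3))   # adjusted R^2 ≈ 0.254, F ≈ 6.57
print(model_fit(0.30, n=200, k=3))  # adjusted R^2 ≈ 0.289
```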
Regression coefficients
- B (unstandardised)
- β (standardised)
- Partial correlations
- Part correlations
- t, p
- Confidence intervals
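The coefficient statistics listed above can be reproduced with ordinary least squares. This is a minimal sketch assuming NumPy and a model with an intercept; the function name `regression_coefficients` and its `t_crit` parameter (a critical t value with df = n − k − 1 that you look up yourself) are hypothetical. β is obtained here by rescaling each B by SD(X)/SD(Y), which gives the same values as regressing the z-scored DV on the z-scored IVs. p values are not computed, since they need the t distribution; statistical software reports them.

```python
import numpy as np

def regression_coefficients(X, y, t_crit=None):
    """OLS estimates for y regressed on the columns of X (one column per IV).

    Returns B (intercept first), beta (IVs only), standard errors, t values and,
    if t_crit is given, confidence intervals B +/- t_crit * SE (else None).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])        # constant + IVs
    b, *_ = np.linalg.lstsq(design, y, rcond=None)    # unstandardised B
    resid = y - design @ b
    mse = (resid @ resid) / (n - k - 1)               # residual variance
    se = np.sqrt(np.diag(mse * np.linalg.inv(design.T @ design)))
    t = b / se                                        # t = B / SE(B)
    # beta rescales each B by SD(X_j) / SD(Y); the intercept has no beta
    beta = b[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    ci = None
    if t_crit is not None:
        ci = np.column_stack([b - t_crit * se, b + t_crit * se])
    return b, beta, se, t, ci

# Simulated (not real) data: 40 cases, 2 IVs, so df = 37
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = 1 + 2 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=40)
b, beta, se, t, ci = regression_coefficients(X, y, t_crit=2.03)  # approx. 95%, df = 37
print(np.round(b, 2), np.round(beta, 2), np.round(t, 1))
```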
Equation
- Prediction equation
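In general form, the prediction equation for k IVs can be written as below (standard notation, not taken from the original notes). Here b₀ is the constant (intercept) and b₁ to b_k are the unstandardised coefficients (B). When all variables are standardised, the constant is zero and the β weights take the place of the B weights.

```latex
\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k
\qquad
z_{\hat{Y}} = \beta_1 z_{X_1} + \beta_2 z_{X_2} + \cdots + \beta_k z_{X_k}
```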
Types
There are several types of MLR, including:
- Direct (or Standard)
  - All IVs are entered simultaneously
- Hierarchical
  - IVs are entered in steps, i.e., some before others
  - Interpret: R² change, F change
- Forward
  - The software enters IVs one by one until there are no more significant IVs to be entered
- Backward
  - The software removes IVs one by one until there are no more non-significant IVs to be removed
- Stepwise
  - A combination of Forward and Backward MLR
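For hierarchical MLR, R² change and F change compare each step with the previous one. A sketch of the usual calculation, assuming the later step contains all IVs from the earlier step plus additional ones: F change = ((R²₂ − R²₁)/(k₂ − k₁)) / ((1 − R²₂)/(n − k₂ − 1)), with df = (k₂ − k₁, n − k₂ − 1). The function name `r2_change` and the example values are hypothetical.

```python
def r2_change(r2_step1, k_step1, r2_step2, k_step2, n):
    """R^2 change and F change when step 2 adds IVs to step 1."""
    df1 = k_step2 - k_step1          # number of IVs added at step 2
    df2 = n - k_step2 - 1            # residual df of the larger model
    delta_r2 = r2_step2 - r2_step1   # extra variance explained by the added IVs
    f_change = (delta_r2 / df1) / ((1 - r2_step2) / df2)
    return delta_r2, f_change, df1, df2

# Step 1: two control variables (R^2 = .10); step 2 adds one IV (R^2 = .25); n = 100
print(r2_change(0.10, 2, 0.25, 3, 100))  # R^2 change ≈ .15, F change ≈ 19.2, df = (1, 96)
```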
Advanced concepts
- Partial correlations
- Use of hierarchical regression to partial out or remove the effect of 'control' variables
- Interactions between IVs
- Moderation and mediation
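Partialling out can be illustrated by residualising. The partial correlation between an IV and the DV is the correlation of their residuals after both are regressed on the other (e.g., control) IVs, while the part (semi-partial) correlation residualises only the IV; squaring the part correlation gives that IV's unique contribution to R². The sketch also builds an interaction term as the product of mean-centred IVs, a common convention when testing interactions or moderation in MLR. It assumes NumPy, and the function names are hypothetical.

```python
import numpy as np

def residualise(v, controls):
    """Residuals of v after regressing it (with an intercept) on the control columns."""
    design = np.column_stack([np.ones(len(v)), controls])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef

def partial_and_part(X, y, j):
    """Partial and part correlations of IV j with the DV, controlling for the other IVs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    others = np.delete(X, j, axis=1)
    xj_res = residualise(X[:, j], others)  # IV j with the other IVs partialled out
    y_res = residualise(y, others)         # DV with the other IVs partialled out
    partial = np.corrcoef(xj_res, y_res)[0, 1]
    part = np.corrcoef(xj_res, y)[0, 1]    # only IV j is residualised
    return partial, part

def interaction_term(x1, x2):
    """Product of two mean-centred IVs, for entry as an extra IV."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return (x1 - x1.mean()) * (x2 - x2.mean())

# Simulated (not real) data: 60 cases, 3 IVs
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, 0.5, -0.7]) + rng.normal(size=60)
print(partial_and_part(X, y, 0))
```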
Writing up
- Assumptions
- Correlations
- Regression coefficients - e.g., see example table
- Causality
Data analysis exercises
See also
- Least-Squares Fitting
- Life expectancy MLR activity
- Logistic regression
- Multiple linear regression (Commons)
External links
- Correlation and simple least squares regression (Zady, 2000)
- Multiple linear regression I (lecture slides) (Neill, 2008)
- Multiple regression (Statsoft)
- Multiple regression assumptions (ERIC Digest)