Multiple linear regression/Assumptions
From Wikiversity
Assumptions
- Level of measurement
- IVs: Two or more continuous (interval or ratio) or dichotomous variables (categorical variables may need recoding into dummy variables)
- DV: One continuous (interval or ratio) variable
- Sample size (some rules of thumb):
- Total N based on ratio of cases to IVs:
- Min. 5 cases per predictor (5:1) (basically, you need enough data to provide reliable correlation estimates)
- Ideally 20 cases per predictor (20:1), with an overall N of at least 100; this allows sufficient power to detect a medium effect size (R² of .13) (Francis, p. 128)
- Total N based on a constant plus ratio of cases to IVs:
- Tabachnick and Fidell (2007) suggest that N should ideally be 50 + 8(k) for testing a full regression model or 104 + k when testing individual predictors (where k is the number of IVs).
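The rules of thumb above are simple arithmetic, so they can be tabulated for any number of predictors. The following is a minimal sketch; the function name and dictionary keys are illustrative only and do not come from the cited texts.

```python
def sample_size_guidelines(k):
    """Rule-of-thumb sample sizes for a regression with k predictors (IVs)."""
    return {
        # Minimum: 5 cases per predictor
        "minimum_5_per_predictor": 5 * k,
        # Ideal: 20 cases per predictor, but never fewer than 100 overall
        "ideal_20_per_predictor": max(20 * k, 100),
        # Tabachnick and Fidell (2007): testing the full regression model
        "tf_full_model": 50 + 8 * k,
        # Tabachnick and Fidell (2007): testing individual predictors
        "tf_individual_predictors": 104 + k,
    }


if __name__ == "__main__":
    for name, n in sample_size_guidelines(4).items():
        print(f"{name}: {n}")
```

For example, with k = 4 predictors the four guidelines give 20, 100, 82 and 108 cases respectively.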
- Normality
- Check the univariate descriptive statistics (M, SD, skewness and kurtosis)
- Check the histograms with a normal curve superimposed
- Estimates of correlations will be more reliable and stable when the variables are normally distributed
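For readers working outside SPSS, the univariate descriptives can be computed directly. The sketch below is only an assumption about how one might do this in Python: it uses the simple moment-based formulas for skewness and excess kurtosis, so the values will differ slightly from the bias-adjusted statistics that SPSS reports, especially in small samples. The function name and the example scores are hypothetical.

```python
import math


def describe(values):
    """Return M, SD, skewness and excess kurtosis for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Central moments (n in the denominator) for the shape statistics
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return {"M": mean, "SD": sd, "skewness": skewness, "kurtosis": excess_kurtosis}


if __name__ == "__main__":
    scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 12]  # made-up data with one high score
    for name, value in describe(scores).items():
        print(f"{name}: {value:.3f}")
```

Skewness and kurtosis values near zero are consistent with a roughly normal distribution; a histogram should still be inspected.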
- Linearity
- Are the bivariate relationships linear?
- Check scatterplots and correlations between the DV (Y) and each of the IVs (Xs)
- Check for influence of bivariate outliers
- Homoscedasticity
- Are the bivariate distributions reasonably evenly spread about the line of best fit?
- Check scatterplots between Y and each of the Xs and/or check a scatterplot of the standardized residuals (ZRESID) against the standardized predicted values (ZPRED)
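The residual check described above can be reproduced outside SPSS by fitting the regression and standardizing the predicted values and residuals. The following is a rough sketch assuming NumPy is available; the function name, the simulated data, and the final "spread trend" correlation are illustrative additions, and the standardization used here may not match SPSS's ZPRED and ZRESID exactly.

```python
import numpy as np


def zresid_zpred(X, y):
    """Fit an ordinary least squares regression and return
    (standardized predicted values, standardized residuals)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # add the intercept column
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coefs
    residuals = y - predicted
    # Residual standard error: sqrt(SS_residual / (N - number of coefficients))
    s = np.sqrt(residuals @ residuals / (len(y) - design.shape[1]))
    zpred = (predicted - predicted.mean()) / predicted.std(ddof=1)
    zresid = residuals / s
    return zpred, zresid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                      # two simulated IVs
    y = 1 + X @ np.array([0.5, -0.3]) + rng.normal(size=100)
    zpred, zresid = zresid_zpred(X, y)
    # Crude indicator of fanning: does the size of the residuals
    # change systematically with the predicted value?
    spread_trend = np.corrcoef(zpred, np.abs(zresid))[0, 1]
    print(f"correlation between ZPRED and |ZRESID|: {spread_trend:.3f}")
```

Plotting `zresid` against `zpred` gives the same kind of scatterplot as the SPSS output; an even band of points around zero suggests homoscedasticity, while a funnel shape suggests the spread of residuals changes with the predicted value.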
- Multicollinearity
- Is there multicollinearity between the IVs? Predictors should not be overly correlated with one another. Ways to check:
- Examine bivariate correlations and scatterplots between each pair of IVs (i.e., are any predictors overly correlated, e.g., r above .7?).
- Check the collinearity statistics in the coefficients table:
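- The Variance Inflation Factor (VIF) should be low (rules of thumb for the upper limit range from about 3 to 10) and/or
- Tolerance should be high (rules of thumb for the lower limit range from .1 to .3). Note that Tolerance = 1/VIF, so only one of the two needs to be checked.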
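The collinearity statistics can also be computed by hand, which makes the relationship between VIF and Tolerance concrete. The sketch below assumes NumPy and uses the fact that the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix; the function name and the small data set are made up for illustration.

```python
import numpy as np


def vif_and_tolerance(X):
    """Return (VIF, Tolerance) arrays, one value per predictor column of X."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)   # correlations among the IVs
    vif = np.diag(np.linalg.inv(corr))    # VIF_j = 1 / (1 - R_j^2)
    tolerance = 1.0 / vif                 # Tolerance = 1 / VIF
    return vif, tolerance


if __name__ == "__main__":
    # Two made-up predictors that correlate r = .8 with each other
    X = np.column_stack([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]])
    vif, tol = vif_and_tolerance(X)
    for j, (v, t) in enumerate(zip(vif, tol), start=1):
        print(f"IV{j}: VIF = {v:.2f}, Tolerance = {t:.2f}")
```

With only two predictors correlated at r = .8, each has VIF = 1 / (1 - .64) ≈ 2.78 and Tolerance = .36, so this pair would pass the more lenient cut-offs while sitting close to the stricter ones.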
- Multivariate outliers
- Check whether there are influential multivariate outlying cases using Mahalanobis' Distance (MD) & Cook’s D (CD).
- Using SPSS: Linear Regression - Save - Mahalanobis and Cook's D - OK
- New variables called mah_1 and coo_1 will be added to the data file.
- Check the Residuals Statistics table in the output for the maximum MD and CD.
- The maximum MD should not exceed the critical chi-square value with degrees of freedom equal to the number of predictors, using a critical alpha of .001. CD should not be greater than 1.
- If outliers are detected, check each case, and consider removing the case from the analysis.
- See also Francis 5.1.4.2 Screening for influential case
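The screening rule above (maximum MD against the critical chi-square value at alpha = .001, and CD no greater than 1) can be sketched in Python as follows. This assumes NumPy; the function names and simulated data are hypothetical, the critical values are standard chi-square table values for 1 to 5 predictors only, and MD is taken to be the squared distance of each case from the centroid of the IVs, which is the quantity compared with chi-square.

```python
import numpy as np

# Critical chi-square values at alpha = .001, keyed by degrees of freedom
CHI2_CRIT_001 = {1: 10.828, 2: 13.816, 3: 16.266, 4: 18.467, 5: 20.515}


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each case from the centroid of the IVs."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    return np.einsum("ij,jk,ik->i", centred, inv_cov, centred)


def cooks_distance(X, y):
    """Cook's D for each case in an ordinary least squares regression of y on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])   # intercept + IVs
    p = design.shape[1]                              # number of coefficients
    hat = design @ np.linalg.inv(design.T @ design) @ design.T
    h = np.diag(hat)                                 # leverage of each case
    resid = y - hat @ y
    mse = resid @ resid / (len(y) - p)
    return resid ** 2 / (p * mse) * h / (1 - h) ** 2


def flag_outliers(X, y):
    """Return indices of cases exceeding either rule-of-thumb cut-off."""
    X = np.asarray(X, dtype=float)
    md = mahalanobis_sq(X)
    cd = cooks_distance(X, y)
    crit = CHI2_CRIT_001[X.shape[1]]   # df = number of predictors
    return np.where((md > crit) | (cd > 1))[0]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 3))
    y = X @ np.array([0.4, 0.2, -0.3]) + rng.normal(size=60)
    X[0] = [6.0, -6.0, 6.0]            # plant one extreme case
    print("max MD:", mahalanobis_sq(X).max())
    print("max CD:", cooks_distance(X, y).max())
    print("flagged cases:", flag_outliers(X, y))
```

As with the SPSS output, flagged cases are candidates for closer inspection rather than automatic removal.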
- Normality of residuals
- Residuals are more likely to be normally distributed if each of the variables is normally distributed
- Check histograms of all variables in an analysis
- Normally distributed variables will enhance the MLR solution
- See also
- Allen & Bennett 13.3.2.1 Assumptions (pp. 178-179)
- Francis 5.1.4 Practical Issues and Assumptions (pp. 126-128)