# Box's M

Type classification: this is a notes resource.

In a multivariate normal distribution, each variable has a normal distribution, and the variables are correlated with each other.
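As a minimal illustration of this, the sketch below (standard library only, with an assumed population correlation of 0.6) constructs a bivariate normal pair: each variable is drawn so its marginal is standard normal, and the second is built from the first so the two are correlated.

```python
import random

random.seed(1)
rho = 0.6  # assumed population correlation (illustrative only)

# Bivariate normal pair: x ~ N(0, 1), and y = rho*x + sqrt(1 - rho^2)*z,
# so y is also N(0, 1) and corr(x, y) = rho.
n = 20000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    z = random.gauss(0.0, 1.0)
    xs.append(x)
    ys.append(rho * x + (1 - rho ** 2) ** 0.5 * z)

# Sample correlation, computed by hand to stay dependency-free
mx = sum(xs) / n
my = sum(ys) / n
sx = (sum((v - mx) ** 2 for v in xs) / (n - 1)) ** 0.5
sy = (sum((v - my) ** 2 for v in ys) / (n - 1)) ** 0.5
r = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / ((n - 1) * sx * sy)
print(round(r, 2))  # close to rho
```

With 20,000 draws, the sample correlation lands very close to the chosen rho.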

Box's M tests the homogeneity of variance-covariance matrices by comparing the variance-covariance matrix of the dependent variables (DVs) for each group. For example, if you have five DVs, it tests five variances and ten covariances for equality across the groups. So the more DVs, the more matrix elements there are to compare, and the higher the likelihood of detecting non-equality of variances across the groups.
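For intuition about what is being compared, here is a rough sketch (not the SPSS implementation) of the usual Box's M statistic, M = (N − k) ln|S_pooled| − Σ (n_i − 1) ln|S_i|, hard-coded for two DVs and using made-up data; it stops at the statistic and the degrees of freedom of its chi-square approximation rather than computing a p-value.

```python
import math

def cov2(data):
    """Unbiased sample covariance matrix (2x2) for a list of (x, y) pairs."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def boxs_m(groups):
    """Box's M statistic and chi-square df for k groups of bivariate data."""
    p, k = 2, len(groups)
    ns = [len(g) for g in groups]
    N = sum(ns)
    covs = [cov2(g) for g in groups]
    # Pooled within-groups covariance matrix
    pooled = [[sum((n - 1) * c[i][j] for n, c in zip(ns, covs)) / (N - k)
               for j in range(p)] for i in range(p)]
    M = (N - k) * math.log(det2(pooled)) - sum(
        (n - 1) * math.log(det2(c)) for n, c in zip(ns, covs))
    df = p * (p + 1) * (k - 1) // 2  # df of the chi-square approximation
    return M, df

# Two made-up groups with visibly different covariance matrices
g1 = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.1, 2.1), (0.9, 1.9)]
g2 = [(3.0, 4.5), (2.0, 3.0), (4.0, 5.5), (2.5, 4.8), (3.5, 3.2)]
M, df = boxs_m([g1, g2])
print(round(M, 2), df)
```

When the group covariance matrices are identical, M is exactly zero; the more the matrices differ, the larger M grows, which is why adding DVs (more variances and covariances that must all match) makes a significant result more likely.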

Box's *M* tests "the assumption ... that the vector of the dependent variables follow a multivariate normal distribution, and the variance-covariance matrices are equal across the cells formed by the between-subjects effects." (SPSS 14 Help - Tutorial)

- Homogeneity of Covariance Matrices

"MANOVA makes the assumption that the within-group covariance matrices are equal. If the design is balanced so that there is an equal number of observations in each cell, the robustness of the MANOVA tests is guaranteed. If the design is unbalanced, you should test the equality of covariance matrices using Box's *M* test. If this test is significant at less than 0.001, there may be severe distortion in the alpha levels of the tests. You should only use Pillai's trace criterion in this situation." from http://rimarcik.com/en/navigator/manova.html

Box's *M* is highly sensitive, so unless *p* < .001 and the sample sizes are unequal, it can be ignored. However, if it is significant and the sample sizes are unequal, the MANOVA is not robust (Tabachnick & Fidell, 2001).

If there are many DVs and great discrepancy between cell sample sizes, then there is more potential for distortion of the alpha levels. Look at the sample sizes and sizes of the variances and covariances for the cells:

- If cells with larger samples have larger variances and covariances, then the alpha level is conservative and the null hypothesis can be rejected confidently.
- If cells with smaller samples produce larger variances, then beware: the significance test is too liberal. A non-significant result means the null hypothesis can be confidently retained, but significant results are suspect.
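The check above can be sketched with a quick comparison of cell sizes against cell variances; the numbers here are made up, and one DV per cell is used for brevity (the same comparison applies to every variance and covariance in the full matrices).

```python
import statistics

# Hypothetical cell data: a larger cell (A) and a smaller cell (B)
cells = {
    "A": [4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 4.9, 5.1],  # larger sample
    "B": [3.9, 6.2, 2.1, 5.8],                       # smaller sample
}

# List each cell's size next to its variance, largest cell first.
# If the smaller cell shows the larger variance, significance tests are
# too liberal; if the larger cell does, they are conservative.
for name, scores in sorted(cells.items(), key=lambda kv: -len(kv[1])):
    print(name, "n =", len(scores),
          "variance =", round(statistics.variance(scores), 2))
```

In this made-up data the smaller cell B has the larger variance, which is the worrying (too liberal) pattern.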

Use Pillai's criterion instead of Wilks's lambda, or, if there is a large *n*, randomly delete cases from the sample to equalise the numbers in each group, assuming power can be maintained at a sensible level. SPSS reports Pillai's trace just above Wilks's lambda.
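To make the recommended statistic concrete, the sketch below computes Pillai's trace, V = tr(H(H + E)⁻¹), for a one-way MANOVA by hand (H is the between-groups SSCP matrix, E the within-groups SSCP matrix); it assumes two DVs and uses made-up data, whereas in practice SPSS or a statistics library reports this for you.

```python
def pillai_trace(groups):
    """Pillai's trace for a one-way MANOVA; groups is a list of lists
    of (y1, y2) observations, one inner list per group (p = 2 DVs)."""
    p = 2
    allobs = [obs for g in groups for obs in g]
    N = len(allobs)
    grand = [sum(o[j] for o in allobs) / N for j in range(p)]
    means = [[sum(o[j] for o in g) / len(g) for j in range(p)] for g in groups]

    # Between-groups (H) and within-groups (E) SSCP matrices
    H = [[0.0] * p for _ in range(p)]
    E = [[0.0] * p for _ in range(p)]
    for g, m in zip(groups, means):
        for i in range(p):
            for j in range(p):
                H[i][j] += len(g) * (m[i] - grand[i]) * (m[j] - grand[j])
        for o in g:
            for i in range(p):
                for j in range(p):
                    E[i][j] += (o[i] - m[i]) * (o[j] - m[j])

    # T = H + E; Pillai's trace V = tr(H T^-1), using the 2x2 inverse
    T = [[H[i][j] + E[i][j] for j in range(p)] for i in range(p)]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    Tinv = [[T[1][1] / det, -T[0][1] / det],
            [-T[1][0] / det, T[0][0] / det]]
    return sum(H[i][m] * Tinv[m][i] for i in range(p) for m in range(p))

# Two made-up, well-separated groups
ga = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.1, 2.1)]
gb = [(3.0, 4.0), (2.8, 4.1), (3.2, 3.9), (3.1, 4.2)]
print(round(pillai_trace([ga, gb]), 3))
```

V ranges from 0 (identical group mean vectors) up to min(p, k − 1), and larger values indicate greater group separation; its relative robustness to covariance heterogeneity is why it is preferred here.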

Overall, it seems that using Pillai's trace is the most sensible and conservative choice.[1]