# Exploratory factor analysis/Assumptions

There are several requirements for a dataset to be suitable for factor analysis:

1. Normality: Statistical inference is improved if the variables are multivariate normal.[1]
2. Linear relations between variables: test by visually examining all (or at least a sample) of the bivariate scatterplots:
   1. Is the relationship linear?
   2. Are there bivariate outliers?
   3. Is the spread about the line of best fit homoscedastic (even or cigar-shaped, as opposed to fanning in or out)?
   4. If there are many variables (and thus many bivariate scatterplots), consider using matrix scatterplots to efficiently visualise relations amongst the sets of variables within each factor (e.g., one matrix scatterplot for the variables belonging to Factor 1, another for the variables belonging to Factor 2, etc.)
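These scatterplot checks can also be approximated numerically. A minimal sketch in Python for a single pair of variables, using simulated data (all variable names here are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(42)  # hypothetical example data
x = rng.normal(size=200)
y = 0.6 * x + rng.normal(scale=0.8, size=200)

# Line of best fit and residuals about it
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Bivariate outliers: standardised residuals beyond +/- 3
std_resid = residuals / residuals.std(ddof=1)
outliers = np.abs(std_resid) > 3

# Rough homoscedasticity check: residual spread in the lower vs upper
# half of x should be broadly similar (ratio near 1); fanning in or
# out would push the ratio well away from 1
order = np.argsort(x)
half = len(x) // 2
spread_low = residuals[order[:half]].std(ddof=1)
spread_high = residuals[order[half:]].std(ddof=1)
ratio = spread_high / spread_low
```

Numeric screens like these complement, rather than replace, visual inspection of the scatterplots.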
3. Factorability: the assumption that there are at least some correlations amongst the variables, so that coherent factors can be identified. There should be some degree of collinearity among the variables, but not an extreme degree (singularity). Factorability can be examined via any of the following:
   1. Inter-item correlations (correlation matrix): are there at least several small to moderate correlations (e.g., > .3)?
   2. Anti-image correlation matrix diagonals: these should be > ~.5.
   3. Measures of sampling adequacy (MSAs):
      • Kaiser-Meyer-Olkin (KMO) statistic (should be > ~.5 or .6)[2] and
      • Bartlett's test of sphericity (should be significant)
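Both MSAs can be computed directly from the correlation matrix. A sketch of the standard formulas in Python, using simulated two-factor data (the function names and data are illustrative assumptions, not from the source):

```python
import numpy as np
from scipy import stats

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    r = np.corrcoef(data, rowvar=False)
    inv_r = np.linalg.inv(r)
    # Anti-image (partial) correlations from the inverse correlation matrix
    d = np.sqrt(np.diag(inv_r))
    partial = -inv_r / np.outer(d, d)
    np.fill_diagonal(partial, 0)
    np.fill_diagonal(r, 0)
    # Ratio of squared correlations to squared correlations + squared partials
    return (r ** 2).sum() / ((r ** 2).sum() + (partial ** 2).sum())

def bartlett_sphericity(data):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = data.shape
    r = np.corrcoef(data, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(r))
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)

# Hypothetical data: 300 cases, 6 items forming two correlated clusters
rng = np.random.default_rng(0)
f1 = rng.normal(size=(300, 1))
f2 = rng.normal(size=(300, 1))
items = np.hstack([f1 + rng.normal(scale=0.5, size=(300, 3)),
                   f2 + rng.normal(scale=0.5, size=(300, 3))])

kmo_val = kmo(items)
chi2_val, p_val = bartlett_sphericity(items)
# with this simulated structure, KMO should sit comfortably above .6
# and Bartlett's p-value should be near zero (significant)
```

The per-item anti-image diagonals mentioned above are the same partial-correlation machinery applied row by row; statistical packages (e.g., SPSS, or Python's factor_analyzer) report both automatically.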
4. Sample size: The sample size should be large enough to yield reliable estimates of the correlations among the variables:
   1. Ideally, there should be a large ratio of N / k (cases / items), e.g., > ~20:1
      1. e.g., if there are 20 items in the survey, ideally there would be at least 400 cases
   2. EFA can still reasonably be done with a ratio of > ~5:1
   3. The bare minimum, for pilot study purposes, is as low as 3:1.
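These rules of thumb reduce to simple arithmetic; a sketch (the function name is illustrative):

```python
def recommended_n(k_items, ratio):
    """Cases needed for k items at a given cases-to-items ratio."""
    return k_items * ratio

# For a 20-item survey:
ideal = recommended_n(20, 20)    # ideal (20:1) -> 400 cases
workable = recommended_n(20, 5)  # still reasonable (5:1) -> 100 cases
pilot = recommended_n(20, 3)     # bare minimum for a pilot (3:1) -> 60 cases
```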

For more information, see these lecture notes.