Exploratory factor analysis
From Wikiversity
Completion status: this resource is ~50% complete.
This page introduces the use of exploratory factor analysis, particularly for the purposes of psychometric instrument development.
Assumed knowledge
Purposes of factor analysis
There are two main purposes or applications of factor analysis:
- 1. Data reduction
Reducing data to a smaller set of summary variables, e.g., psychological questionnaires often aim to measure several psychological constructs, with each construct measured using multiple items which can be combined into a smaller number of factor scores.
- 2. Exploring theoretical structure
Theoretical questions about the underlying structure of psychological phenomena can be explored and empirically tested using factor analysis, e.g., is intelligence better described as a single, general factor, or as consisting of multiple, independent dimensions?
History
Factor analysis was initially developed by Charles Spearman in 1904.
Assumptions
- Linear relations between variables. Test by visually examining all or some of the bivariate scatterplots.
- Factorability can be examined via:
  - Inter-item correlations (correlation matrix): are there at least several sizable correlations, e.g., > .5?
  - Anti-image correlation matrix diagonals: they should be > ~.5.
  - Measures of sampling adequacy (MSAs):
    - Kaiser-Meyer-Olkin (KMO) (should be > ~.5) and
    - Bartlett's test of sphericity (should be significant)
- Sample size:
  - Ideally, there should be a ratio of > ~20:1 (cases per item)
  - EFA can still be reasonably done with > ~5:1
  - Bare minimum for pilot study purposes, as low as 3:1.
For more information, see the lecture notes.
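The factorability checks above can be sketched in a few lines of NumPy. This is a minimal illustration, not a replacement for a statistics package: the function names `bartlett_sphericity` and `kmo` and the simulated data are assumptions made for the example. The per-item values returned by `kmo` correspond to the anti-image correlation matrix diagonals.

```python
import numpy as np

def bartlett_sphericity(data):
    """Return (chi-square, degrees of freedom) for Bartlett's test of sphericity."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # log of the determinant of R
    chi_square = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi_square, df

def kmo(data):
    """Return (overall KMO, per-item measures of sampling adequacy)."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # partial correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.where(off_diag, corr ** 2, 0.0)
    q2 = np.where(off_diag, partial ** 2, 0.0)
    per_item = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + q2.sum())
    return overall, per_item

# Hypothetical demo data: 300 cases, 6 items driven by one common factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal((300, 1))
items = 0.7 * factor + np.sqrt(1 - 0.49) * rng.standard_normal((300, 6))

chi_square, df = bartlett_sphericity(items)
overall_kmo, item_msa = kmo(items)
cases_per_item = items.shape[0] / items.shape[1]

print(f"Bartlett chi-square = {chi_square:.1f} on {df} df")
print(f"Overall KMO = {overall_kmo:.2f}; item MSAs = {np.round(item_msa, 2)}")
print(f"Cases per item = {cases_per_item:.0f}:1")
```

The chi-square value is compared against a chi-square distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(chi_square, df)` gives the p-value.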
Types (methods of extraction)
There are two main types of extraction:
- Principal components (PC): Analyses all variance in the items. Usually preferred when reducing the items to composite scores for subsequent analysis.
- Principal axis factoring (PAF): Analyses only the shared variance amongst the items. Used more often for theoretical explorations of the underlying factor structure.
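The difference between the two extraction methods comes down to what sits on the diagonal of the correlation matrix: 1s for PC, communality estimates for PAF. The sketch below illustrates this under stated assumptions: the function names are hypothetical, the PAF starting values are squared multiple correlations (one common choice), and the correlation matrix is built from a made-up one-factor model.

```python
import numpy as np

def top_loadings(matrix, k):
    """Loadings for the k largest eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(matrix)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))

def principal_components(corr, k):
    """PC extraction: unities stay on the diagonal (total variance)."""
    return top_loadings(corr, k)

def principal_axis(corr, k, max_iter=200, tol=1e-8):
    """Iterated PAF: communality estimates replace the diagonal."""
    h2 = 1 - 1 / np.diag(np.linalg.inv(corr))  # squared multiple correlations
    for _ in range(max_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, h2)
        loadings = top_loadings(reduced, k)
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings

# Hypothetical population: four items loading on one common factor.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
corr = np.outer(true_loadings, true_loadings)
np.fill_diagonal(corr, 1.0)

pc = principal_components(corr, 1)
paf = principal_axis(corr, 1)
print("PC loadings: ", np.round(np.abs(pc[:, 0]), 3))
print("PAF loadings:", np.round(np.abs(paf[:, 0]), 3))
```

The sign of an eigenvector is arbitrary, so absolute values are printed. In this example PAF recovers the loadings used to build the matrix, while the PC loadings are larger because the unique variance of each item is also analysed.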
Rotation
There are two main types of factor rotation:
- Orthogonal (varimax): Factors are independent, i.e., there is no correlation between factors.
- Oblique (oblimin): Factors are related, i.e., they are allowed to correlate. Consider this when some factor correlations are sizable, e.g., > .3.
Determining the number of factors
There is no definitive, simple way to determine the number of factors. This is a subjective decision by the researcher, but the researcher should be guided by several considerations:
- Theory: e.g., How many factors were expected? For new factors, were they expected? Do the extracted factors make theoretical sense?
- Kaiser's criterion: Eigenvalues over 1; but this is arbitrary. Use judgement too about how many factors are going to be extracted for the final model.
- Scree plot: Plots eigenvalues. Look for a notable drop; the rest is 'scree'. Extract the number of factors that form the 'cliff'. Again, use judgement about the meaning of the factors for final decisions.
- Interpretability: Are all factors interpretable (especially the last one)? In other words, can you reasonably name and describe the items as indicative of an underlying factor?
- Have you tried several different models, with different numbers of factors? Before deciding on the final number of factors, make sure to look at solutions for, say, 2, 3, 4, 5, 6 and 7 factors.
- Have you eliminated items which don't seem to belong (this can change the structure/number of factors)? After you remove items which don't seem to belong, re-check whether you still have a clear factor structure. It may be that a different number of factors (probably one or two fewer) is now more appropriate.
- Are the factor correlations not too high (e.g., not over ~.7)? Otherwise, the factors may be too similar (and redundant).
- Run a parallel analysis: compare the observed eigenvalues with eigenvalues obtained from random data of the same size, and retain only the factors whose eigenvalues exceed the random ones.
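Two of the considerations above, Kaiser's criterion and parallel analysis, are mechanical enough to sketch in code. The example below is a minimal illustration under stated assumptions: the function names, the 95th-percentile threshold, the number of simulations, and the simulated two-factor data are all choices made for the example. The eigenvalues are those of the full correlation matrix, and the same `observed` values are what a scree plot would display.

```python
import numpy as np

def eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Count leading factors whose eigenvalues exceed those of random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = eigenvalues(data)
    random_eigs = np.array(
        [eigenvalues(rng.standard_normal((n, p))) for _ in range(n_sims)]
    )
    threshold = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > threshold
    # Stop at the first factor that fails to beat the random threshold.
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return n_factors, observed, threshold

# Hypothetical demo data: 500 cases, 6 items, 2 uncorrelated factors.
rng = np.random.default_rng(42)
factors = rng.standard_normal((500, 2))
pattern = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
data = factors @ pattern.T + 0.6 * rng.standard_normal((500, 6))

n_parallel, observed, threshold = parallel_analysis(data)
n_kaiser = int((observed > 1).sum())
print("Observed eigenvalues:", np.round(observed, 2))
print("Random 95th percentile:", np.round(threshold, 2))
print(f"Kaiser's criterion: {n_kaiser} factors; parallel analysis: {n_parallel} factors")
```

Neither count settles the question on its own; theory and interpretability still apply.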
Criteria for selecting items
For a simple factor structure, consider each item with regard to:
- Communality (ideally, above .5)
- Primary (target) factor loading (should be above .5, preferably above .6)
- Item cross-loadings (there should be a gap of at least ~.2 between primary and cross-loadings, with cross-loadings above .3 being worrisome)
- Meaningful and useful membership of a factor (each item should make a meaningful, i.e., face-valid, and useful, i.e., non-redundant, contribution to an identifiable factor)
- Reliability (removal of the item wouldn't improve Cronbach's alpha)
- See also: How do I eliminate items? (lecture notes)
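The numeric criteria above can be applied item by item. The following is a minimal sketch: the function name `screen_item`, the item names and the loadings are hypothetical, and the communality is computed as the sum of squared loadings, which assumes an orthogonal solution. The judgement-based criteria (meaningfulness, reliability) are not covered.

```python
def screen_item(loadings, min_communality=0.5, min_primary=0.5,
                max_cross=0.3, min_gap=0.2):
    """Return a list of concerns for one item's loadings across factors."""
    sizes = sorted((abs(x) for x in loadings), reverse=True)
    primary = sizes[0]
    cross = sizes[1] if len(sizes) > 1 else 0.0
    communality = sum(x * x for x in loadings)  # assumes orthogonal factors
    concerns = []
    if communality < min_communality:
        concerns.append("low communality")
    if primary < min_primary:
        concerns.append("weak primary loading")
    if cross > max_cross:
        concerns.append("high cross-loading")
    if primary - cross < min_gap:
        concerns.append("small gap")
    return concerns

# Hypothetical loadings for three items on two factors.
items = {
    "item_a": [0.72, 0.10],
    "item_b": [0.55, 0.45],
    "item_c": [0.35, 0.05],
}
report = {name: screen_item(values) for name, values in items.items()}
for name, concerns in report.items():
    print(name, "->", concerns or "no concerns")
```

Flagged items are candidates for removal, not automatic exclusions; after removing any item, the analysis should be re-run and the structure re-checked.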
Data analysis exercises
Pros & cons
- Advantages (Wikipedia)
- Disadvantages (Wikipedia)
Glossary
- Anti-image correlation matrix: Contains the negative partial covariances and correlations. Diagonals are used as a measure of sampling adequacy (MSA).
- Bartlett test of sphericity: Statistical test for the overall significance of all correlations within a correlation matrix. Used as a measure of sampling adequacy (MSA).
- Common factor: A factor on which two or more variables load.
- Common factor analysis: A statistical technique which uses the correlations between observed variables to estimate common factors and the structural relationships linking factors to observed variables.
- Common variance: Variance in a variable shared with common factors. Factor analysis assumes that a variable's variance is composed of three components: common, specific and error.
- Communality: The proportion of a variable's variance explained by a factor structure. Final communality estimates are the sum of squared loadings for a variable in an orthogonal factor matrix.
- Complex variable: A variable which loads on two or more factors.
- Correlation: The product-moment correlation coefficient.
- Correlation matrix: Table showing the inter-correlations among all variables.
- Data reduction: Reducing the number of cases or variables in a data matrix e.g., factor analysis can be used to replace a large collection of variables with a smaller number of factors.
- Eigenvalue: Column sum of squared loadings for a factor. It conceptually represents the amount of variance accounted for by a factor.
- Error variance: Unreliable and inexplicable variation in a variable. Error variance is assumed to be independent of common variance, and a component of the unique variance of a variable.
- Exploratory factor analysis: A factor analysis technique used to explore the underlying structure of a collection of observed variables.
- Factor: Linear combination of the original variables. Factors represent the underlying dimensions (constructs) that summarise or account for the original set of observed variables.
- Factor analysis: A statistical technique used to (1) estimate factors, or (2) reduce the dimensionality of a large number of variables to a smaller number of factors.
- Factor loading: Correlation between a variable and a factor, and the key to understanding the nature of a particular factor. Squared factor loadings indicate what percentage of the variance in an original variable is explained by a factor.
- Factor matrix: Table displaying the factor loadings of all variables on each factor. Factors are presented as columns and the variables are presented as rows.
- Factor rotation: A process of adjusting the factor axes to achieve a simpler and pragmatically more meaningful factor solution - the goal is a simple factor structure.
- Factor score: Composite measure created for each observation (case) on each factor extracted in the factor analysis. Factor weights are used in conjunction with the original variable values to calculate each observation's score. The factor scores are standardised as z-scores.
- Image of a variable: The component of a variable which is predicted from other variables. Antonym: anti-image of a variable.
- Indeterminacy: Population factor structures cannot be estimated exactly because an infinite number of factor structures can produce the same correlation matrix. There are more unknowns than equations in the common factor model, so the factor structure is said to be indeterminate.
- Latent factor: A theoretical underlying factor hypothesised to influence a number of observed variables. Common factor analysis assumes latent variables are linearly related to observed variables.
- Measure of sampling adequacy (MSA): Measures calculated both for the entire correlation matrix and each individual variable evaluating the appropriateness of applying factor analysis.
- Oblique factor rotation: Factor rotation such that the extracted factors are correlated. Rather than arbitrarily constraining the factor axes to be orthogonal (at a 90 degree angle), the oblique solution identifies the extent to which the factors are correlated.
- Orthogonal factor rotation: Factor rotation such that the factor axes are maintained at 90 degrees. Each factor is independent of, or orthogonal to, all other factors. The correlation between the factors is constrained to be zero.
- Parsimony principle: When two or more theories explain the data equally well, select the simplest theory e.g., if a 2-factor and a 3-factor model explain about the same amount of variance, interpret the 2-factor model.
- Principal axis factoring (PAF): A method of factor analysis in which the factors are based on a reduced correlation matrix using a priori communality estimates. That is, communalities are inserted in the diagonal of the correlation matrix, and the extracted factors are based only on the common variance, with specific and error variances excluded.
- Principal component analysis (PC or PCA): (1) The factors are based on the total variance. Unities (1s) are used in the diagonal of the correlation matrix; this procedure computationally implies that all the variance is common or shared. [1] (2) a method of factoring a correlation matrix directly, without estimating communalities. Linear combinations of variables are estimated which explain the maximum amount of variance in the variables. The first component accounts for the most variance in the variables. Then the second component accounts for the most variance in the variables residualised for the first component, and so on. [2]
- Scree plot: A graphical method for determining the number of factors. The eigenvalues are plotted in the sequence of the principal factors. The number of factors is chosen at the point where the plot levels off to a linearly decreasing pattern.
- Simple structure: A pattern of factor loading results such that each variable loads highly onto one and only one factor.
- Specific variance: (1) Variance of each variable unique to that variable and not explained or associated with other variables in the factor analysis. [3] (2) The component of unique variance which is reliable but not explained by common factors. [4]
- Unique variance: The proportion of a variable's variance that is not shared with a factor structure. Unique variance is composed of specific and error variance.
- Varimax: The most commonly used factor rotation method; an orthogonal rotation criterion which maximizes the variance of the squared elements in the columns of a factor matrix.
- Glossary sources
- Factor analysis glossary (richmond.edu)
- Factor analysis glossary (siu.edu)
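Several glossary terms (communality, unique variance, eigenvalue) are simple sums over a factor matrix, which a small worked example can make concrete. The loadings below are hypothetical and assumed to come from an orthogonal solution with standardised variables, so that each variable has a total variance of 1.

```python
# Hypothetical orthogonal factor matrix: rows are variables, columns are factors.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.6],
    [0.2, 0.7],
]
n_variables = len(loadings)
n_factors = len(loadings[0])

# Communality: row sum of squared loadings (variance shared with the factors).
communalities = [sum(value ** 2 for value in row) for row in loadings]

# Unique variance: what remains of each standardised variable's variance.
unique_variances = [1 - h2 for h2 in communalities]

# Eigenvalue: column sum of squared loadings (variance accounted for by a factor).
eigenvalues = [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]

# Proportion of total variance accounted for by each factor.
proportion_explained = [ev / n_variables for ev in eigenvalues]

for j in range(n_factors):
    print(f"Factor {j + 1}: eigenvalue = {eigenvalues[j]:.2f}, "
          f"proportion of variance = {proportion_explained[j]:.3f}")
for i in range(n_variables):
    print(f"Variable {i + 1}: communality = {communalities[i]:.2f}, "
          f"unique variance = {unique_variances[i]:.2f}")
```

The communalities and the eigenvalues add up to the same total, because both are sums over the same squared loadings, taken by row and by column respectively.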
See also
- Wikiversity
- Lecture notes
- Data analysis tutorial
- Internal consistency
- Composite scores
- Psychometric instrument development
- Survey research and design in psychology
- Wikipedia & Wikibooks
- Factor analysis in psychometrics (Wikipedia)
- Principal component analysis (Wikipedia)
- Principal component analysis (Wikibooks)
External links
- Darlington, R. B., Factor analysis.
- Psychometric instrument development:Exploratory factor analysis (Lecture on ucspace.canberra.edu.au)
- Factor analysis links (del.icio.us)
- Factor analysis resources: Understanding & using factor analysis in psychology & the social sciences (Wilderdom)
- Open and free online course on exploratory data analysis (Carnegie Mellon University)