
Exploratory factor analysis








This page briefly describes the methods of Exploratory Factor Analysis (EFA) and provides an annotated list of resources.

[The following narrative draws heavily on James Neill (2013) and Tucker and MacCallum (1997), but has been distilled for Epi PhD students and junior researchers.]


Factor analysis is a 100-year-old family of techniques used to identify the structure or dimensionality of observed data and to uncover the underlying constructs that give rise to observed phenomena. The techniques identify and examine clusters of inter-correlated variables; these clusters are called factors or latent variables (see Figure 1). In statistical terms, factor analysis is a method for modeling the population covariance matrix of a set of variables using sample data. Factor analysis is used for theory development, psychometric instrument development, and data reduction.

Figure 1. Example of the factor structure of common psychiatric disorders. Common disorders appear to represent two latent dimensions, internalizing and externalizing disorders. From Krueger, R. F., 1999, The Structure of Common Mental Disorders. Archives of General Psychiatry. 56: 921-926.

Factor analysis was developed in 1904 by the psychologist and statistician Charles Spearman (known for Spearman's correlation coefficient) in his work on the underlying dimensions of intelligence. Until the introduction of statistical computing, its use was hampered by laborious hand calculations; the technique has flourished since then.

There are two main types of factor analysis: exploratory and confirmatory. In exploratory factor analysis (EFA, the focus of this resource page), each observed variable is potentially a measure of every factor, and the goal is to determine which relationships between observed variables and factors are strongest. In confirmatory factor analysis (CFA), a simple factor structure is posited, each variable can be a measure of only one factor, and the correlation structure of the data is tested against the hypothesized structure using goodness-of-fit tests. Figure 2 is a graphical representation of EFA and CFA.

Figure 2. EFA (left) and CFA (right). Adapted from Wall, M., September 20, 2012, Session 3 Guest Lecture in Epidemiology of Drug and Alcohol Problems, Hassin, D., Columbia University Mailman School of Public Health

There are different factor analysis techniques for different measurement and data scenarios:

  1. Observed variables are continuous, latent variables are assumed to be continuous

  2. Observed are continuous, latent are categorical

  3. Observed are categorical, latent are continuous

  4. Observed are categorical, latent are categorical

This resource page focuses on scenarios 1 and 3.

Figures 3 and 4 below illustrate some of the basic premises of measurement theory as they relate to factor analysis:

  1. Factors, or latent variables, systematically influence observed variables (i.e. when we measure observed variables, those measurements / observations are at least in part caused by latent variables)

  2. Inter-individual differences (i.e., variance) in observed variables are due to latent variables and measurement errors

  3. Each type of factor (common, specific; see below) contributes a portion of the variance, in addition to the measurement error

Figure 3. Elements that affect the observed variables. Figure adapted from Tucker, LR and MacCallum, RC. 1997, Exploratory Factor Analysis: http://www.unc.edu/~rcm/book/factornew.htm

Figure 3 shows that three things affect the observed variables. Two are types of latent variables, or factors. The first are common factors, which give rise to more than one of the observed variables (e.g., math skill can give rise to an addition test score, a multiplication test score, and a division test score). The second are specific factors, which give rise to only one of the observed variables (a common factor can become a specific factor if you remove all but one of the observed variables it gives rise to). The third thing that affects the observed variables is measurement error, which is not latent but is often due to unsystematic events that affect the measurement. Measurement error is closely related to reliability.

Each of the elements that affect the observed variables also contributes to the variance of those variables. Figure 4 shows that the variance of a particular observed variable is due in part to factors affecting other observed variables, factors affecting only that observed variable, and measurement error. The common variance is sometimes called communality, and the specific variance and error variance are often combined and called uniqueness.
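The split into communality and uniqueness can be sketched in a few lines of R. This is only an illustration: the loading values are hypothetical, the variables are assumed to be standardized (variance 1), and the two common factors are assumed to be uncorrelated.

```r
# Hypothetical standardized loadings for four observed variables on two
# uncorrelated common factors (illustrative values, not from any dataset).
loadings <- matrix(
  c(0.8, 0.1,
    0.7, 0.2,
    0.1, 0.6,
    0.2, 0.9),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("x1", "x2", "x3", "x4"), c("F1", "F2"))
)

# Communality: share of each variable's variance explained by the common factors.
communality <- rowSums(loadings^2)

# Uniqueness: what is left over (specific variance plus error variance).
uniqueness <- 1 - communality

round(cbind(communality, uniqueness), 2)
```

Under these assumptions, a variable with large loadings (x4 here) has most of its variance in common with the others, while a variable with small loadings (x3) is mostly unique.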

Figure 4. Variance structure of observed variables. Figure from James Neill, 2013, Exploratory Factor Analysis, Lecture 5, Survey Research and Design in Psychology. http://www.slideshare.net/jtneill/exploratory-factor-analysis

The figure also shows a major difference between factor analysis and principal component analysis. In principal component analysis, the goal is to account for as much of the total variance in the observed variables as possible; linear combinations of observed variables are used to create components. In factor analysis, the goal is to explain the covariance among variables; the observed variables are defined as linear combinations of the factors.

The main point is that factor analytic theory is about accounting for the covariation among observed variables. When observed variables are correlated with each other, factor analytic theory says that the correlation is due, at least in part, to the influence of common latent variables.
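A small simulation can make this premise concrete. The sketch below assumes a single hypothetical latent factor and made-up loadings for three test scores; none of these values come from the sources cited on this page. If the observed scores are driven by one common factor, their correlations should be close to the products of their loadings.

```r
set.seed(1)
n <- 5000

# One hypothetical latent factor (e.g. "math skill"), never observed directly.
latent <- rnorm(n)

# Hypothetical loadings of three test scores on that factor.
lambda <- c(addition = 0.8, multiplication = 0.7, division = 0.6)

# Each observed score = loading * factor + unique part, scaled to variance 1.
scores <- sapply(lambda, function(l) l * latent + sqrt(1 - l^2) * rnorm(n))

# Observed correlations should be close to the products of the loadings.
observed <- cor(scores)
implied  <- outer(lambda, lambda)
diag(implied) <- 1

round(observed, 2)
round(implied, 2)
```

The three scores are never made to depend on each other directly; they correlate only because they share the latent factor.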


Factor analysis has the following assumptions, which can be explored in more detail in the resources linked below:

  1. Sample size (e.g. 20 observations per variable)

  2. Level of measurement (e.g. the measurement / data scenarios mentioned above)

  3. Normality

  4. Linearity

  5. Outliers (factor analysis is sensitive to outliers)

  6. Factorability

Eigenvalues and factor loadings

[Note: this matrix algebra review can help you understand what is going on with eigenvalues and factor loadings under the hood, but is not strictly necessary to interpret the results of the factor analysis.]

Factors are extracted from correlation matrices by transforming such matrices using eigenvectors. An eigenvector of a square matrix is a vector that, when pre-multiplied by the square matrix, yields a vector that is a scalar multiple of the original vector. That scalar multiple is an eigenvalue (it need not be an integer).

The eigenvalue represents the variance accounted for by each factor. Each extracted factor has an eigenvalue (the scalar multiple of the original vector). The first extracted factor absorbs as much variance as possible, so subsequent eigenvalues are lower than the first. Factors with eigenvalues above 1 are generally considered stable and are commonly retained (the Kaiser criterion). When a correlation matrix is analyzed, the sum of all eigenvalues equals the number of observed variables in the model.
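These properties can be checked directly with base R's `eigen()`. The 3 × 3 correlation matrix below is hypothetical and serves only as a sketch of the definitions above.

```r
# Hypothetical 3 x 3 correlation matrix (illustrative values only).
R <- matrix(c(1.0, 0.6, 0.5,
              0.6, 1.0, 0.4,
              0.5, 0.4, 1.0), nrow = 3, byrow = TRUE)

e <- eigen(R)

# Defining property: R %*% v equals lambda * v for each eigenpair.
v1      <- e$vectors[, 1]
lambda1 <- e$values[1]
all.equal(as.vector(R %*% v1), lambda1 * v1)

# Eigenvalues are returned sorted from largest to smallest, and for a
# correlation matrix they sum to the number of variables (3 here).
e$values
sum(e$values)
```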

Figure 5. Scree plot, from James Neill, 2013, Exploratory Factor Analysis, Lecture 5, Survey Research and Design in Psychology. http://www.slideshare.net/jtneill/exploratory-factor-analysis

Each variable contributes a variance of 1. Eigenvalues are then allocated to the factors according to the amount of variance each explains. Scree plots (Figure 5) are often produced by factor analysis software and are line graphs of eigenvalues. They show the amount of variance explained by each factor, and the cut-off is the number of factors just before the bend in the scree plot, for example about 2 or 3 factors in Figure 5. Eigenvalues and scree plots can help you determine how many factors best fit your data.
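A scree plot can also be drawn by hand from the eigenvalues, which makes clear what the software is plotting. The sketch below uses `Harman74.cor`, a correlation matrix of 24 psychological tests that ships with base R; it was chosen only for convenience and is not the data behind Figure 5.

```r
# Harman74.cor ships with base R (datasets package): correlations among
# 24 psychological tests. Used here only as a convenient illustration.
R <- Harman74.cor$cov

eigenvalues <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

# Scree plot: eigenvalues in decreasing order, with a reference line at 1.
plot(eigenvalues, type = "b",
     xlab = "Factor number", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)

# Number of eigenvalues above 1, to be read alongside the bend in the plot.
n_above_one <- sum(eigenvalues > 1)
n_above_one
```

The eigenvalue count and the bend in the plot do not always agree, which is why both are usually inspected before settling on a number of factors.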

Factor loadings are a matrix of the relationships between observed variables and the factors you specify. In geometric terms, loadings are the numerical coefficients that correspond to the directional paths linking common factors to observed variables. They form the basis for interpreting the latent variables. Higher loadings mean that the observed variable is more strongly related to the factor. As a rule of thumb, consider loadings greater than 0.3.
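As a rough illustration of what a loadings matrix looks like, the sketch below fits a two-factor model with base R's `factanal()`. Two assumptions to note: `factanal()` uses maximum likelihood rather than the principal axis method used in the psych demo on this page, and `ability.cov` (six ability tests bundled with R) is used only because it is readily available.

```r
# ability.cov ships with base R: covariances of six ability tests (n = 112).
fit <- factanal(factors = 2, covmat = ability.cov, rotation = "none")

# Loadings matrix: rows are observed variables, columns are factors.
# cutoff = 0.3 blanks out loadings below the rule-of-thumb threshold.
print(fit$loadings, cutoff = 0.3)

# The same threshold applied by hand, as a logical matrix.
salient <- abs(unclass(fit$loadings)) > 0.3
salient
```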


Factors are rotated (literally, in geometric space) to make interpretation easier. There are two types of rotation: orthogonal (perpendicular), in which factors are not allowed to be correlated with one another, and oblique, in which factors are free to take any position in the factor space and can be correlated with one another. Examples of orthogonal rotation include Varimax, Quartimax, and Equamax. Examples of oblique rotation include Oblimin, Promax, and Geomin. For information on choosing a rotation method, see the resources below.

After rotation, the factors are re-positioned to pass optimally through clusters of shared variance so that they can be interpreted more easily. This is comparable to choosing a reference group in regression. Figure 6 illustrates factor rotation using Varimax, but is for conceptual purposes only. Rotations take place under the hood of your software.
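The difference between the two types of rotation can be sketched with the `varimax()` and `promax()` functions in base R. As an assumption for illustration, the unrotated loadings come from a maximum likelihood fit to the bundled `ability.cov` data rather than from any example on this page.

```r
# Unrotated loadings from a two-factor fit to a dataset bundled with R.
fit <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
unrotated <- unclass(fit$loadings)   # plain numeric matrix of loadings

orth <- varimax(unrotated)   # orthogonal rotation
obl  <- promax(unrotated)    # oblique rotation

# An orthogonal rotation matrix T satisfies t(T) %*% T = I, so communalities
# (row sums of squared loadings) are unchanged by the rotation.
round(crossprod(orth$rotmat), 3)
all.equal(rowSums(unclass(orth$loadings)^2), rowSums(unrotated^2))

# Implied factor correlations after the oblique rotation; off-diagonal values
# away from zero mean the factors are allowed to correlate.
phi <- solve(crossprod(obl$rotmat))
round(phi, 2)
```

Rotation changes how the explained variance is distributed across factors, not how much variance is explained for each variable.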

Figure 6. Example of an orthogonal varimax rotation. Observed variables related to wine characteristics. From Abdi, Hervé. http://www.utdallas.edu/~herve/Abdi-rotations-pretty.pdf

EFA with dichotomous items

A Pearson correlation matrix is not suitable for categorical or dichotomous items. Therefore, in order to perform EFA on such data, you need to build an appropriate correlation matrix, called tetrachoric (for dichotomous items) or polychoric (for other categorical items). A tetrachoric correlation is the Pearson correlation inferred from a 2 × 2 table under the assumption of bivariate normality. The polychoric correlation generalizes this to an n × m table.

The idea, illustrated in Figure 7, is that dichotomous items represent underlying continuous constructs. Basically, in building a tetrachoric correlation matrix, you are estimating a model based on the proportions that fall within each area in the lower right corner of Figure 7. The computer tries numerous thresholds and combinations.

Figure 7. Representation of the observed dichotomous variables (depressed yes / no) and a continuous latent construct. The lower corner shows how the latter is modeled by the former.
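The logic behind a tetrachoric correlation can be sketched in base R with simulated data. Everything here is an assumption for illustration: the true latent correlation, the thresholds, and the estimation shortcut (thresholds taken from the marginal proportions, then the correlation chosen so that the implied "both yes" proportion matches the data). It is a simplified sketch, not the algorithm used by psych, polycor, or MPlus.

```r
set.seed(42)
n   <- 20000
rho <- 0.6   # hypothetical true correlation between the latent variables

# Two correlated continuous latent variables.
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# Only the dichotomized versions are "observed" (e.g. symptom yes/no).
x <- as.integer(z1 > 0.5)
y <- as.integer(z2 > -0.2)

# Thresholds are recovered from the marginal proportions.
a <- qnorm(1 - mean(x))
b <- qnorm(1 - mean(y))

# P(Z1 > a, Z2 > b) under a standard bivariate normal with correlation r.
p_both <- function(r) {
  integrate(function(u) dnorm(u) * pnorm((b - r * u) / sqrt(1 - r^2),
                                         lower.tail = FALSE),
            lower = a, upper = Inf)$value
}

# Tetrachoric estimate: the r whose implied "both yes" proportion matches the data.
p11   <- mean(x == 1 & y == 1)
tetra <- uniroot(function(r) p_both(r) - p11, interval = c(-0.99, 0.99))$root

c(pearson_on_binary = cor(x, y), tetrachoric = tetra)
```

In this setup the Pearson correlation computed on the 0/1 items is attenuated relative to the latent correlation, while the tetrachoric estimate should land close to it, which is the reason for not using a Pearson matrix with dichotomous items.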

As of spring 2013, MPlus was the gold standard for performing EFA on dichotomous items, but it can also be done in R. See the resources below, especially the documentation for the psych package.


Textbooks & Chapters

Methodological Articles

Methodological (theory and background)

Methodological (applied)

Application Articles


library(psych)


# Quick demo of exploratory factor analysis

data(Harman)

head(Harman.Holzinger) # 9 × 9 correlation matrix of cognitive ability tests, N = 696


pa <- fa(Harman.Holzinger, 4, fm="pa", rotate="varimax", SMC=FALSE)
print(pa, sort=TRUE)

# Prints the results; sort=TRUE orders the loadings by absolute value. u^2 is the uniqueness
# and h^2 is the communality. See the Value section of ?fa for extracting specific results.

scree(Harman.Holzinger, factors=TRUE, pc=TRUE, main="Scree plot", hline=NULL, add=FALSE)

# Produces a scree plot: a line graph of eigenvalues. It shows the amount of variance
# explained by each factor, and the cut-off is the number of factors just before the bend
# in the scree plot, for example about 2 or 3 factors in Figure 5. Eigenvalues and scree
# plots can guide you in determining how many factors best fit your data.

fa.diagram(pa, sort=TRUE, cut=.3, simple=TRUE, error=FALSE, digits=1, e.size=.05, rsize=0.15)

# Draws a familiar-looking diagram of the relationships between factors and observed variables
# Code for dichotomous items

# "your_data.csv" is a placeholder path: substitute your own file
your.data <- read.csv("your_data.csv", header=TRUE, stringsAsFactors=FALSE)

# The geominQ rotation requires the GPArotation package to be installed.
# Recent versions of psych may flag fa.poly as deprecated; if so, fa(..., cor="poly") is the suggested replacement.
your.fa <- fa.poly(your.data, nfactors=3, n.obs=184, n.iter=1, rotate="geominQ", scores="tenBerge", SMC=TRUE, symmetric=TRUE, warnings=TRUE, fm="wls",
alpha=.1, p=.05, oblique.scores=TRUE)

# The main differences here are the rotation (you need to choose an oblique method; geominQ is
# the closest to MPlus), the factoring method (weighted least squares, or wls, is the closest
# to MPlus but not identical), and scores="tenBerge".

# If you want to build the tetrachoric correlation matrix yourself, use the polycor package

install.packages("polycor")
library(polycor)


