
Factor analysis

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors, plus 'error' terms. Factor analysis aims to find independent latent variables. It is also used as a technique in machine learning and data mining.

In PCA, 1.00s are placed on the diagonal of the correlation matrix, meaning that all of the variance in the matrix is to be accounted for (including variance unique to each variable, variance common among variables, and error variance); by definition, this includes all of the variance in the variables. In contrast, in EFA, the communalities are placed on the diagonal, meaning that only the variance shared with other variables is to be accounted for (excluding variance unique to each variable and error variance); by definition, this includes only the variance that is common among the variables.
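The diagonal difference described above can be sketched numerically. The following is a minimal illustration with a hypothetical 3-variable correlation matrix, using squared multiple correlations (SMCs) as initial communality estimates; the specific values are invented for the example.

```python
import numpy as np

# Hypothetical 3-variable correlation matrix (illustrative values only).
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

# PCA analyses R as-is: the 1.00s on the diagonal mean that all variance
# (unique + common + error) is to be accounted for.
pca_eigenvalues = np.linalg.eigvalsh(R)[::-1]

# EFA replaces the diagonal with communality estimates; a common initial
# choice is the squared multiple correlation, h_i^2 = 1 - 1 / (R^-1)_ii.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
efa_eigenvalues = np.linalg.eigvalsh(R_reduced)[::-1]

print(pca_eigenvalues.sum())  # equals the trace, 3.0: all variance analysed
print(efa_eigenvalues.sum())  # sum of communalities, less than 3.0
```

The total variance analysed by PCA equals the number of variables, whereas the reduced matrix used in EFA accounts only for the common variance.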
The theory behind factor analytic methods is that the information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Factor analysis is commonly used in biology, psychometrics, personality theories, marketing, product management, operations research, and finance. It may help in dealing with data sets where large numbers of observed variables are thought to reflect a smaller number of underlying/latent variables. It is one of the most commonly used inter-dependency techniques, applied when the relevant set of variables shows a systematic inter-dependence and the objective is to find the latent factors that create a commonality.

Factor analysis is related to principal component analysis (PCA), but the two are not identical. There has been significant controversy in the field over the differences between the two techniques (see the section on exploratory factor analysis versus principal components analysis below). PCA can be considered a more basic version of exploratory factor analysis (EFA), developed in the early days before the advent of high-speed computers. Both PCA and factor analysis aim to reduce the dimensionality of a set of data, but the approaches the two techniques take differ. Factor analysis is explicitly designed to identify certain unobservable factors from the observed variables, whereas PCA does not directly address this objective; at best, PCA provides an approximation to the required factors. From the point of view of exploratory analysis, the eigenvalues of PCA are inflated component loadings, i.e., contaminated with error variance.

Suppose we have a set of p observable random variables x_1, …, x_p with means μ_1, …, μ_p.
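The six-variables/two-factors situation mentioned earlier can be simulated to show why PCA serves as an approximation: when a few latent variables drive the data, the leading eigenvalues of the sample covariance matrix dominate. This sketch uses invented simulated data, not any particular dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 6 observed variables driven mainly by 2 latent ones.
n = 1000
latent = rng.normal(size=(n, 2))          # two unobserved variables
loadings = rng.normal(size=(2, 6))        # how each latent drives each observed
X = latent @ loadings + 0.3 * rng.normal(size=(n, 6))  # plus small noise

# PCA: eigendecomposition of the sample covariance matrix.
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]   # sorted descending

# The first two eigenvalues dominate, so two components capture most of
# the variance -- an approximation to the two underlying latent variables.
explained = eigvals[:2].sum() / eigvals.sum()
print(round(explained, 3))
```

Note that the two leading components approximate the span of the latent variables but, as the text says, their eigenvalues are still contaminated with the error variance added in the simulation.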
Suppose that for some unknown constants l_ij and k unobserved random variables F_j (called 'common factors', because they influence all the observed random variables), where i ∈ {1, …, p} and j ∈ {1, …, k} with k < p, we have

x_i − μ_i = l_i1 F_1 + ⋯ + l_ik F_k + ε_i.

Here, the ε_i are unobserved stochastic error terms with zero mean and finite variance, which may not be the same for all i.
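A minimal sketch of this model, with hypothetical loadings l_ij for p = 4 observed variables and k = 2 common factors (all numbers invented for illustration). If the factors are standardised and uncorrelated with the errors, the model implies Cov(x) = L Lᵀ + diag(ψ), which the simulation checks:

```python
import numpy as np

# Hypothetical loading matrix: l_ij is the loading of variable i on factor j.
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])
psi = np.array([0.3, 0.4, 0.2, 0.5])   # Var(eps_i): finite, variable-specific

rng = np.random.default_rng(1)
n = 200_000
F = rng.normal(size=(n, 2))                    # common factors F_j
eps = rng.normal(size=(n, 4)) * np.sqrt(psi)   # errors eps_i, zero mean
X = F @ L.T + eps                              # x_i = sum_j l_ij F_j + eps_i (mu = 0)

# Model-implied covariance vs. sample covariance.
implied = L @ L.T + np.diag(psi)
sample = np.cov(X, rowvar=False)
print(np.max(np.abs(sample - implied)))  # small sampling error
```

The agreement between the sample and model-implied covariance matrices is what factor-analysis estimation procedures exploit when recovering L and ψ from data.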
