What is the difference between PCA and factor analysis?

Factor analysis splits the total variance of the p input variables into two uncorrelated, non-overlapping parts: the communality part (m-dimensional, where the m common factors rule) and the uniqueness part (p-dimensional, where the errors, also called unique factors, live, mutually uncorrelated). So pardon me for not showing the true factor of our data on a scatterplot here.
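As a compact restatement in symbols (my own notation, assuming standardized variables and a single common factor, as in the two-variable example discussed here):

```latex
% Common factor model: each variable = loading * common factor + its unique factor
V_1 = a_1 F + e_1, \qquad V_2 = a_2 F + e_2
% The total (unit) variance of each variable splits into communality and uniqueness
\operatorname{Var}(V_j) \;=\; \underbrace{a_j^2}_{\text{communality}} \;+\; \underbrace{\operatorname{Var}(e_j)}_{\text{uniqueness}} \;=\; 1
```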

It could be visualized quite adequately via vectors in "subject space", as done here, without showing data points. Above, in the section "The idea of common FA (latent feature)", I displayed the factor axis F as a wedge in order to warn that the true factor axis does not lie in the plane V1 V2. That means that - in contrast to principal component P1 - factor F, as an axis, is not a rotation of axis V1 or V2 in their space, and F, as a variable, is not a linear combination of variables V1 and V2.

Therefore F is modeled as if it were an outer, independent variable extracted from variables V1, V2, not a derivation of them (see the equations above). Not only is the true factor not a function of the manifest variables, the true factor's values are not even uniquely defined. In other words, they are simply unknown. That is all due to the fact that we're in the excessive 5D analytic space and not in our home 2D space of the data.

Only good approximations to the true factor values, called factor scores (a number of methods to compute them exist), are there for us. Factor scores do lie in the plane V1 V2, as principal component scores do; they are computed as linear functions of V1, V2, too, and it was they that I plotted in the section "FA: approximate solution (factor scores)".

Principal component scores are true component values; factor scores are only a reasonable approximation to the indeterminate true factor values. Now, to gather in one small clot what the two previous sections said, and to add the final strokes:

Actually, FA can (if you do it right; see also the data assumptions) find the true factor solution; by "true" I mean here optimal for the data sample. However, various methods of extraction exist; they differ in the secondary constraints they impose.

Thus, the loadings are those of the optimal, true factors. Factor scores - if you need them - are computable from those loadings in various ways and return approximations to the factor values. So the "factor solution" displayed by me in the section "FA: approximate solution (factor scores)" was actually based on optimal loadings, i.e. on true factors.

But the scores were not optimal; that is their destiny. The scores are computed as a linear function of the observed variables, like component scores are, so the two could be compared on a scatterplot, and I did so in a didactic pursuit: to show it as a gradual pass from the PCA idea towards the FA idea.

One must be wary when plotting, on the same biplot, factor loadings together with factor scores in the "space of factors": be conscious that the loadings pertain to the true factors while the scores pertain to surrogate factors (see my comments to this answer in this thread).

Rotation of factor loadings helps interpret the latent features. PCA tends to converge in its results with FA as the number of variables grows (see the extremely rich thread on practical and conceptual similarities and differences between the two methods).

There is a considerable number of good links to other participants' answers on the topic outside this thread; I'm sorry I could use only a few of them in the current answer. A nice question was asked later, and my comment on it, based directly on the current answer, would be: "Both in the FA model and in the PCA model a variable is a linear composite of factors (latents).

However, only in PCA is the backward direction also true. In FA, the backward direction holds only for factor scores, but not for the true factors." There are numerous suggested definitions on the web. Here is one from an on-line glossary on statistical learning: constructing new features which are the principal components of a data set. The principal components are random variables of maximal variance constructed from linear combinations of the input features.

Equivalently, they are the projections onto the principal component axes, which are lines that minimize the average squared distance to each point in the data set. To ensure uniqueness, all of the principal component axes must be orthogonal. In another view, PCA is a maximum-likelihood technique for linear regression in the presence of Gaussian noise on both inputs and outputs. Factor analysis, in turn, is a generalization of PCA which is based explicitly on maximum likelihood. Like PCA, each data point is assumed to arise from sampling a point in a subspace and then perturbing it with full-dimensional Gaussian noise.
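As an added gloss in my own notation (not part of the quoted definition): for a centered data matrix X with rows x_i, the first principal axis can be characterized equivalently as

```latex
% Direction of maximal variance of the projected data
w_1 = \arg\max_{\lVert w \rVert = 1} \operatorname{Var}(X w)
% Equivalently, the line through the origin minimizing the average squared
% distance from each point to its orthogonal projection onto that line
w_1 = \arg\min_{\lVert w \rVert = 1} \frac{1}{n} \sum_{i=1}^{n} \bigl\lVert x_i - (x_i^{\top} w)\, w \bigr\rVert^2
```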

The difference is that factor analysis allows the noise to have an arbitrary diagonal covariance matrix, while PCA assumes the noise is spherical. In addition to estimating the subspace, factor analysis estimates the noise covariance matrix.
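Here is a minimal simulation sketch of that distinction (my own example; the data and parameter values are made up). Data are generated from a one-factor model with strongly unequal noise variances; factanal() estimates a separate uniqueness for each variable (a diagonal noise covariance), whereas PCA has no noise model at all and simply ranks directions by variance.

```r
set.seed(1)
n      <- 1000
f      <- rnorm(n)                          # latent factor
lambda <- c(0.9, 0.8, 0.7, 0.6)             # loadings
psi    <- c(0.1, 0.4, 0.9, 1.6)             # unequal (diagonal) noise variances
X <- outer(f, lambda) + sapply(psi, function(v) rnorm(n, sd = sqrt(v)))

fa <- factanal(X, factors = 1)              # ML factor analysis
round(fa$uniquenesses, 2)                   # per-variable noise estimates (correlation scale)

pc <- prcomp(X, scale. = TRUE)              # PCA: no explicit noise model;
round(pc$sdev^2, 2)                         # leftover variance is spread over the
                                            # trailing components as if it were spherical
```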

You are right about your first point, although in FA you generally work with both uniqueness and communality. I don't quite follow your points, though. Rotation of principal axes can be applied whatever method is used to construct the latent factors. In fact, most of the time it is the VARIMAX rotation (an orthogonal rotation, yielding uncorrelated factors) that is used, for practical reasons: easiest interpretation, easiest scoring rules, interpretation of factor scores, etc. PROMAX might better reflect reality (latent constructs are often correlated with each other), at least in the tradition of FA, where you assume that a latent construct is really at the heart of the observed inter-correlations between your variables.
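A minimal sketch of applying both rotations in base R (the dataset and the choice of two factors are placeholders, not from the original discussion):

```r
# Fit a two-factor model without rotation, then rotate the loadings.
vars <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
fit  <- factanal(vars, factors = 2, rotation = "none")

L <- loadings(fit)   # unrotated loadings matrix
varimax(L)           # orthogonal rotation: factors kept uncorrelated
promax(L)            # oblique rotation: factors allowed to correlate
```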

From a psychometric perspective, FA models are to be preferred, since they explicitly account for measurement errors, while PCA doesn't care about that. Briefly stated, using PCA you express each component (factor) as a linear combination of the variables, whereas in FA it is the variables that are expressed as linear combinations of the factors (including communality and uniqueness components, as you said).
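In symbols (my notation), the two directions look like this:

```latex
% PCA: each component is an exact linear combination of the observed variables
PC_k = \sum_{j=1}^{p} w_{jk} X_j
% FA: each observed variable is a linear combination of the common factors plus a unique term
X_j = \sum_{k=1}^{m} \lambda_{jk} F_k + e_j
```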

The top answer in this thread suggests that PCA is more of a dimensionality reduction technique, whereas FA is more of a latent variable technique. This is sensu stricto correct. But many answers here and many treatments elsewhere present PCA and FA as two completely different methods, with dissimilar if not opposite goals, methods and outcomes.

I disagree; I believe that when PCA is taken to be a latent variable technique, it is quite close to FA, and they should better be seen as very similar methods. See also the question "Can PCA be a substitute for factor analysis?"; there I argue that for simple mathematical reasons the outcome of PCA and FA can be expected to be quite similar, given only that the number of variables is not very small (perhaps over a dozen).

See my [long!] answer there for details. Here I would like to show it on an example. Here is how the correlation matrix looks [figure not reproduced]. There are small deviations here and there, but the general picture is almost identical, and all the loadings are very similar and point in the same directions.

This is exactly what was expected from the theory and is no surprise; still, it is instructive to observe. For a much prettier PCA biplot of the same dataset, see this answer by vqv.
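The dataset and figures from the original answer are not reproduced here, so the following is a stand-in sketch of the same kind of comparison on simulated data (all names and parameter values are mine): with a dozen reasonably correlated variables, the PCA and FA loading matrices come out close.

```r
set.seed(2)
n <- 500; m <- 3; p <- 12
F      <- matrix(rnorm(n * m), n, m)                   # latent factors
Lambda <- matrix(runif(p * m, -1, 1), p, m)            # true loadings
X      <- F %*% t(Lambda) + matrix(rnorm(n * p, sd = 0.5), n, p)

# PCA "loadings": eigenvectors of the correlation matrix scaled by sqrt(eigenvalue)
e        <- eigen(cor(X), symmetric = TRUE)
pca_load <- e$vectors[, 1:m] %*% diag(sqrt(e$values[1:m]))

# Unrotated ML factor-analysis loadings for comparison
fa_load <- loadings(factanal(X, factors = m, rotation = "none"))

# Compare the two loading matrices side by side (columns may differ in sign and order)
round(cbind(pca_load, unclass(fa_load)), 2)
```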

Factor loadings were computed by an "iterated principal factors" algorithm until convergence (9 iterations), with communalities initialized with partial correlations. Once the loadings converged, the scores were calculated using Bartlett's method.

This yields standardized scores; I scaled them up by the respective factor variances (given by the loadings' lengths).

In this respect FA is a statistical technique; the same cannot be said of principal component analysis, which is a purely mathematical transformation.
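Returning to the scoring procedure described just above, here is a bare-bones sketch of iterated principal factoring followed by Bartlett scores. It is my own illustration, not the original code, and it initializes communalities with squared multiple correlations (a common default) rather than with partial correlations; the data frame `dat` in the usage lines is hypothetical.

```r
principal_factors <- function(X, m = 1, iters = 50) {
  R  <- cor(X)
  h2 <- 1 - 1 / diag(solve(R))        # starting communalities: squared multiple correlations
  for (i in seq_len(iters)) {
    Rr <- R
    diag(Rr) <- h2                    # reduced correlation matrix
    e  <- eigen(Rr, symmetric = TRUE)
    L  <- e$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(pmax(e$values[1:m], 0)), m)
    h2 <- rowSums(L^2)                # updated communalities
  }
  list(loadings = L, uniqueness = 1 - h2)
}

bartlett_scores <- function(X, L, u) {
  Z  <- scale(X)                      # standardized data, n x p
  Pi <- diag(1 / u)                   # inverse of the diagonal uniqueness matrix
  # Bartlett scores: F-hat = (L' Psi^-1 L)^-1 L' Psi^-1 z, applied to every observation
  t(solve(t(L) %*% Pi %*% L, t(L) %*% Pi %*% t(Z)))
}

# Hypothetical usage on a numeric data frame `dat`:
# pf <- principal_factors(dat, m = 2)
# sc <- bartlett_scores(dat, pf$loadings, pf$uniqueness)
```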

One of the biggest reasons for the confusion between the two has to do with the fact that one of the factor extraction methods in Factor Analysis is called "method of principal components". However, it's one thing to use PCA and another thing to use the method of principal components in FA. The names may be similar, but there are significant differences. The former is an independent analytical method while the latter is merely a tool for factor extraction. Recently, I had the pleasure of analysing a scale through factor analysis.

This scale, although it's widely used in industry, was developed using PCA and, to my knowledge, had never been factor analysed. PCA just transforms the data into a new combination of variables and doesn't care about communalities.

My conclusion was that the scale was not a very good one from a psychometric point of view, and I've since confirmed this with a different sample. Essentially, if you want to predict using the factors, use PCA, while if you want to understand the latent factors, use Factor Analysis. A quote from a really nice textbook (Brown, Confirmatory Factor Analysis for Applied Research): Although related to EFA, principal components analysis (PCA) is frequently miscategorized as an estimation method of common factor analysis.

Unlike the estimators discussed in the preceding paragraph (ML, PF), PCA relies on a different set of quantitative methods that are not based on the common factor model. PCA does not differentiate common and unique variance. Rather, PCA aims to account for the variance in the observed measures rather than explain the correlations among them. Thus, PCA is more appropriately used as a data reduction technique to reduce a larger set of measures to a smaller, more manageable number of composite variables to use in subsequent analyses.

However, some methodologists have argued that PCA is a reasonable or perhaps superior alternative to EFA, in view of the fact that PCA possesses several desirable statistical properties (e.g., …). Although debate on this issue continues, Fabrigar et al. underscore the situations where EFA and PCA produce dissimilar results; for instance, when communalities are low or when there are only a few indicators of a given factor (cf. Widaman).

Regardless, if the overriding rationale and empirical objectives of an analysis are in accord with the common factor model, then it is conceptually and mathematically inconsistent to conduct PCA; that is, EFA is more appropriate if the stated objective is to reproduce the intercorrelations of a set of indicators with a smaller number of latent dimensions, recognizing the existence of measurement error in the observed measures.

This is a noteworthy consideration in light of the fact that EFA is often used as a precursor to CFA in scale development and construct validation. A detailed demonstration of the computational differences between PCA and EFA can be found in multivariate and factor analytic textbooks (e.g., …).

Brown, T. Confirmatory factor analysis for applied research. New York: Guilford Press.

Here's a simulation function to demonstrate this in R. By default, this function performs a fixed number of iterations (Iterations), in each of which it produces random, normally distributed samples of a given sample size.

It outputs a list of two Iterations-long vectors composed of the mean magnitudes of the simulated variables' loadings on the unrotated first component from PCA and on the general factor from EFA, respectively. It allows you to play around with sample size and number of variables and factors to suit your situation, within the limits of the principal and factanal functions and your computer. Using this code, I've simulated samples of 3— variables with iterations each to produce the data:
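The original function itself is not reproduced above, so the following is my own minimal sketch of a simulation of that kind (the argument names Iterations, Sample.Size and Variables are guesses at the spirit of the original, not its actual code): structureless normal data are generated repeatedly, and the mean absolute loading on the unrotated first principal component is compared with that on the first EFA factor.

```r
library(psych)   # for principal(); factanal() comes with base R's stats package

simulate_loadings <- function(Iterations = 100, Sample.Size = 1000, Variables = 10) {
  pca_mean <- efa_mean <- numeric(Iterations)
  for (i in seq_len(Iterations)) {
    X <- matrix(rnorm(Sample.Size * Variables), Sample.Size, Variables)  # no real structure
    pca_mean[i] <- mean(abs(principal(X, nfactors = 1, rotate = "none")$loadings[, 1]))
    efa_mean[i] <- mean(abs(factanal(X, factors = 1, rotation = "none")$loadings[, 1]))
  }
  list(PCA = pca_mean, EFA = efa_mean)
}

# Small illustrative run: first-component loadings on random data come out
# noticeably larger under PCA than under EFA.
# res <- simulate_loadings(Iterations = 20)
# sapply(res, mean)
```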

This demonstrates how differently one has to interpret the strength of loadings in PCA vs. EFA. Both depend somewhat on the number of variables, but loadings are biased upward much more strongly in PCA. However, note that mean loadings will usually be higher in real applications, because one generally uses these methods on more correlated variables.

I'm not sure how this might affect the difference in mean loadings. One can think of PCA as being like an FA in which the communalities are assumed to equal 1 for all variables.
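A small numerical check of that statement (my own sketch, on placeholder data): fixing the diagonal of the correlation matrix at 1, instead of replacing it with communalities as in the principal-factoring sketch earlier, reproduces the PCA loadings exactly.

```r
set.seed(3)
X <- matrix(rnorm(200 * 5), 200, 5)
R <- cor(X)

# "Factoring" R with communalities forced to 1 is just an eigendecomposition of R,
# i.e. PCA on the correlation matrix.
e     <- eigen(R, symmetric = TRUE)
L_pca <- e$vectors %*% diag(sqrt(e$values))

# The same loadings from prcomp on standardized data (up to column signs):
p        <- prcomp(X, scale. = TRUE)
L_prcomp <- p$rotation %*% diag(p$sdev)

round(abs(L_pca) - abs(L_prcomp), 10)   # essentially a matrix of zeros
```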


In a world where we make decisions based on structured data, there is a potential danger of having too much information, and even a risk of losing some important information. With only one predictor we will not have a reliable and accurate model; hence, we add more variables to the model, say, marketing spend, cost of procuring the goods, product categories, segments.

Now, as we add more features to the model, especially when we create dummy variables for the categorical features, the number of dimensions grows rapidly. This means that the data becomes scattered and more widespread: when the dimensions increase, the volume of the feature space increases so much that the already available data becomes sparse.

Essentially, we would not know how the data is spread across the feature space. Richard Bellman coined the term "curse of dimensionality" for this: it is caused by the exponential increase in volume associated with adding extra dimensions to a mathematical space. The implication is that we would need far more training data points to make reliable predictions; with the data we have, the model tends to overfit.

This puts the model at risk of high variance error, meaning the model may fail to predict new, unseen data. This is very common in applications such as image processing. Additionally, when we take all the variables to build the model, there is another challenge: multicollinearity may be present. The presence of multicollinearity can also lead to overfitting, as there are more insignificant variables in the data.

To treat multicollinearity, we can drop some of the variables, but that also comes at a cost: each feature contains some data, some value associated with it, so by removing variables we lose the respective information contained in those features.

Hence, there are powerful techniques available to deal with these challenges, known as dimensionality reduction techniques. The way forward is, instead of dropping variables, to engineer new composite dimensions that represent the original features and replace them. Dimensionality reduction maps the features from a high-dimensional feature space to a lower-dimensional one, reducing the scatteredness or sparsity of the feature space while preserving as much of the total information content as possible.

The two prevalent methods are Principal Component Analysis (PCA) and Factor Analysis, which help us overcome the curse of dimensionality while minimizing the loss of information. Principal Component Analysis (PCA) is a technique that removes dependency or redundancy in the data by discarding the redundant information carried jointly by correlated attributes.

PCA reduces the unnecessary features present in the data by creating or deriving new dimensions, also referred to as components.

These components are linear combinations of the original variables. This way, PCA converts a larger number of correlated variables into a smaller number of uncorrelated components. A principal component of a data set is the direction with the largest variance. Technically, PCA does this by rotating the axes of the variables: the axes are rotated so that they absorb all the information, or spread, available in the variables. Each of the new axes is then a new dimension, or principal component.
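A quick illustration with two made-up correlated variables (everything here is a placeholder example): the rotation weights, the component variances (eigenvalues), and the share of total spread captured by each component.

```r
set.seed(4)
x1 <- rnorm(300)
x2 <- 0.8 * x1 + rnorm(300, sd = 0.4)   # second variable correlated with the first
X  <- cbind(x1, x2)

p <- prcomp(X, scale. = TRUE)
p$rotation   # weights defining each component as a combination of x1, x2 (the rotated axes)
p$sdev^2     # component variances = eigenvalues; PC1 carries the largest spread
summary(p)   # proportion of total variance explained by each component
```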

The component is defined as the direction of the dataset explaining the highest variance, which is reflected in the eigenvalue of that component. The rotation of the axes is graphically depicted.

Factor Analysis, the other technique for reducing the data, works fundamentally differently from PCA.

We also perform factor analysis to reduce a larger number of attributes to a smaller set of factors. When analyzing data with many predictors, some of the features may share a common theme amongst themselves.

Features with a similar underlying meaning may be influencing the target variable through this shared cause, and hence such features are combined into a factor. Thus, a factor, or latent variable, is a common or underlying element with which several other variables are correlated. These latent variables, or latent constructs, are not directly observable and hence are not measurable by themselves with a single variable.
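As a small made-up illustration of such a latent construct (all names and values are mine): five observed items driven by one unobservable construct all load on the single extracted factor, which is the common element the text describes.

```r
set.seed(5)
n         <- 400
construct <- rnorm(n)                                   # unobservable latent construct
loads     <- c(0.9, 0.8, 0.7, 0.6, 0.5)
items     <- sapply(loads, function(l) l * construct + rnorm(n, sd = sqrt(1 - l^2)))
colnames(items) <- paste0("item", 1:5)

fit <- factanal(items, factors = 1)
fit$loadings      # every item loads substantially on the one common factor
fit$uniquenesses  # item-specific variance not explained by the construct
```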


