Peer-Reviewed

Comparison Between Principal Components and Factor Analysis for Different Data

Received: 21 February 2022    Accepted: 21 March 2022    Published: 29 December 2022
Abstract

Factor analysis (FA) is similar to principal component analysis (PCA), but the two are not the same. PCA can be considered a basic form of FA. Both PCA and FA aim to reduce the dimension of a data set, but their techniques differ. FA is explicitly designed to identify latent factors from the observed variables, whereas PCA does not directly address this aim. The eigenvalues of PCA correspond to dispersed component loadings that include error variance. FA assumes that the covariation of the observed variables is due to the presence of latent variables; in contrast, PCA does not depend on such a causal relationship. If the factor model is incorrect, FA will give erroneous results. PCA employs a transformation of the original data with no assumptions about the covariance matrix. PCA is used to determine linear combinations of the original variables and to summarize the data set without losing much information. For these reasons, we compare FA and PCA in practice using three different types of data: one is simulated and the others are real. The R program is used to analyze the data, with suitable packages and functions. Results are presented graphically and in tables for the purposes of comparison. The results for each data set are examined with three criteria. The FA criterion is used to specify whether two factors are sufficient: the SS loadings indicate whether a factor is worth keeping, the observed correlations between the original variables are checked for being high or low, and Cattell's scree test says to drop all further components after the one starting at the elbow. The PCA criterion is used to determine the importance of the variables, namely those with high eigenvalues. The VRPC criterion is used to determine which factor each variable tends toward.
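As a rough illustration of the PCA side of this comparison, the following is a minimal sketch in Python (standard library only), not the article's R workflow. It computes the eigenvalues of a correlation matrix and the proportion of variance each component explains. The toy data set, the function names, and the use of Kaiser's eigenvalue-greater-than-one rule as a retention heuristic are all assumptions made for illustration; they are not taken from the article.

```python
import math

# Hypothetical toy data: 6 observations of 4 variables (not from the article).
# Variables 1-2 move together, and so do variables 3-4.
data = [
    [2.0, 2.1, 5.0, 4.8],
    [3.0, 2.9, 1.0, 1.3],
    [4.0, 4.2, 4.0, 4.1],
    [5.0, 4.8, 2.0, 1.8],
    [6.0, 6.1, 3.0, 3.2],
    [7.0, 7.2, 6.0, 5.9],
]

def correlation_matrix(rows):
    """Pearson correlation matrix of a list of observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:  # off-diagonal mass is negligible: converged
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (i, j) entry.
                theta = 0.5 * math.atan2(2 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # columns: A <- A G
                    aki, akj = a[k][i], a[k][j]
                    a[k][i] = c * aki + s * akj
                    a[k][j] = -s * aki + c * akj
                for k in range(p):  # rows: A <- G^T A
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k] = c * aik + s * ajk
                    a[j][k] = -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)

corr = correlation_matrix(data)
eigenvalues = symmetric_eigenvalues(corr)

# Share of total variance carried by each principal component.
proportions = [ev / sum(eigenvalues) for ev in eigenvalues]

# Kaiser's rule (an assumed heuristic here): keep components with eigenvalue > 1.
retained = [ev for ev in eigenvalues if ev > 1.0]

for k, (ev, prop) in enumerate(zip(eigenvalues, proportions), start=1):
    print(f"PC{k}: eigenvalue={ev:.3f}, proportion of variance={prop:.3f}")
print("Components retained by Kaiser's rule:", len(retained))
```

Plotting `eigenvalues` against the component index gives the scree plot on which Cattell's elbow is read. A factor analysis would additionally estimate a unique variance for each variable, which this sketch does not attempt.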

Published in International Journal of Statistical Distributions and Applications (Volume 8, Issue 4)
DOI 10.11648/j.ijsd.20220804.11
Page(s) 65-79
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Principal Component Analysis, Factor Analysis, Rotation Process, Scree Test, Varimax Rotation, Eigenvalues

Cite This Article
  • APA Style

    Ahmed Mohamed Mohamed Elsayed. (2022). Comparison Between Principal Components and Factor Analysis for Different Data. International Journal of Statistical Distributions and Applications, 8(4), 65-79. https://doi.org/10.11648/j.ijsd.20220804.11

    ACS Style

    Ahmed Mohamed Mohamed Elsayed. Comparison Between Principal Components and Factor Analysis for Different Data. Int. J. Stat. Distrib. Appl. 2022, 8(4), 65-79. doi: 10.11648/j.ijsd.20220804.11

    AMA Style

    Ahmed Mohamed Mohamed Elsayed. Comparison Between Principal Components and Factor Analysis for Different Data. Int J Stat Distrib Appl. 2022;8(4):65-79. doi: 10.11648/j.ijsd.20220804.11

  • @article{10.11648/j.ijsd.20220804.11,
      author = {Ahmed Mohamed Mohamed Elsayed},
      title = {Comparison Between Principal Components and Factor Analysis for Different Data},
      journal = {International Journal of Statistical Distributions and Applications},
      volume = {8},
      number = {4},
      pages = {65-79},
      doi = {10.11648/j.ijsd.20220804.11},
      url = {https://doi.org/10.11648/j.ijsd.20220804.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijsd.20220804.11},
     year = {2022}
    }
    

  • TY  - JOUR
    T1  - Comparison Between Principal Components and Factor Analysis for Different Data
    AU  - Ahmed Mohamed Mohamed Elsayed
    Y1  - 2022/12/29
    PY  - 2022
    N1  - https://doi.org/10.11648/j.ijsd.20220804.11
    DO  - 10.11648/j.ijsd.20220804.11
    T2  - International Journal of Statistical Distributions and Applications
    JF  - International Journal of Statistical Distributions and Applications
    JO  - International Journal of Statistical Distributions and Applications
    SP  - 65
    EP  - 79
    PB  - Science Publishing Group
    SN  - 2472-3509
    UR  - https://doi.org/10.11648/j.ijsd.20220804.11
    VL  - 8
    IS  - 4
    ER  - 

Author Information
  • Department of Basic Science, Al-Obour High Institute for Management, Computers and Informatics, Obour City, Egypt
