Abstract
We consider a p-dimensional, centered normal population in which all variables have the same positive variance σ² and the correlation coefficient between any two distinct variables is a given nonnegative constant ρ < 1. Suppose that both the sample size n and the population dimension p tend to infinity with p/n → c > 0. We prove that the limiting spectral distribution of the sample correlation matrix is the Marčenko-Pastur distribution of index c and scale parameter 1 − ρ. Using the limiting spectral distributions, we rigorously derive the limiting behavior of two widespread stopping rules in PCA and EFA: the Guttman-Kaiser criterion and the cumulative-percentage-of-variation rule. As a result, we establish the following dichotomous behavior of the Guttman-Kaiser criterion when both n and p are large but p/n is small: (1) the criterion retains a small number of variables for ρ > 0, as suggested by Kaiser, Humphreys, and Tucker [Kaiser, H. F. (1992). On Cliff’s formula, the Kaiser-Guttman rule and the number of factors. Percept. Mot. Ski. 74]; and (2) the criterion retains p/2 variables for ρ = 0, as in a simulation study [Yeomans, K. A. and Golder, P. A. (1982). The Guttman-Kaiser criterion as a predictor of the number of common factors. J. Royal Stat. Soc. Series D. 31(3)].
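The dichotomy stated in the abstract can be checked numerically. The sketch below is illustrative only and is not the authors' code: it assumes NumPy, draws equi-correlated data through a one-factor construction chosen for convenience, and uses hypothetical helper names (`equicorrelated_sample`, `correlation_eigenvalues`, `guttman_kaiser_count`, `mp_support`). It counts the eigenvalues of the sample correlation matrix that exceed 1, which is the Guttman-Kaiser criterion, and reports the support edges (1 − ρ)(1 ± √c)² of the scaled Marčenko-Pastur law for comparison.

```python
import numpy as np


def equicorrelated_sample(n, p, rho, rng):
    """Draw n rows from a centered p-variate normal with unit variances
    and common correlation rho (0 <= rho < 1)."""
    # One-factor construction (an assumption made for this sketch):
    # x_ij = sqrt(rho) * z_i + sqrt(1 - rho) * e_ij, so Corr(x_ij, x_ik) = rho.
    common = rng.standard_normal((n, 1))
    noise = rng.standard_normal((n, p))
    return np.sqrt(rho) * common + np.sqrt(1.0 - rho) * noise


def correlation_eigenvalues(x):
    """Eigenvalues (ascending) of the sample correlation matrix of x."""
    corr = np.corrcoef(x, rowvar=False)  # columns are the variables
    return np.linalg.eigvalsh(corr)


def guttman_kaiser_count(eigenvalues):
    """Number of eigenvalues greater than 1 (Guttman-Kaiser criterion)."""
    return int(np.sum(eigenvalues > 1.0))


def mp_support(c, scale):
    """Support edges of the Marchenko-Pastur law of index c <= 1, times scale."""
    root = np.sqrt(c)
    return scale * (1.0 - root) ** 2, scale * (1.0 + root) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 4000, 200  # p/n = 0.05, i.e. "p/n is small"
    for rho in (0.0, 0.5):
        eig = correlation_eigenvalues(equicorrelated_sample(n, p, rho, rng))
        lo, hi = mp_support(p / n, 1.0 - rho)
        print(f"rho={rho}: retained={guttman_kaiser_count(eig)} of {p}, "
              f"bulk support ~ [{lo:.3f}, {hi:.3f}]")
```

With ρ = 0 the bulk straddles 1, so roughly half of the eigenvalues pass the threshold. With ρ > 0 and small c the upper edge (1 − ρ)(1 + √c)² lies below 1, so only the few eigenvalues outside the bulk are retained.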
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
- Akama, Y., “Correlation matrix of equi-correlated normal population: fluctuation of the largest eigenvalue, scaling of the bulk eigenvalues, and stock market”, Preprint, (2022).
- Anderson, G. W., Guionnet, A. and Zeitouni, O., An introduction to random matrices, Cambridge University Press, Cambridge, 2010.
- Bai, Z. and Zhou, W., “Large sample covariance matrices without independence structures in columns”, Stat. Sin. 18(2) (2008), 425–442.
- Bai, Z. D., “Methodologies in spectral analysis of large dimensional random matrices, a review”, Stat. Sin. 9(3) (1999), 611–662.
- Bai, Z. D. and Yin, Y. Q., “Limit of the smallest eigenvalue of a large dimensional sample covariance matrix”, Ann. Probab. 21(3) (1993), 1275–1294.
- Bryson, J., Vershynin, R. and Zhao, H., “Marchenko–Pastur law with relaxed independence conditions”, Random Matrices Theory Appl. (2021), 2150040.
- Chatterjee, S. and Hadi, A. S., Regression analysis by example, 4th ed. John Wiley & Sons, Hoboken, 2006.
- Dahirel, V., Shekhar, K., Pereyra, F., Miura, T., Artyomov, M., Talsania, S., Allen, T. M., Altfeld, M., Carrington, M., Irvine, D. J. et al., “Coordinate linkage of HIV evolution reveals regions of immunological vulnerability”, Proc. Natl. Acad. Sci. U.S.A. 108(28) (2011), 11530–11535.
- Embrechts, P. and Hofert, M., “A note on generalized inverses”, Math. Methods Oper. Res. 77(3) (2013), 423–432.
- Engle, R. and Kelly, B., “Dynamic equicorrelation”, J. Bus. Econ. Stat. 30(2) (2012), 212–228.
- Fabrigar, L. R. and Wegener, D. T., Exploratory factor analysis, Oxford University Press, UK, 2011.
- Fan, J. and Jiang, T., “Largest entries of sample correlation matrices from equi-correlated normal populations”, Ann. Probab. 47(5) (2019), 3321–3374.
- Glosten, L. R., Jagannathan, R. and Runkle, D. E., “On the relation between the expected value and the volatility of the nominal excess return on stocks”, J. Finance 48(5) (1993), 1779–1801.
- Götze, F. and Tikhomirov, A., “Rate of convergence in probability to the Marchenko-Pastur law”, Bernoulli 10(3) (2004), 503–548.
- Götze, F. and Tikhomirov, A., “The rate of convergence of spectra of sample covariance matrices”, Theory Probab. Its Appl. 54(1) (2010), 129–140.
- Guttman, L., “Some necessary conditions for common-factor analysis”, Psychometrika 19(2) (1954), 149–161.
- Halabi, N., Rivoire, O., Leibler, S. and Ranganathan, R., “Protein sectors: evolutionary units of three-dimensional structure”, Cell 138(4) (2009), 774–786.
- Harman, H. H., Modern factor analysis, 3rd ed. University of Chicago press, Chicago, 1976.
- Huber, P. J. and Ronchetti, E. M., Robust statistics, 2nd ed. vol. 523. John Wiley & Sons, Hoboken, 2004.
- Husnaqilati, A., “Limiting spectral distribution of random matrices from equi-correlated normal population”, Preprint, (2022).
- Jackson, J. E., A user’s guide to principal components, vol. 587. John Wiley & Sons, Hoboken, 1991.
- Jiang, T., “The Limiting Distributions of Eigenvalues of Sample Correlation Matrices”, Sankhya: The Indian Journal of Statistics (2003-2007) 66(1) (2004), 35–48.
- Jolliffe, I. T., Principal component analysis, 2nd ed. Springer, New York, 2002.
- Kaiser, H. F., “The application of electronic computers to factor analysis”, Educ. Psychol. Meas. 20(1) (1960), 141–151.
- Kaiser, H. F., “A measure of the average intercorrelation”, Educ. Psychol. Meas. 28(2) (1968), 245–247.
- Kaiser, H. F., “On Cliff’s formula, the Kaiser-Guttman rule, and the number of factors”, Percept. Mot. Ski. 74(2) (1992), 595–598.
- Laloux, L., Cizeau, P., Potters, M. and Bouchaud, J., “Random matrix theory and financial correlations”, Int. J. Theor. Appl. Finance 3(3) (2000), 391–397.
- Luxemburg, W. A. J., “Arzelà’s dominated convergence theorem for the Riemann integral”, Am. Math. Mon. 78(9) (1971), 970–979.
- Marčenko, V. A. and Pastur, L. A., “Distribution of eigenvalues for some sets of random matrices”, Math. USSR-Sb. 1(4) (1967), 457–483.
- Morales-Jimenez, D., Johnstone, I. M., McKay, M. R. and Yang, J., “Asymptotics of eigen-structure of sample correlation matrices for high-dimensional spiked models”, Stat. Sin. 31(2) (2021), 571–601.
- Mulaik, S. A., Foundations of factor analysis, 2nd ed. CRC press, Boca Raton, 2010.
- Parzen, E., “Quantile functions, convergence in quantile, and extreme value distribution theory”, Technical Report No. B-3, Texas A & M University, Institute of Statistics (1980).
- Peres-Neto, P. R., Jackson, D. A. and Somers, K. M., “How many principal components? Stopping rules for determining the number of non-trivial axes revisited”, Comput. Stat. Data Anal. 49(4) (2005), 974–997.
- Quadeer, A. A., Louie, R. H., Shekhar, K., Chakraborty, A. K., Hsing, I. and McKay, M. R., “Statistical linkage analysis of substitutions in patient-derived sequences of genotype 1a hepatitis C virus nonstructural protein 3 exposes targets for immunogen design”, J. Virol. 88(13) (2014), 7628–7644.
- Quadeer, A. A., Morales-Jimenez, D. and McKay, M. R., “Co-evolution networks of HIV/HCV are modular with direct association to structure and function”, PLOS Comput. Biol. 14(9) (2018), 1–29.
- Ramey, J., “Datamicroarray”, R package version 1.14.4. (2013).
- Silverstein, J. W., “Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices”, J. Multivar. Anal. 55(2) (1995), 331–339.
- Statistics Bureau, Ministry of Internal Affairs and Communications, “Statistics Bureau home page/national survey of family income, consumption and wealth”, https://www.stat.go.jp/english/data/zenkokukakei/index.htm.
- Van der Vaart, A. W., Asymptotic statistics, vol. 3. Cambridge University Press, Cambridge, 2000.
- Yao, J., Zheng, S. and Bai, Z. D., Sample covariance matrices and high-dimensional data analysis. Cambridge University Press, New York, 2015.
- Yeomans, K. A. and Golder, P. A., “The Guttman-Kaiser criterion as a predictor of the number of common factors”, J. R. Stat. Soc. Ser. D 31(3) (1982), 221–229.
- Zwick, W. R. and Velicer, W. F., “Comparison of five rules for determining the number of components to retain”, Psychol. Bull. 99(3) (1986), 432–442.