A correction of approximations used in sensitivity study of principal component analysis

Short note

Abstract

Principal component analysis is a dimensionality reduction method based on the eigensystem of the covariance matrix of a set of multivariate observations. Analyzing the effect of specific observations on this eigensystem is therefore of particular importance when studying the sensitivity of the results. In this framework, approximations for the perturbed eigenvalues and eigenvectors obtained when deleting one or several observations are computationally useful: they allow one to evaluate the effect of these observations without recomputing the exact perturbed eigenvalues and eigenvectors. However, some of the approximations that have been suggested are based on an incorrect application of matrix perturbation theory. The aim of this short note is to provide the correct formulations, which are illustrated with a numerical study.
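
The setting described in the abstract can be sketched numerically. The Python snippet below is an illustration only and does not reproduce the formulations of this note: it assumes NumPy, a covariance matrix with divisor n, simple (well-separated) eigenvalues, and the textbook first-order expansion of an eigenvalue of a symmetric matrix S under a symmetric perturbation E, namely lambda_k + v_k' E v_k. The function names `covariance` and `deletion_effect` are hypothetical.

```python
import numpy as np


def covariance(X):
    """Covariance matrix of the rows of X, with divisor n (an assumption)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


def deletion_effect(X, i):
    """Compare exact and first-order eigenvalues after deleting observation i.

    Returns (lam, exact, first_order), all sorted in ascending order:
      lam          eigenvalues of the full-sample covariance matrix S
      exact        eigenvalues recomputed from the reduced sample
      first_order  lam_k + v_k' E v_k, with E = S_deleted - S
    """
    S = covariance(X)
    S_del = covariance(np.delete(X, i, axis=0))

    lam, V = np.linalg.eigh(S)            # columns of V are orthonormal eigenvectors
    exact = np.linalg.eigvalsh(S_del)     # exact perturbed eigenvalues

    E = S_del - S                         # perturbation induced by the deletion
    first_order = lam + np.diag(V.T @ E @ V)
    return lam, exact, first_order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data with distinct variances so the eigenvalues are separated.
    X = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 1.0])
    lam, exact, approx = deletion_effect(X, 0)
    print("full sample :", lam)
    print("exact       :", exact)
    print("first order :", approx)
```

The approximation needs only one eigendecomposition of the full-sample matrix, whereas the exact values require a new decomposition for every deleted observation or subset, which is the computational motivation stated above.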

Keywords

Covariance matrix; Eigenvalues and eigenvectors; Influential observations; Perturbation theory

Notes

Acknowledgements

The author is grateful to the reviewers for their careful reading of the paper and their helpful comments.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Univ Rennes, CNRS, IRMAR - UMR 6625, Rennes, France
