Principal component analysis is a method of dimensionality reduction based on the eigensystem of the covariance matrix of a set of multivariate observations. Analyzing the effects of specific observations on this eigensystem is therefore of particular importance when studying the sensitivity of the results. In this framework, approximations for the perturbed eigenvalues and eigenvectors obtained when deleting one or several observations are useful from a computational standpoint: they allow one to evaluate the effects of these observations without having to recompute the exact perturbed eigenvalues and eigenvectors. However, it turns out that some of the approximations that have been suggested are based on an incorrect application of matrix perturbation theory. The aim of this short note is to provide the correct formulations, which are illustrated with a numerical study.
Keywords: Covariance matrix · Eigenvalues and eigenvectors · Influential observations · Perturbation theory
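As a rough illustration of the setting described in the abstract, the sketch below compares the exact eigenvalues obtained after deleting one observation with a generic first-order perturbation approximation, lambda_k + v_k' E v_k, where E is the change in the covariance matrix. This is only the textbook first-order expansion for a symmetric matrix, not the corrected formulations given in the note. The function names, the divisor n in the covariance matrix, and the simulated data are assumptions made for the example.

```python
import numpy as np

def covariance(X):
    """Covariance matrix with divisor n (equal weights 1/n); an assumed convention."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]

def exact_deleted_eigenvalues(X, i):
    """Eigenvalues (descending) recomputed after deleting observation i."""
    X_del = np.delete(X, i, axis=0)
    return np.linalg.eigvalsh(covariance(X_del))[::-1]

def first_order_deleted_eigenvalues(X, i):
    """Generic first-order approximation lambda_k + v_k' E v_k, with E = S_(i) - S.

    Only the eigensystem of the full-sample covariance matrix S is used;
    no eigendecomposition of the perturbed matrix is computed.
    """
    S = covariance(X)
    vals, vecs = np.linalg.eigh(S)          # ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]  # reorder to descending
    E = covariance(np.delete(X, i, axis=0)) - S
    # Quadratic form v_k' E v_k for every eigenvector (columns of vecs).
    return vals + np.einsum("jk,jl,lk->k", vecs, E, vecs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 200 observations, 3 variables with distinct variances.
    X = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 1.0])
    print("exact:      ", exact_deleted_eigenvalues(X, 0))
    print("first order:", first_order_deleted_eigenvalues(X, 0))
```

A first-order expansion of this kind is expected to be reasonable when the eigenvalues are well separated and the deleted observation is not extreme; it gives no guarantee for close eigenvalues or for highly influential observations.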
The author is grateful to the reviewers for their careful reading of the paper and their helpful comments.