Principal Component Analysis and SVD
This chapter presents principal component analysis (PCA) as a method for fitting a data-driven summarization model. The model assumes that the data entries are, up to errors, products of hidden factor scores and feature loadings. Combined with the least-squares fitting criterion, this turns out to be equivalent to finding part of what is known in mathematics as the singular value decomposition (SVD) of a rectangular matrix. Three applications of the method are described: (1) scoring hidden aggregate factors, (2) visualization of the data, and (3) feature space reduction. Unlike the conventional formulation of PCA, which is also described, our presentation derives the property that the principal components are linear combinations of features rather than postulating it. Two more distant applications of PCA are explained as well: latent semantic analysis (for disambiguation in document retrieval) and correspondence analysis (for visualization of contingency tables). The issue of data standardization in data summarization problems, which remains unsolved, is discussed at length.
Keywords: Principal Component Analysis, Singular Value Decomposition, Singular Vector, Latent Semantic Analysis, Data Scatter
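The model summarized in the abstract can be illustrated with a minimal sketch, which is not taken from the chapter itself. It assumes NumPy, a small random data matrix, and column centering as the standardization step (the chapter treats standardization as an open issue, so this choice is only an assumption). The function name `first_principal_component` and the symbols `z`, `mu`, and `c` are hypothetical labels for the factor scores, the singular value, and the feature loadings.

```python
import numpy as np

def first_principal_component(X):
    """Rank-one least-squares fit X[i, v] ~ mu * z[i] * c[v], obtained from the SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    z = U[:, 0]    # normed hidden factor scores (first left singular vector)
    c = Vt[0, :]   # normed feature loadings (first right singular vector)
    mu = s[0]      # largest singular value
    return z, mu, c

# Illustrative data only: 8 entities, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
X = X - X.mean(axis=0)  # assumed standardization: centering each feature

z, mu, c = first_principal_component(X)

# The rank-one model and its squared error.
approx = mu * np.outer(z, c)
residual = np.sum((X - approx) ** 2)

# Data scatter splits into the part explained by the component and the residual.
scatter = np.sum(X ** 2)
explained_share = mu ** 2 / scatter

# The factor scores are a linear combination of the features, weighted by c.
scores_from_features = X @ c / mu
```

In this sketch the linear-combination property is not imposed: `scores_from_features` coincides with `z` as a consequence of the SVD, which mirrors the derivation the abstract describes.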