Principal Component Analysis and SVD

  • Boris Mirkin
Part of the Undergraduate Topics in Computer Science book series (UTICS)


This chapter describes principal component analysis (PCA) as a way of fitting a data-driven summarization model. The model assumes that the data entries are, up to errors, products of hidden factor scores and feature loadings. Combined with the least-squares fitting criterion, this turns out to be equivalent to finding part of what is known in mathematics as the singular value decomposition (SVD) of a rectangular matrix. Three applications of the method are described: (1) scoring hidden aggregate factors, (2) visualization of the data, and (3) feature space reduction. Unlike the conventional formulation of PCA, which is also described, our presentation derives the property that the principal components are linear combinations of features rather than postulating it. Two more distant applications of PCA are explained too: latent semantic analysis (for disambiguation in document retrieval) and correspondence analysis (for visualization of contingency tables). The issue of data standardization in data summarization problems, which remains unsolved, is discussed at length.
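
The link between the least-squares factor model and the SVD can be illustrated with a minimal NumPy sketch. It is not the chapter's own code: the function name `principal_components`, the choice of mean-centering as the standardization, and the random test data are assumptions made only for illustration. The sketch takes the leading singular triplets of the centered data matrix as factor scores and feature loadings, and reports each component's share of the data scatter (the sum of squared entries).

```python
import numpy as np

def principal_components(X, n_components=1):
    """Rank-k least-squares fit Y ≈ scores @ loadings obtained from the SVD.

    Hypothetical helper for illustration; centering is only one possible
    standardization of the data.
    """
    Y = X - X.mean(axis=0)                      # center each feature
    Z, mu, Ct = np.linalg.svd(Y, full_matrices=False)
    k = n_components
    scores = Z[:, :k] * mu[:k]                  # hidden factor scores (scaled by singular values)
    loadings = Ct[:k, :]                        # feature loadings, one row per component
    contribution = mu[:k] ** 2 / np.sum(Y ** 2) # share of the data scatter per component
    return Y, scores, loadings, contribution

# Illustrative random data: 20 entities, 4 features (an assumption, not from the chapter)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

Y, scores, loadings, contribution = principal_components(X, n_components=2)
approx = scores @ loadings                      # rank-2 model of the centered data
residual = np.sum((Y - approx) ** 2)            # unexplained part of the data scatter

# The scores are linear combinations of the features, with the loadings as weights
combos = Y @ loadings.T
```

In this sketch the property that principal components are linear combinations of features is not assumed anywhere; it follows from the SVD, since `Y @ loadings.T` reproduces `scores`. The two columns of `scores` could also serve as coordinates for a 2D visualization of the entities.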


Keywords: Principal Component Analysis; Singular Value Decomposition; Singular Vector; Latent Semantic Analysis; Data Scatter



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Boris Mirkin (1, 2)
  1. Research University – Higher School of Economics, School of Applied Mathematics and Informatics, Moscow, Russia
  2. Department of Computer Science, Birkbeck University of London, London, UK
