Part of the book series: Undergraduate Topics in Computer Science (UTICS)

Abstract

This chapter presents principal component analysis (PCA) as a method for fitting a data-driven summarization model. The model assumes that the data entries are, up to errors, products of hidden factor scores and feature loadings. Combined with the least-squares fitting criterion, this turns out to be equivalent to finding part of what is known in mathematics as the singular value decomposition (SVD) of a rectangular matrix. Three applications of the method are described: (1) scoring hidden aggregate factors, (2) visualization of the data, and (3) feature space reduction. Unlike the conventional formulation of PCA, which is also described, our presentation derives, rather than postulates, the property that the principal components are linear combinations of features. Two more distant applications of PCA are explained as well: latent semantic analysis (for disambiguation in document retrieval) and correspondence analysis (for visualization of contingency tables). The issue of data standardization in data summarization problems, which remains unsolved, is discussed at length.
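The one-factor version of the model in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the chapter's own algorithm: the function name `first_principal_component`, the alternating least-squares iteration, and the toy matrix are all hypothetical choices, and the data are used as given, without any standardization. The sketch fits each entry as a score times a loading by least squares and rescales the result into the form of the first singular triple.

```python
import math

def first_principal_component(Y, iterations=200):
    """Fit Y[i][v] ~ z[i] * c[v] by alternating least squares.

    Returns (mu, z, c), where z and c have unit Euclidean norm and mu is
    the scale factor, so the fitted entry is mu * z[i] * c[v].
    Assumes Y is a non-empty list of equal-length rows and that the
    all-ones starting vector is not orthogonal to the leading loadings.
    """
    n_rows, n_cols = len(Y), len(Y[0])
    c = [1.0] * n_cols  # arbitrary nonzero starting loadings (an assumption)
    for _ in range(iterations):
        # Least-squares optimal scores for fixed loadings.
        cc = sum(x * x for x in c)
        z = [sum(Y[i][v] * c[v] for v in range(n_cols)) / cc
             for i in range(n_rows)]
        # Least-squares optimal loadings for fixed scores.
        zz = sum(x * x for x in z)
        c = [sum(Y[i][v] * z[i] for i in range(n_rows)) / zz
             for v in range(n_cols)]
    # Rescale to unit-norm vectors; their norms multiply into mu.
    z_norm = math.sqrt(sum(x * x for x in z))
    c_norm = math.sqrt(sum(x * x for x in c))
    mu = z_norm * c_norm
    return mu, [x / z_norm for x in z], [x / c_norm for x in c]

# Toy data: every row is a multiple of (2, 1), so one factor fits exactly.
Y = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
mu, z, c = first_principal_component(Y)
fitted = [[mu * zi * cv for cv in c] for zi in z]
```

For real data a library SVD routine would normally replace the hand-written loop; the loop is shown only because it makes the least-squares criterion explicit.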

Copyright information

© 2011 Springer-Verlag London Limited

About this chapter

Cite this chapter

Mirkin, B. (2011). Principal Component Analysis and SVD. In: Core Concepts in Data Analysis: Summarization, Correlation and Visualization. Undergraduate Topics in Computer Science. Springer, London. https://doi.org/10.1007/978-0-85729-287-2_5

  • DOI: https://doi.org/10.1007/978-0-85729-287-2_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-0-85729-286-5

  • Online ISBN: 978-0-85729-287-2

  • eBook Packages: Computer Science, Computer Science (R0)
