Principal Component Analysis and SVD

Mirkin, Boris

doi:10.1007/978-0-85729-287-2_5

Part of the book series: Undergraduate Topics in Computer Science (UTICS)

Abstract

This chapter presents principal component analysis (PCA) as a method for fitting a data-driven summarization model. The model assumes that the data entries are, up to errors, products of hidden factor scores and feature loadings. Fitting this model under the least-squares criterion turns out to be equivalent to finding part of what is known in mathematics as the singular value decomposition (SVD) of a rectangular matrix. Three applications of the method are described: (1) scoring hidden aggregate factors, (2) visualization of the data, and (3) feature space reduction. Unlike the conventional formulation of PCA, which is also described, our presentation derives the property that the principal components are linear combinations of features rather than postulating it. Two more distant applications of PCA are explained as well: latent semantic analysis (for disambiguation in document retrieval) and correspondence analysis (for visualization of contingency tables). The issue of data standardization in data summarization problems, which remains unsolved, is discussed at length.
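The link between the factor model and the SVD can be illustrated with a minimal sketch. It is not taken from the chapter: the function name `first_principal_component`, the synthetic data, the use of NumPy, and the choice to split the singular value evenly between scores and loadings are all assumptions made for illustration. Centering the columns is likewise only one possible standardization.

```python
import numpy as np

def first_principal_component(X):
    """Rank-one least-squares fit X[i, v] ~ z[i] * c[v], obtained from the SVD.

    Hypothetical helper: returns factor scores z, feature loadings c,
    and the largest singular value mu.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    mu = s[0]
    # Assumed convention: share sqrt(mu) between scores and loadings,
    # so that np.outer(z, c) equals mu * u * v^T.
    z = np.sqrt(mu) * U[:, 0]
    c = np.sqrt(mu) * Vt[0, :]
    return z, c, mu

# Synthetic data: one hidden score per entity times one loading per feature, plus noise.
rng = np.random.default_rng(0)
true_z = rng.normal(size=50)
true_c = np.array([2.0, -1.0, 0.5])
X = np.outer(true_z, true_c) + 0.1 * rng.normal(size=(50, 3))
X = X - X.mean(axis=0)  # centering only; an assumed standardization choice

z, c, mu = first_principal_component(X)

# Least-squares residual of the one-factor model.
residual = X - np.outer(z, c)

# Share of the data scatter accounted for by the first component.
explained = mu**2 / np.sum(X**2)

# The scores are a linear combination of the features, as the abstract states.
z_from_features = X @ c / mu
```

Here `z_from_features` coincides with `z`, so the linear-combination property follows from the fit rather than being imposed, and the squared residual equals the data scatter minus `mu**2`.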

Author information

Authors and Affiliations

Research University – Higher School of Economics, School of Applied Mathematics and Informatics, 11 Pokrovsky Boulevard, Moscow, RF, Russia
Boris Mirkin
Department of Computer Science, Birkbeck University of London, Malet Street, London, UK
Boris Mirkin

About this chapter

Cite this chapter

Mirkin, B. (2011). Principal Component Analysis and SVD. In: Core Concepts in Data Analysis: Summarization, Correlation and Visualization. Undergraduate Topics in Computer Science. Springer, London. https://doi.org/10.1007/978-0-85729-287-2_5

DOI: https://doi.org/10.1007/978-0-85729-287-2_5
Published: 09 February 2011
Publisher Name: Springer, London
Print ISBN: 978-0-85729-286-5
Online ISBN: 978-0-85729-287-2
eBook Packages: Computer Science, Computer Science (R0)
