Abstract
This chapter presents principal component analysis (PCA) as a method for fitting a data-driven data summarization model. The model assumes that the data entries are, up to errors, products of hidden factor scores and feature loadings. Together with the least-squares fitting criterion, this turns out to be equivalent to finding part of what is known in mathematics as the singular value decomposition (SVD) of a rectangular matrix. Three applications of the method are described: (1) scoring hidden aggregate factors, (2) visualization of the data, and (3) feature space reduction. Unlike the conventional formulation of PCA, which is also described, our presentation derives, rather than postulates, the property that the principal components are linear combinations of features. Two more distant applications of PCA, latent semantic analysis (for disambiguation in document retrieval) and correspondence analysis (for visualization of contingency tables), are explained as well. The issue of data standardization in data summarization problems, which remains unsolved, is discussed at length.
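The rank-one case of the model in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the chapter's own code: the data matrix `X` is made up, the names `z`, `c`, `mu` and `share` are assumptions chosen here, and centring each feature by its mean is just one possible standardization. The sketch fits each centred entry as approximately `mu * z[i] * c[v]` (a factor score times a feature loading) by taking the first singular triple of the SVD, then checks numerically that the scores are a linear combination of the features.

```python
import numpy as np

# Hypothetical 5 x 3 data matrix (rows: entities, columns: features).
X = np.array([
    [4.0, 2.0, 0.6],
    [4.2, 2.1, 0.5],
    [3.9, 2.0, 0.7],
    [4.3, 2.4, 0.6],
    [4.1, 1.9, 0.4],
])

# Centre each feature; an assumed standardization, other choices are possible.
Y = X - X.mean(axis=0)

# Thin SVD: Y = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

z = U[:, 0]    # normalised hidden factor scores, one per entity
c = Vt[0]      # normalised feature loadings, one per feature
mu = s[0]      # first (largest) singular value

# Least-squares optimal rank-one approximation of Y and its residuals.
Y_hat = mu * np.outer(z, c)
residual = Y - Y_hat

# Share of the data scatter taken into account by the first component.
share = mu**2 / np.sum(Y**2)

# The scores are recovered as a linear combination of the features.
scores_from_features = Y @ c

print("share of data scatter explained:", round(float(share), 3))
print("scores are linear in features:", np.allclose(scores_from_features, mu * z))
```

Keeping the first two singular triples instead of one would give the two coordinates typically used for a visualization of the data, and keeping the first few would give a reduced feature space.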
Copyright information
© 2011 Springer-Verlag London Limited
About this chapter
Cite this chapter
Mirkin, B. (2011). Principal Component Analysis and SVD. In: Core Concepts in Data Analysis: Summarization, Correlation and Visualization. Undergraduate Topics in Computer Science. Springer, London. https://doi.org/10.1007/978-0-85729-287-2_5
DOI: https://doi.org/10.1007/978-0-85729-287-2_5
Publisher Name: Springer, London
Print ISBN: 978-0-85729-286-5
Online ISBN: 978-0-85729-287-2
eBook Packages: Computer Science, Computer Science (R0)