Abstract
Principal component analysis (PCA) is a widely used technique for data analysis and dimensionality reduction. Eigenvalue decomposition is the standard algorithm for solving PCA, but a number of other algorithms have been proposed; for instance, the EM algorithm is much more efficient when the dimensionality is high and the number of principal components is small. We study the case where the data are high-dimensional and a majority of the values are missing. In this case, both of these algorithms turn out to be inadequate. We propose a gradient descent algorithm inspired by Oja's rule, and speed it up with an approximate Newton's method. The computational complexity of the proposed method is linear in the number of observed values in the data and in the number of principal components. In experiments with the Netflix data, the proposed algorithm is about ten times faster than any of the four comparison methods.
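To make the idea concrete, the following Python sketch fits a low-rank model X ≈ A S to the observed entries of a sparse matrix by plain gradient descent on the squared reconstruction error. This is an illustrative sketch of the general approach, not the authors' exact algorithm: the function and variable names (pca_sparse_gd, rows, cols, vals) are assumptions, and the approximate Newton speed-up mentioned in the abstract is omitted for brevity.

```python
import numpy as np

def pca_sparse_gd(rows, cols, vals, shape, n_components=2,
                  lr=0.01, n_iters=200, seed=0):
    """Gradient-descent PCA on sparsely observed data (illustrative sketch).

    The d x n data matrix is given only at positions (rows[k], cols[k])
    with values vals[k], and is modeled as X ~= A @ S. Each sweep costs
    O(#observed * n_components), i.e. linear in both, as the abstract claims.
    """
    d, n = shape
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((d, n_components))   # loading matrix
    S = 0.1 * rng.standard_normal((n_components, n))   # principal components

    for _ in range(n_iters):
        # Reconstruction error at the observed entries only.
        pred = np.einsum('ok,ko->o', A[rows], S[:, cols])
        err = vals - pred

        # Accumulate gradients of the squared error over observed entries.
        gA = np.zeros_like(A)
        gSt = np.zeros((n, n_components))
        np.add.at(gA, rows, err[:, None] * S[:, cols].T)
        np.add.at(gSt, cols, err[:, None] * A[rows])

        A += lr * gA        # plain gradient-descent steps
        S += lr * gSt.T
    return A, S

# Example: a 5 x 4 matrix observed at only six entries.
rows = np.array([0, 0, 1, 2, 3, 4])
cols = np.array([0, 2, 1, 3, 0, 2])
vals = np.array([1.0, 0.5, -0.3, 0.8, 1.2, -0.1])
A, S = pca_sparse_gd(rows, cols, vals, shape=(5, 4), n_components=2)
```

Because the cost is summed only over observed entries, the missing values never enter the computation, which is what makes this family of methods attractive when most of the matrix is unobserved.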
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Raiko, T., Ilin, A., Karhunen, J. (2008). Principal Component Analysis for Sparse High-Dimensional Data. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds) Neural Information Processing. ICONIP 2007. Lecture Notes in Computer Science, vol 4984. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69158-7_59