
Principal Component Analysis for Sparse High-Dimensional Data

Conference paper
Neural Information Processing (ICONIP 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4984)


Abstract

Principal component analysis (PCA) is a widely used technique for data analysis and dimensionality reduction. Eigenvalue decomposition is the standard algorithm for solving PCA, but a number of other algorithms have been proposed. For instance, the EM algorithm is much more efficient when the dimensionality is high and the number of principal components is small. We study the case where the data are high-dimensional and a majority of the values are missing. In this setting, both of these algorithms turn out to be inadequate. We propose a gradient descent algorithm inspired by Oja's rule, sped up by an approximate Newton's method. The computational complexity of the proposed method is linear in the number of observed values in the data and in the number of principal components. In experiments with the Netflix data, the proposed algorithm is about ten times faster than any of the four comparison methods.
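
The abstract's central computational point is that, when most entries are missing, the reconstruction error of a low-rank fit X ≈ AS and its gradient can be evaluated by touching only the observed values, which is what makes the cost linear in the number of observations and in the number of components. The NumPy/SciPy sketch below illustrates that idea; it assumes pre-centred data, uses a fixed learning rate in place of the paper's approximate Newton speed-up, and the function name and parameter values are illustrative rather than taken from the paper.

```python
import numpy as np
from scipy.sparse import coo_matrix

def sparse_pca_gradient(X_obs, n_components, n_iters=200, lr=0.01, seed=0):
    """Gradient-descent PCA over the observed entries of a sparse matrix.

    Fits X ~= A @ S by minimising the squared error summed over observed
    entries only, so each sweep costs O(#observed * n_components).
    Illustrative sketch: the data are assumed pre-centred, and a fixed
    learning rate stands in for the paper's approximate Newton's method.
    """
    rng = np.random.default_rng(seed)
    d, n = X_obs.shape
    rows, cols, vals = X_obs.row, X_obs.col, X_obs.data

    A = 0.1 * rng.standard_normal((d, n_components))   # loadings
    S = 0.1 * rng.standard_normal((n_components, n))   # component scores

    for _ in range(n_iters):
        # Residuals on observed entries only: err_ij = x_ij - a_i . s_j
        pred = np.sum(A[rows] * S[:, cols].T, axis=1)
        err = vals - pred

        # Gradients of 0.5 * sum(err**2), accumulated entry by entry.
        grad_A = np.zeros_like(A)
        grad_St = np.zeros((n, n_components))
        np.add.at(grad_A, rows, -err[:, None] * S[:, cols].T)
        np.add.at(grad_St, cols, -err[:, None] * A[rows])

        A -= lr * grad_A
        S -= lr * grad_St.T
    return A, S

# Toy usage: a 1000 x 500 matrix with roughly 2% of entries observed.
rng = np.random.default_rng(1)
d, n, k = 1000, 500, 5
X_full = rng.standard_normal((d, k)) @ rng.standard_normal((k, n))
mask = rng.random((d, n)) < 0.02
X_obs = coo_matrix(np.where(mask, X_full, 0.0))
A, S = sparse_pca_gradient(X_obs, n_components=k)
```

Because each sweep works only from the (row, column, value) triplets of the observed entries, the cost grows with the number of observations rather than with the full d x n size of the matrix, matching the complexity claim above.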



Editor information

Masumi Ishikawa, Kenji Doya, Hiroyuki Miyamoto, Takeshi Yamakawa


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Raiko, T., Ilin, A., Karhunen, J. (2008). Principal Component Analysis for Sparse High-Dimensional Data. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds) Neural Information Processing. ICONIP 2007. Lecture Notes in Computer Science, vol 4984. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69158-7_59


  • DOI: https://doi.org/10.1007/978-3-540-69158-7_59

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69154-9

  • Online ISBN: 978-3-540-69158-7

  • eBook Packages: Computer Science, Computer Science (R0)
