
Ensemble Nyström


Abstract

A crucial technique for scaling kernel methods to very large datasets, reaching or exceeding millions of instances, is low-rank approximation of the kernel matrix. The Nyström method is a popular technique for generating low-rank matrix approximations, but it requires sampling a large number of columns from the original matrix to achieve good accuracy. This chapter describes a new family of algorithms based on mixtures of Nyström approximations, Ensemble Nyström algorithms, that yield more accurate low-rank approximations than the standard Nyström method. We give a detailed study of variants of these algorithms based on simple averaging, an exponential weight method, and regression-based methods. A theoretical analysis of these algorithms is also presented, including novel error bounds guaranteeing a better convergence rate than that of the standard Nyström method. Finally, experiments with several datasets containing up to 1M points demonstrate significant improvements over the standard Nyström approximation.
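
To make the construction concrete, below is a minimal numpy sketch of the standard Nyström approximation and the simple-averaging Ensemble Nyström variant summarized above. It is an illustration under stated assumptions, not the chapter's reference implementation: the function names, the uniform column sampling, the toy RBF kernel, and the parameter values are all hypothetical, and the exponential-weight and regression-based variants differ only in how the mixture weights mu are chosen.

    import numpy as np

    def nystrom(K, idx, k):
        """Rank-k Nystrom approximation of a PSD kernel matrix K,
        built from the l columns indexed by idx."""
        C = K[:, idx]                        # n x l block of sampled columns
        W = K[np.ix_(idx, idx)]              # l x l intersection block
        vals, vecs = np.linalg.eigh(W)       # eigenvalues in ascending order
        vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]  # keep the top k
        keep = vals > 1e-12                  # drop numerically zero modes
        pinv_Wk = vecs[:, keep] @ np.diag(1.0 / vals[keep]) @ vecs[:, keep].T
        return C @ pinv_Wk @ C.T             # n x n, rank at most k

    def ensemble_nystrom(K, p, l, k, rng, mu=None):
        """Weighted mixture of p Nystrom approximations, each built from
        l columns sampled uniformly without replacement; mu defaults to
        uniform weights 1/p (the simple-averaging variant)."""
        n = K.shape[0]
        mu = np.full(p, 1.0 / p) if mu is None else np.asarray(mu)
        K_ens = np.zeros_like(K)
        for r in range(p):
            idx = rng.choice(n, size=l, replace=False)
            K_ens += mu[r] * nystrom(K, idx, k)
        return K_ens

    # Toy check on a small RBF kernel matrix (all values illustrative).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10))
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / 10.0)
    K_hat = ensemble_nystrom(K, p=5, l=50, k=20, rng=rng)
    print(np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro"))

The relative Frobenius error printed at the end should decrease as the number of learners p or the number of columns l per learner grows, which is the regime the chapter's error bounds and experiments address.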


Notes

  1. Similar results (not reported here) were observed for other values of k and l as well.

Author information

Correspondence to Sanjiv Kumar.


Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Kumar, S., Mohri, M., Talwalkar, A. (2012). Ensemble Nyström. In: Zhang, C., Ma, Y. (eds) Ensemble Machine Learning. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9326-7_7

  • DOI: https://doi.org/10.1007/978-1-4419-9326-7_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-9325-0

  • Online ISBN: 978-1-4419-9326-7

  • eBook Packages: Engineering, Engineering (R0)
