
On Learning and Cross-Validation with Decomposed Nyström Approximation of Kernel Matrix

Neural Processing Letters

Abstract

The high computational cost of training kernel methods to solve nonlinear tasks limits their applicability. Recently, however, several fast training methods have been introduced for solving linear learning tasks. These can be used to solve nonlinear tasks by mapping the input data nonlinearly to a low-dimensional feature space. In this work, we consider the mapping induced by decomposing the Nyström approximation of the kernel matrix. We collect together prior results and derive new ones to show how to efficiently train, make predictions with, and perform cross-validation for reduced set approximations of learning algorithms, given an efficient linear solver. Specifically, we present an efficient method for removing basis vectors from the mapping, which we show to be important when performing cross-validation.
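To make the idea concrete, below is a minimal sketch (not the paper's exact algorithm) of the decomposed Nyström feature map. With K_nm the kernel between all points and a reduced set of basis vectors and K_mm the kernel among the basis vectors, the kernel matrix is approximated as K ≈ K_nm K_mm⁻¹ K_mn = ΦΦᵀ with Φ = K_nm K_mm^{-1/2}; training a linear model on Φ then stands in for the kernel method. The function names (gaussian_kernel, nystrom_map), the Gaussian kernel, the uniform random choice of basis vectors, and the ridge-regression solver are all illustrative assumptions, not choices made by the article.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    # Pairwise Gaussian (RBF) kernel between rows of A and rows of B.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def nystrom_map(X, basis_idx, gamma=1.0, tol=1e-10):
    # Decomposed Nystrom approximation: Phi = K_nm K_mm^{-1/2}, so that
    # K ~ Phi Phi^T. Phi maps the data into a low-dimensional feature
    # space in which any fast linear solver can be used.
    B = X[basis_idx]
    K_nm = gaussian_kernel(X, B, gamma)
    K_mm = gaussian_kernel(B, B, gamma)
    w, V = np.linalg.eigh(K_mm)                 # eigendecomposition of K_mm
    keep = w > tol                              # drop numerically zero directions
    K_mm_inv_sqrt = V[:, keep] / np.sqrt(w[keep])
    return K_nm @ K_mm_inv_sqrt                 # n x m' feature matrix Phi

# Toy usage: regularized least squares in the Nystrom feature space.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
basis_idx = rng.choice(500, size=50, replace=False)   # reduced set of 50 points
Phi = nystrom_map(X, basis_idx, gamma=0.5)
lam = 1.0
# Plain ridge regression shown; any efficient linear learner could be plugged in.
w_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
y_pred = Phi @ w_hat
```

In this setting, the number of basis vectors controls both the quality of the kernel approximation and the dimensionality of the linear problem, which is why efficiently adding or removing basis vectors matters when the model is re-fit repeatedly, as in cross-validation.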



Author information


Corresponding author

Correspondence to Antti Airola.

About this article

Cite this article

Airola, A., Pahikkala, T. & Salakoski, T. On Learning and Cross-Validation with Decomposed Nyström Approximation of Kernel Matrix. Neural Process Lett 33, 17–30 (2011). https://doi.org/10.1007/s11063-010-9159-4

