Neural Processing Letters, Volume 33, Issue 1, pp 17–30

On Learning and Cross-Validation with Decomposed Nyström Approximation of Kernel Matrix

  • Antti Airola
  • Tapio Pahikkala
  • Tapio Salakoski


The high computational cost of training kernel methods to solve nonlinear tasks limits their applicability. Recently, however, several fast training methods have been introduced for solving linear learning tasks. These can be used to solve nonlinear tasks by mapping the input data nonlinearly to a low-dimensional feature space. In this work, we consider the mapping induced by decomposing the Nyström approximation of the kernel matrix. We collect prior results and derive new ones to show how to efficiently train, make predictions with, and perform cross-validation for reduced set approximations of learning algorithms, given an efficient linear solver. Specifically, we present an efficient method for removing basis vectors from the mapping, which we show to be important when performing cross-validation.
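As an illustration of the kind of mapping the abstract describes, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian (RBF) kernel, basis vectors chosen by uniform random sampling, and ridge regression standing in for the linear solver; the names `rbf_kernel`, `nystrom_map`, `gamma`, `tol` and `lam` are hypothetical. The feature map is built from the eigendecomposition of the kernel matrix restricted to the basis vectors, so that inner products of mapped points reproduce the Nyström approximation of the full kernel matrix.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2) (assumed kernel choice).
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_map(basis, gamma=1.0, tol=1e-10):
    # Decompose the kernel matrix over the basis vectors: K_mm = U diag(evals) U^T.
    K_mm = rbf_kernel(basis, basis, gamma)
    evals, evecs = np.linalg.eigh(K_mm)
    # Drop numerically zero eigenvalues so the inverse square root stays stable.
    keep = evals > tol * evals.max()
    W = evecs[:, keep] / np.sqrt(evals[keep])
    # phi(A) = K(A, basis) @ W, hence phi(A) @ phi(B).T = K_Am K_mm^+ K_mB.
    return lambda A: rbf_kernel(A, basis, gamma) @ W

# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Uniformly sampled basis vectors (an assumption; other sampling schemes exist).
basis = X[rng.choice(200, size=20, replace=False)]
phi = nystrom_map(basis)
Phi = phi(X)

# Any linear learner can now be trained on Phi; ridge regression is used here.
lam = 1.0
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
pred = phi(X[:5]) @ w
```

With the map in hand, training cost depends on the number of basis vectors rather than on the full kernel matrix, which is the motivation for combining the mapping with a fast linear solver.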


Keywords: Cross-validation, Empirical kernel map, Kernel methods, Nyström approximation, Reduced set method





Copyright information

© Springer Science+Business Media, LLC. 2010

Authors and Affiliations

  • Antti Airola (1)
  • Tapio Pahikkala (1)
  • Tapio Salakoski (1)
  1. Department of Information Technology, Turku Centre for Computer Science (TUCS), University of Turku, Turku, Finland
