
Large Scale Learning Techniques for Least Squares Support Vector Machines

  • Santiago Toledo-Cortés
  • Ivan Y. Castellanos-Martinez
  • Fabio A. González
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11401)

Abstract

Although kernel machines allow non-linear analysis through a transformation of their input data, their computational complexity makes them inefficient in terms of time and memory for the analysis of very large databases. Several attempts have been made to improve the performance of kernel methods, many of which focus on approximating the kernel matrix or the feature mapping associated with it. Current trends in machine learning demand the capacity to deal with large data sets while exploiting the capabilities of massively parallel GPU-based architectures. This has mainly been accomplished by combining gradient descent optimization with online learning. This paper presents an online kernel-based model based on the dual formulation of the Least Squares Support Vector Machine (LS-SVM) method, using the Learning on a Budget strategy to lighten the computational cost. This extends the algorithm's capability to analyze very large or high-dimensional data without requiring large memory resources. The method was evaluated against two other kernel approximation techniques: the Nyström approximation and Random Fourier Features. Experiments on several datasets show the effectiveness of the Learning on a Budget strategy compared with these approximation techniques.
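To make the idea concrete, below is a minimal Python/NumPy sketch of an online dual LS-SVM trained by stochastic gradient descent under a fixed support-vector budget. The class name BudgetOnlineLSSVM, the smallest-|alpha| eviction rule, and all hyperparameters are illustrative assumptions made here; they are not the paper's exact algorithm or its GPU implementation.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    sq = ((X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-gamma * sq)

class BudgetOnlineLSSVM:
    """Online dual LS-SVM with a fixed budget of support vectors.

    Each incoming example is added as a candidate support vector, one
    SGD step is taken on the instantaneous squared-error objective
    (f(x) - y)^2 + reg * ||alpha||^2, and, if the budget is exceeded,
    the support vector with the smallest |alpha| is discarded.
    """

    def __init__(self, budget=100, gamma=0.5, lr=0.05, reg=1e-3):
        self.budget, self.gamma, self.lr, self.reg = budget, gamma, lr, reg
        self.sv = None      # support vectors, shape (m, d)
        self.alpha = None   # dual coefficients, shape (m,)
        self.b = 0.0        # bias term

    def decision(self, x):
        """f(x) = sum_i alpha_i * K(sv_i, x) + b."""
        if self.sv is None:
            return self.b
        k = rbf_kernel(x[None, :], self.sv, self.gamma)[0]
        return float(k @ self.alpha) + self.b

    def partial_fit(self, x, y):
        """Process one streaming example (x, y) with y in {-1, +1}."""
        if self.sv is None:
            self.sv, self.alpha = x[None, :], np.zeros(1)
        else:
            self.sv = np.vstack([self.sv, x])
            self.alpha = np.append(self.alpha, 0.0)
        # One stochastic gradient step in the dual variables.
        k = rbf_kernel(x[None, :], self.sv, self.gamma)[0]
        err = float(k @ self.alpha) + self.b - y
        self.alpha -= self.lr * 2.0 * (err * k + self.reg * self.alpha)
        self.b -= self.lr * 2.0 * err
        # Enforce the budget so memory stays bounded by the budget size.
        if len(self.alpha) > self.budget:
            drop = int(np.argmin(np.abs(self.alpha)))
            self.sv = np.delete(self.sv, drop, axis=0)
            self.alpha = np.delete(self.alpha, drop)

# Streaming usage on a toy non-linear (XOR-like) problem.
rng = np.random.default_rng(0)
model = BudgetOnlineLSSVM(budget=50, gamma=2.0, lr=0.05)
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0, size=2)
    y = 1.0 if x[0] * x[1] > 0 else -1.0
    model.partial_fit(x, y)
```

The point of the budget is visible in partial_fit: per-example time and memory scale with the budget size rather than with the number of examples seen, which is what lets the dual formulation handle very large streams.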

Keywords

Kernel Methods · Least Squares Support Vector Machine · Nyström · Budget · Random Fourier Features


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. MindLab Research Group, Universidad Nacional de Colombia, Bogotá, Colombia
