Efficient nearest neighbors methods for support vector machines in high dimensional feature spaces

Abstract

In the context of support vector machines, identifying the support vectors is a key issue when dealing with large data sets. In Camelo et al. (Ann Oper Res 235:85–101, 2015), the authors present a promising approach to finding or approximating most of the support vectors through a procedure based on sub-sampling and enriching the support vector sets by nearest neighbors. This method has been shown to improve the computational efficiency of support vector machines on large data sets with low or intermediate feature space dimension. In the present article we discuss ways of adapting the nearest neighbor enriching methodology to very high dimensional data, such as text data, for which nearest neighbor queries involve, in principle, a high computational cost. Our approach incorporates the proximity preserving order search algorithm of Chavez et al. (MICAI 2005: advances in artificial intelligence, Springer, Berlin, pp 405–414, 2005) into the nearest neighbor enriching method of Camelo et al. (2015) in order to adapt this procedure to the high dimensional setting. For the required set of pivots, both random pivots and the base prototype pivot set of Micó et al. (Pattern Recogn Lett 15:9–17, 2015) are considered. The proposed methodology is evaluated on real data sets.
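To make the pivot-based search idea concrete, the following is a minimal sketch of a permutation-based approximate nearest neighbor query in the spirit of the proximity preserving order of Chavez et al. (2005), using the Spearman footrule of Diaconis and Graham (1977) to compare pivot orderings. It is an illustration under stated assumptions, not the implementation evaluated in the article: the names `pivot_ranks`, `footrule`, `PermutationIndex` and the parameters `n_pivots` and `n_candidates` are hypothetical, pivots are drawn at random, and the Euclidean distance is assumed.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pivot_ranks(point, pivots, dist=euclidean):
    """Return ranks where ranks[i] is the position of pivot i
    when the pivots are sorted by increasing distance to `point`."""
    order = sorted(range(len(pivots)), key=lambda i: dist(point, pivots[i]))
    ranks = [0] * len(pivots)
    for position, pivot_index in enumerate(order):
        ranks[pivot_index] = position
    return ranks


def footrule(ranks_a, ranks_b):
    """Spearman footrule: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(ranks_a, ranks_b))


class PermutationIndex:
    """Hypothetical index: stores, for every data point, its pivot ranking."""

    def __init__(self, data, n_pivots, seed=0, dist=euclidean):
        rng = random.Random(seed)
        self.data = data
        self.dist = dist
        # Assumption: random pivots sampled from the data itself.
        chosen = rng.sample(range(len(data)), n_pivots)
        self.pivots = [data[i] for i in chosen]
        self.ranks = [pivot_ranks(p, self.pivots, dist) for p in data]

    def query(self, q, k, n_candidates):
        """Return indices of (approximately) the k nearest neighbors of q.

        Points are first ordered by footrule distance between pivot
        rankings; true distances are computed only for the first
        `n_candidates` points of that ordering.
        """
        q_ranks = pivot_ranks(q, self.pivots, self.dist)
        by_footrule = sorted(
            range(len(self.data)),
            key=lambda i: footrule(q_ranks, self.ranks[i]),
        )
        candidates = by_footrule[:n_candidates]
        candidates.sort(key=lambda i: self.dist(q, self.data[i]))
        return candidates[:k]
```

In this sketch the cost of a query is `n_pivots + n_candidates` true distance evaluations plus cheap integer comparisons, which is where the saving comes from when each distance is expensive. Setting `n_candidates` to the size of the data set recovers the exact nearest neighbors; smaller values trade accuracy for speed.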

Notes

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#real-sim

  2. Features with zero values in every entry were removed.

  3. https://archive.ics.uci.edu/ml/datasets/Arcene

  4. http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/

References

  1. Camelo, S., Gonzalez-Lima, M., Quiroz, A.J.: Nearest neighbors methods for support vector machines. Ann. Oper. Res. 235, 85–101 (2015)

  2. Chavez, E., Figueroa, K., Navarro, G.: Proximity searching in high dimensional spaces with a proximity preserving order. In: MICAI 2005: Advances in Artificial Intelligence, pp. 405–414. Springer, Berlin (2005)

  3. Chavez, E., Navarro, G.: An effective clustering algorithm to index high dimensional metric spaces. In: SPIRE 2000. Proceedings of the Seventh International Symposium on String Processing and Information Retrieval, pp. 75–86. IEEE, Computer Science (2000)

  4. Cortes, C., Vapnik, V.N.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)

  5. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge (2000)

  6. Diaconis, P., Graham, R.L.: Spearman’s footrule as a measure of disarray. J. R. Stat. Soc. Ser. B (Methodol.) 39, 262–268 (1977)

  7. Freund, R., Osuna, E., Girosi, F.: An improved training algorithm for support vector machines. In: Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Workshop, pp. 276–285 (1997)

  8. Gieseke, F., Airola, A., Pahikkala, T., Kramer, O.: Fast and simple gradient-based optimization for semi-supervised support vector machines. Neurocomputing 123, 23–32 (2014)

  9. Hart, P., Duda, R., Stork, D.: Pattern Classification. Wiley, Hoboken (2000)

  10. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2008)

  11. Kim, D., Der, M., Saul, L.: A Gaussian latent variable model for large margin classification of labeled and unlabeled data. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland. W&CP, JMLR, vol. 33, pp. 484–492 (2014)

  12. Micó, M.L., Oncina, J., Vidal, E.: A new version of the nearest neighbours approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recogn. Lett. 15, 9–17 (2015)

  13. Mangasarian, O., Musicant, D.: Successive overrelaxation for support vector machines. IEEE Trans. Neural Netw. 10, 1032–1037 (1999)

  14. Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 41–65. MIT Press, Cambridge (1998)

  15. Shin, H., Cho, S.: Neighborhood property based pattern selection for support vector machines. Neural Comput. 19, 816–855 (2007)

  16. Sindhwani, V., Keerthi, S.S.: Large scale semi-supervised linear SVMs. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 477–484. ACM (2006)

  17. Sindhwani, V., Keerthi, S.S.: Newton methods for fast solution of semi-supervised linear SVMs. In: Bottou, L., Chapelle, O., DeCoste, D., Weston, J. (eds.) Large Scale Kernel Machines, pp. 155–174. MIT Press (2007)

  18. Suykens, J.A.K., van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific Publishing Co., Hackensack (2002)

  19. Teo, C.H., Vishwanathan, S.V.N., Smola, A.J., Le, Q.V.: Bundle methods for regularized risk minimization. J. Mach. Learn. Res. 11(Jan), 311–365 (2010)

  20. Zhang, X., Saha, A., Vishwanathan, S.V.N.: Smoothing multivariate performance measures. J. Mach. Learn. Res. 13(Dec), 3623–3680 (2012)

Author information

Corresponding author

Correspondence to Alvaro J. Riascos Villegas.

Cite this article

Montañés, D.C., Quiroz, A.J., Dulce Rubio, M. et al. Efficient nearest neighbors methods for support vector machines in high dimensional feature spaces. Optim Lett 15, 391–404 (2021). https://doi.org/10.1007/s11590-020-01616-w

Keywords

  • Nearest neighbors methods
  • Support vector machines
  • High dimensional features
  • Pivots