Extreme vector machine for fast training on large data

  • Xiaoqing Gu
  • Fu-lai Chung
  • Shitong Wang
Original Article


Quite often, different types of loss functions are adopted in SVMs or their variants to meet practical requirements, and scaling up the corresponding SVMs to large datasets is becoming increasingly important in practice. In this paper, the extreme vector machine (EVM) is proposed to realize fast training of SVMs with different yet typical loss functions on large datasets. EVM begins with a fast approximation of the convex hull of the training data in the feature space, expressed by extreme vectors, and then completes the corresponding SVM optimization over the extreme vector set. When the hinge loss function is adopted, EVM coincides with the approximate extreme points support vector machine (AESVM) for classification. When the squared hinge loss, least squares loss and Huber loss functions are adopted, EVM yields three versions, namely L2-EVM, LS-EVM and Hub-EVM, respectively, for classification or regression. In contrast to its most closely related machine, AESVM, EVM retains the same theoretical advantages while being applicable to a wide variety of loss functions to meet practical requirements. Compared with the other state-of-the-art fast SVM training algorithms, CVM and FastKDE, EVM relaxes the restriction to least squares loss functions, and experimentally exhibits superiority in training time, robustness and number of support vectors.
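The two-stage scheme described above (approximate the convex hull of the training data by a small set of extreme vectors, then solve the SVM over that reduced set only) can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm: the random-projection hull approximation, the function names and all parameter choices here are assumptions made for the example.

```python
# Hypothetical sketch of the two-stage EVM idea from the abstract:
# (1) approximate each class's convex hull by its extreme points along
#     random projection directions (a simple hull-approximation heuristic,
#     standing in for the paper's extreme-vector selection), and
# (2) train an ordinary SVM on the reduced set only.
import numpy as np
from sklearn.svm import SVC

def extreme_vectors(X, n_directions=50, seed=0):
    """Indices of points extreme along random projection directions."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(n_directions, X.shape[1]))
    proj = D @ X.T  # shape: (n_directions, n_samples)
    # Each direction contributes its max- and min-projection point.
    return np.unique(np.concatenate([proj.argmax(axis=1),
                                     proj.argmin(axis=1)]))

# Toy two-class data: two well-separated Gaussian clouds.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (500, 2)),
               rng.normal(2.0, 1.0, (500, 2))])
y = np.array([0] * 500 + [1] * 500)

# Stage 1: keep only the (approximate) hull points of each class.
keep = np.concatenate([extreme_vectors(X[y == 0]),
                       500 + extreme_vectors(X[y == 1])])

# Stage 2: solve the SVM over the extreme vector set only.
clf = SVC(kernel="linear", C=1.0).fit(X[keep], y[keep])
print(len(keep), clf.score(X, y))
```

Because the support vectors of a separable problem lie on the class hulls, training on hull points alone can closely reproduce the full-data decision boundary at a fraction of the cost, which is the intuition the paper builds on.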


Keywords: Support vector machine · Convex hull · Loss functions · Fast training



This work was supported in part by the Hong Kong Polytechnic University under Grant G-UA3W, by the National Natural Science Foundation of China under Grant nos. 61572236, 61702225 and 61806026, and by the Natural Science Foundation of Jiangsu Province under Grants BK20161268 and BK20180956.


References

  1. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
  2. Vapnik V (1995) The nature of statistical learning theory. Springer, Berlin
  3. Tahira M, Khan A (2016) Protein subcellular localization of fluorescence microscopy images: employing new statistical and Texton based image features and SVM based ensemble classification. Inf Sci 345(6):65–80
  4. Li YJ, Leng QK, Fu YZ (2017) Cross kernel distance minimization for designing support vector machines. Int J Mach Learn Cybernet 8(5):1585–1593
  5. Hu L, Lu SX, Wang XZ (2013) A new and informative active learning approach for support vector machine. Inf Sci 244(9):142–160
  6. Bang S, Kang J, Jhun M, Kim E (2017) Hierarchically penalized support vector machine with grouped variables. Int J Mach Learn Cybernet 8(4):1211–1221
  7. Reshma K, Pal A (2017) Tree based multi-category Laplacian TWSVM for content based image retrieval. Int J Mach Learn Cybernet 8(4):1197–1210
  8. Muhammad T, Shubham K (2017) A regularization on Lagrangian twin support vector regression. Int J Mach Learn Cybernet 8(3):807–821
  9. Williams C, Seeger M (2000) Using the Nyström method to speed up kernel machines. In: Proceedings of the 13th international conference on neural information processing systems, pp 661–667
  10. Lin C (2007) On the convergence of multiplicative update algorithms for nonnegative matrix factorization. IEEE Trans Neural Netw 18(6):1589–1595
  11. Rahimi A, Recht B (2007) Random features for large-scale kernel machines. In: International conference on neural information processing systems. Curran Associates Inc., pp 1177–1184
  12. Halko N, Martinsson PG, Tropp JA (2011) Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev 53(2):217–288
  13. Keerthi S, Shevade S, Bhattacharyya C, Murthy K (2001) Improvements to Platt's SMO algorithm for SVM classifier design. Neural Comput 13(3):637–649
  14. Peng XJ, Kong LY, Chen DJ (2017) A structural information-based twin-hypersphere support vector machine classifier. Int J Mach Learn Cybernet 8(1):295–308
  15. Joachims T (1999) Making large-scale support vector machine learning practical. Advances in kernel methods. MIT Press, Cambridge, pp 169–184
  16. Wang D, Qiao H, Zhang B, Wang M (2013) Online support vector machine based on convex hull vertices selection. IEEE Trans Neural Netw Learn Syst 24(4):593–609
  17. Gu XQ, Chung FL, Wang ST (2018) Fast convex-hull vector machine for training on large-scale ncRNA data classification tasks. Knowl Based Syst 151(1):149–164
  18. Osuna E, Castro OD (2002) Convex hull in feature space for support vector machines. In: Proceedings of advances in artificial intelligence, pp 411–419
  19. Tsang I, Kwok J, Cheung P (2005) Core vector machines: fast SVM training on very large data sets. J Mach Learn Res 6:363–392
  20. Tsang I, Kwok J, Zurada J (2006) Generalized core vector machines. IEEE Trans Neural Netw 17(5):1126–1140
  21. Tsang I, Kocsor A, Kwok J (2007) Simpler core vector machines with enclosing balls. In: Proceedings of the 24th international conference on machine learning, pp 911–918
  22. Wang ST, Wang J, Chung F (2014) Kernel density estimation, kernel methods, and fast learning in large data sets. IEEE Trans Cybernet 44(1):1–20
  23. Nandan M, Khargonekar PP, Talathi SS (2014) Fast SVM training using approximate extreme points. J Mach Learn Res 15:59–98
  24. Huang CQ, Chung FL, Wang ST (2016) Multi-view L2-SVM and its multi-view core vector machine. Neural Netw 75(3):110–125
  25. Suykens J, Gestel T, Brabanter J, Moor B, Vandewalle J (2002) Least squares support vector machines. World Scientific Pub, Singapore
  26. Xue H, Chen S, Yang Q (2009) Discriminatively regularized least-squares classification. Pattern Recogn 42(1):93–104
  27. Karasuyama M, Takeuchi I (2010) Nonlinear regularization path for the modified Huber loss support vector machines. In: Proceedings of international joint conference on neural networks, pp 1–8
  28. Cherkassky V, Ma Y (2004) Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw 17(1):113–126
  29. Chau A, Li X, Yu W (2013) Large data sets classification using convex–concave hull and support vector machine. Soft Comput 17(5):793–804
  30. Theodoridis S, Mavroforakis M (2007) Reduced convex hulls: a geometric approach to support vector machines. IEEE Signal Process Mag 24(3):119–122
  31. Blum M, Floyd RW, Pratt V, Rivest RL, Tarjan RE (1973) Time bounds for selection. J Comput Syst Sci 7(8):448–461
  32. Tax D, Duin R (1999) Support vector domain description. Pattern Recogn Lett 20(11):1191–1199
  33. Chapelle O (2007) Training a support vector machine in the primal. Neural Comput 19(5):1155–1178
  34. Charbonnier P, Blanc-Feraud L, Aubert G, Barlaud M (1997) Deterministic edge-preserving regularization in computed imaging. IEEE Trans Image Proc 6(2):298–311
  35. Hartley R, Zisserman A (2003) Multiple view geometry in computer vision, 2nd edn. Cambridge University Press, Cambridge
  36. Ye J, Xiong T (2007) SVM versus least squares SVM. In: Proceedings of the 7th international conference on artificial intelligence and statistics, pp 644–651
  37. Lin C. LIBSVM data. Accessed 28 Feb 2017
  38. Alcalá-Fdez J, Fernandez A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Logic Soft Comput 17(2):255–287
  39. Gao S, Tsang IW, Chia LT (2013) Sparse representation with kernels. IEEE Trans Image Process 22(2):423–434

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Digital Media, Jiangnan University, Wuxi, China
  2. School of Information Science and Engineering, Changzhou University, Changzhou, China
  3. Department of Computing, Hong Kong Polytechnic University, Kowloon, China