Abstract
Soft-margin support vector machines (SVMs) are an important class of classification models, well known to be highly accurate across a wide variety of settings and applications. Training an SVM usually requires that the data be available all at once, in batch. The stochastic majorization–minimization (SMM) algorithm framework instead allows SVMs to be trained on streamed data. We use the SMM framework to construct algorithms for training hinge-loss, squared-hinge-loss, and logistic-loss SVMs. We prove that each of our three SMM algorithms is convergent, and demonstrate that the algorithms are comparable in performance to some state-of-the-art SVM-training methods. An application to the well-known MNIST data set demonstrates the potential of our algorithms.
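To make the streamed-training idea concrete, the following is a minimal Python sketch of a stochastic MM iteration for the hinge-loss case; it is an illustration under stated assumptions, not the paper's exact algorithm. It combines a standard quadratic majorizer of the hinge loss, \(\max(0, 1-u) \le (1-u)^2/(4|1-\bar{u}|) + (1-u)/2 + |1-\bar{u}|/4\) at the current margin \(\bar{u}\), with the decaying-weight surrogate averaging characteristic of SMM. The function name `smm_hinge_svm`, the ridge-penalty weight `lam`, the step-size schedule \(\gamma_n = n^{-\alpha}\), and the safeguard `eps` near the hinge kink are all illustrative assumptions.

```python
import numpy as np

def smm_hinge_svm(stream, dim, lam=0.01, alpha=0.6, eps=1e-6):
    """Illustrative stochastic MM sketch for a linear hinge-loss SVM.

    stream : iterable of (x, y) pairs, x a length-`dim` array, y in {-1, +1}
    lam    : ridge penalty weight (assumed parameterization)
    alpha  : exponent of the decaying SMM weight, gamma_n = n**(-alpha)
    eps    : safeguard for the majorizer near the hinge kink at margin 1
    """
    A_bar = lam * np.eye(dim)   # running curvature of the averaged surrogate
    b_bar = np.zeros(dim)       # running linear term of the averaged surrogate
    theta = np.zeros(dim)
    for n, (x, y) in enumerate(stream, start=1):
        u = y * (theta @ x)                 # margin at the current iterate
        c = max(abs(1.0 - u), eps)          # curvature scale of the majorizer
        # Quadratic majorizer of max(0, 1 - u) plus ridge penalty, written as
        # 0.5 * theta' A_n theta - b_n' theta + const:
        A_n = np.outer(x, x) / (2.0 * c) + lam * np.eye(dim)
        b_n = 0.5 * y * x * (1.0 / c + 1.0)
        gamma = n ** (-alpha)               # decaying SMM averaging weight
        A_bar = (1.0 - gamma) * A_bar + gamma * A_n
        b_bar = (1.0 - gamma) * b_bar + gamma * b_n
        theta = np.linalg.solve(A_bar, b_bar)   # minimize averaged surrogate
    return theta

# Example on synthetic streamed data (illustrative only):
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(5000, 3))
y = np.where(X @ w_true + 0.1 * rng.normal(size=5000) > 0, 1.0, -1.0)
theta_hat = smm_hinge_svm(zip(X, y), dim=3)
```

On each arriving observation the sketch refreshes the running quadratic surrogate and minimizes it in closed form with a linear solve; for high-dimensional features one would instead maintain a low-rank or diagonal approximation of the curvature matrix, since a full \(d \times d\) solve per observation is costly.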
Acknowledgements
We thank the Associate Editor and the Reviewer for their helpful comments, which greatly improved our exposition. HDN was supported by Australian Research Council (ARC) Grant DE170101134. GJM was supported by ARC Grant DP170100907.
Cite this article
Nguyen, H.D., Jones, A.T. & McLachlan, G.J. Stream-suitable optimization algorithms for some soft-margin support vector machine variants. Jpn J Stat Data Sci 1, 81–108 (2018). https://doi.org/10.1007/s42081-018-0001-y