Problem formulations and solvers in linear SVM: a review

Chauhan, Vinod Kumar; Dahiya, Kalpana; Sharma, Anuj

doi:10.1007/s10462-018-9614-6

Problem formulations and solvers in linear SVM: a review

Published: 16 January 2018

Volume 52, pages 803–855, (2019)
Cite this article

Artificial Intelligence Review Aims and scope Submit manuscript

8288 Accesses
272 Citations
2 Altmetric
1 Mention
Explore all metrics

Abstract

Support vector machine (SVM) is an optimal margin based classification technique in machine learning. SVM is a binary linear classifier which has been extended to non-linear data using Kernels and multi-class data using various techniques like one-versus-one, one-versus-rest, Crammer Singer SVM, Weston Watkins SVM and directed acyclic graph SVM (DAGSVM) etc. SVM with a linear Kernel is called linear SVM and one with a non-linear Kernel is called non-linear SVM. Linear SVM is an efficient technique for high dimensional data applications like document classification, word-sense disambiguation, drug design etc. because under such data applications, test accuracy of linear SVM is closer to non-linear SVM while its training is much faster than non-linear SVM. SVM is continuously evolving since its inception and researchers have proposed many problem formulations, solvers and strategies for solving SVM. Moreover, due to advancements in the technology, data has taken the form of ‘Big Data’ which have posed a challenge for Machine Learning to train a classifier on this large-scale data. In this paper, we have presented a review on evolution of linear support vector machine classification, its solvers, strategies to improve solvers, experimental results, current challenges and research directions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Efficient Optimization of Multi-class Support Vector Machines with MSVMpack

Feature selection in SVM via polyhedral k-norm

Article 18 September 2019

Automated Optimization of Non-linear Support Vector Machines for Binary Classification

Notes

datasets used in the experiments are available at link: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

References

Abe S (2015) Fuzzy support vector machines for multilabel classification. Pattern Recogn 48(6):2110–2117
Article MATH Google Scholar
Abe S, Inoue T (2002) Fuzzy support vector machines for multiclass problems. In: European symposium on artificial neural networks, pp 113–118
Beck A, Tetruashvili L (2013) On the convergence of block coordinate descent type methods. SIAM J Optim 23(4):2037–2060
Article MathSciNet MATH Google Scholar
Becker S, Fadili M (2012) A quasi-newton proximal splitting method. In: Advances in neural information processing systems 25: 26th annual conference on neural information processing systems 2012. Proceedings of a meeting held December 3–6, 2012, Lake Tahoe, Nevada, pp 2627–2635
Blondel M, Seki K, Uehara K (2013) Block coordinate descent algorithms for large-scale sparse multiclass classification. Mach Learn 93(1):31–52
Article MathSciNet MATH Google Scholar
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the 5th annual ACM workshop on computational learning theory, ACM Press, pp 144–152
Bottou L (2010) Large-scale machine learning with stochastic gradient descent. In: Lechevallier Y, Saporta G (eds) Proceedings of the 19th international conference on computational statistics (COMPSTAT’2010), Springer, Paris, pp 177–187. http://leon.bottou.org/papers/bottou-2010
Bottou L (2012) Neural networks: tricks of the trade. 2nd edn, Springer, Berlin, pp 421–436
Bottou L, Lin CJ (2007) Support vector machine solvers. Science 3(1):1–27
Google Scholar
Bottou L, Curtis FE, Nocedal J (2016) Optimization methods for large-scale machine learning. arXiv:1606.04838
Bredensteiner EJ, Bennett KP (1999) Computational optimization: a tribute to olvi mangasarian, vol I, Springer, Boston, chap Multicategory Classification by Support Vector Machines, pp 53–79
Byrd RH, Hansen SL, Nocedal J, Singer Y (2016) A stochastic quasi-newton method for large-scale optimization. SIAM J Optim 26(2):1008–1031
Article MathSciNet MATH Google Scholar
Cao LJ, Keerthi SS, Ong CJ, Zhang JQ, Periyathamby U, Fu XJ, Lee HP (2006) Parallel sequential minimal optimization for the training of support vector machines. IEEE Trans Neural Netw 17(4):1039–1049
Article Google Scholar
Cauchy AL (1847) Méthode générale pour la résolution des systèmes d’équations simultanées. Compte Rendu des S’eances de L’Acad’emie des Sciences XXV S’erie A(25):536–538
Google Scholar
Chang CC, Hsu CW, Lin CJ (2000) The analysis of decomposition methods for support vector machines. IEEE Transactions on Neural Networks 11(4):1003–1008
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27
Article Google Scholar
Chang KW, Hsieh CJ, Lin CJ (2008) Coordinate descent method for large-scale l2-loss linear support vector machines. J Mach Learn Res 9:1369–1398
MathSciNet MATH Google Scholar
Chapelle O (2007) Training a support vector machine in the primal. Neural Comput 19(5):1155–1178
Article MathSciNet MATH Google Scholar
Chauhan VK, Dahiya K, Sharma A (2017a) Mini-batch block-coordinate based stochastic average adjusted gradient methods to solve big data problems. In: Zhang ML, Noh YK (eds) Proceedings of the ninth Asian conference on machine learning, PMLR, Proceedings of machine learning research, vol 77, pp 49–64. http://proceedings.mlr.press/v77/chauhan17a.html
Chauhan VK, Dahiya K, Sharma A (2017b) Trust region levenberg-marquardt method for linear svm. In: Ninth international conference on advances in pattern recognition (ICAPR-2017)
Yc C, Su C (2016) Distance-based margin support vector machine for classification. Appl Math Comput 283:141–152
MathSciNet Google Scholar
Collobert R, Bengio S, Bengio Y (2002) A parallel mixture of svms for very large scale problems. Neural Comput 14(5):1105–1114
Article MATH Google Scholar
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
MATH Google Scholar
Cotter A, Shalev-Shwartz S, Srebro N (2013) Learning optimally sparse support vector machines. In: Proceedings of the 30th ICML, pp 266–274
Crammer K, Singer Y (2002) On the learnability and design of output codes for multiclass problems. Mach Learn 47(1995):201–233
Article MATH Google Scholar
De S, Taylor G, Goldstein T (2015) Variance reduction for distributed stochastic gradient descent. arXiv:1512.01708
Defazio A, Bach F, Lacoste-Julien S (2014) Saga: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Proceedings of the 27th international conference on neural information processing systems, MIT Press, Cambridge, NIPS’14, pp 1646–1654
Evgeniou T, Pontil M (2004) Regularized multi–task learning. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’04, pp 109–117
Evgeniou T, Pontil M, Poggio T (2000) Regularization networks and support vector machines. In: Advances in computational mathematics, MIT Press, Cambridge, pp 1–50
Fan R, Chang K, Hsieh C, Wang X, Lin C (2008) LIBLINEAR: a library for large linear classification. JMLR 9:1871–1874
MATH Google Scholar
Farquhar JDR, Meng H, Szedmak S, Hardoon DR, Shawe-Taylor J (2006) Two view learning: Svm-2k, theory and practice. In: Advances in neural information processing systems, MIT Press, Cambridge
Ferris MC, Munson TS (2002) Interior-point methods for massive support vector machines. SIAM J Optim 13(3):783–804
Article MathSciNet MATH Google Scholar
Fletcher R, Reeves CM (1964) Function minimization by conjugate gradients. Comput J 7(2):149–154. https://doi.org/10.1093/comjnl/7.2.149
Article MathSciNet MATH Google Scholar
Franc V, Sonnenburg S (2008) Optimized cutting plane algorithm for support vector machines. In: Proceedings of the 25th international conference on machine learning, ACM, New York, NY, ICML ’08, pp 320–327
Friedman J (1996) Another approach to polychotomous classification. Stanford University, Tech Rep, Dept Statistics
Fung G, Mangasarian OL (2001) Proximal support vector machine classifiers. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’01, pp 77–86
Gao Y, Sun S (2010) An empirical evaluation of linear and nonlinear kernels for text classification using support vector machines. In: 2010 seventh international conference on fuzzy systems and knowledge discovery, vol 4, pp 1502–1505, https://doi.org/10.1109/FSKD.2010.5569327
Gibson N (2011) Gradient-based methods for optimization. part i
Graf HP, Cosatto E, Bottou L, Durdanovic I, Vapnik V (2005) Parallel support vector machines: the cascade SVM. In: Advances in neural information processing systems pp 521–528
Guermeur Y (2002) Combining discriminant models with new multi-class svms. Pattern Anal Appl 5(2):168–179
Article MathSciNet MATH Google Scholar
Hestenes MR, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J Res Natl Bur Stand 49(6):409–436
Article MathSciNet MATH Google Scholar
Hsieh CJ, Chang KW, Lin CJ, Keerthi SS, Sundararajan S (2008) A dual coordinate descent method for large-scale linear svm. In: Proceedings of the 25th international conference on machine learning, ACM, New York, NY, ICML ’08, pp 408–415
Hsieh CJ, Si S, Dhillon I (2014) A divide-and-conquer solver for Kernel support vector machines. JMLR W&Cp 32(1):566–574
Google Scholar
Hsu CW, Lin CJ (2002) A comparison of methods for multiclass support vector machines. IEEE Trans Neural Netw 13(2):415–425. https://doi.org/10.1109/72.991427
Article Google Scholar
Jayadeva, Khemchandani R, Chandra S (2004) Fast and robust learning through fuzzy linear proximal support vector machines. Neurocomput 61(C):401–411
Article Google Scholar
Jayadeva KR, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
Article MATH Google Scholar
Khemchandani R, Chandra S (2017) Twin support vector machines–models, extensions and applications, studies in computational intelligence, vol 659. Springer, Berlin
Ji Y, Sun S (2011) Multitask multiclass support vector machines. In: 2011 IEEE 11th international conference on data mining workshops, pp 512–518
Ji Y, Sun S (2013) Multitask multiclass support vector machines: model and experiments. Pattern Recogn 46(3):914–924
Article MATH Google Scholar
Joachims T (1999) Transductive inference for text classification using support vector machines. In: Proceedings of the sixteenth international conference on machine learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, ICML ’99, pp 200–209
Joachims T (2006) Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, KDD ’06, pp 217–226
Joachims T, Dortmund U, Joachimscsuni-dortmundde T (1999) Making large-scale SVM learning practical. In: Advances in Kernel methods–support vector learning, pp 41–56
Joachims T, Finley T, Yu CNJ (2009) Cutting-plane training of structural svms. Mach Learn 77(1):27–59
Article MATH Google Scholar
Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems 26, Curran Associates, Inc., pp 315–323
Kao W, Chung K, Sun C, Lin C (2004) Decomposition methods for linear support vector machines. Neural Comput 16:1689–1704
Article MATH Google Scholar
Keerthi SS, DeCoste D (2005) A modified finite newton method for fast solution of large scale linear svms. JMLR 6:341–361
MathSciNet MATH Google Scholar
Keerthi SS, Decoste D (2006) Building support vector machines with reduced classifier complexity. JMLR 7:1493–1515
MathSciNet MATH Google Scholar
Keerthi SS, Sundararajan S, Chang KW, Hsieh CJ, Lin CJ (2008) A sequential dual method for large scale multi-class linear svms. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, KDD ’08, pp 408–416
Knerr S, Personnaz L, Dreyfus G (1990) Neurocomputing: algorithms, architectures and applications. Springer, Berlin, pp 41–50
Book Google Scholar
Konečný J, Richtárik P (2013) Semi-stochastic gradient descent methods, 1:19. arXiv:1312.1666
Kressel UHG (1999) Advances in Kernel methods. MIT Press, Cambridge, MA, pp 255–268
Google Scholar
Ladicky L, Torr P (2011) Locally linear support vector machines. ICML, Stockholm, pp 985–992
Google Scholar
Lazar A (2010) A comparison of linear support vector machine algorithms on large non-sparse datasets. In: Proceedings–9th ICMLA
Lee CP, Lin CJ (2013) A study on l2-loss (squared hinge-loss) multi-class svm. Neural Comput 25:1302–1323
Article MathSciNet MATH Google Scholar
Lee CP, Roth D (2015) Distributed box-constrained quadratic optimization for dual linear SVM. In: Proceedings of the 32nd ICML, 37
Lee YJ, Mangasarian O (2001) Ssvm: a smooth support vector machine for classification. Comput Optim Appl 20(1):5–22
Article MathSciNet MATH Google Scholar
Levenberg K (1944) A method for the solution of certain problems in least squares. Quart Appl Math 2:164–168
Article MathSciNet MATH Google Scholar
Li L, Li Y, Su H, Chu J (2006) Intelligent computing: international conference on intelligent computing, ICIC 2006, Kunming, China, August 16–19, 2006. Proceedings, Part I, Springer Berlin Heidelberg, Berlin, Heidelberg, chap Least Squares Support Vector Machines Based on Support Vector Degrees, pp 1275–1281
Li M, Andersen DG, Park JW, Smola AJ, Ahmed A, Josifovski V, Long J, Shekita EJ, Su BY (2014a) Scaling distributed machine learning with the parameter server. In: Proceedings of the 11th USENIX conference on operating systems design and implementation, USENIX Association, Berkeley, CA, OSDI’14, pp 583–598
Li M, Zhang T, Chen Y, Smola AJ (2014b) Efficient mini-batch training for stochastic optimization. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’14, pp 661–670
Lin CF, Wang SD (2002) Fuzzy support vector machines. IEEE Trans Neural Netw 13(2):464–471
Article Google Scholar
Lin CJ (2001) On the convergence of the decomposition method for support vector machines. IEEE Trans Neural Netw 12(6):1288–1298
Article Google Scholar
Lin CJ, Weng RC, Keerthi SS (2008) Trust region newton method for logistic regression. JMLR 9:627–650
MathSciNet MATH Google Scholar
Lin CY, Tsai CH, Lee CP, Lin CJ (2014) Large-scale logistic regression and linear support vector machines using spark. In: 2014 IEEE international conference on Big Data (Big Data) pp 519–528
Lin HT, Lin CJ, Weng RC (2007) A note on platt’s probabilistic outputs for support vector machines. Mach Learn 68(3):267–276
Article Google Scholar
Liu D, Shi Y, Tian Y, Huang X (2016) Ramp loss least squares support vector machine. J Comput Sci 14:61–68
Article MathSciNet Google Scholar
Lu Z, Xiao L (2015) On the complexity analysis of randomized block-coordinate descent methods. Math Program 152(1):615–642
Article MathSciNet MATH Google Scholar
Luss R, d’Aspremont A (2009) Support vector machine classification with indefinite kernels. Math Program Comput 1(2):97–118
Article MathSciNet MATH Google Scholar
Mangasarian OL (2001) A finite newton method for classification problems. Tech. rep., Data Mining Institute, Computer Sciences Department, University of Wisconsin
Mangasarian OL (2006) Exact 1-norm support vector machines via unconstrained convex differentiable minimization. J Mach Learn Res 7:1517–1530
MathSciNet MATH Google Scholar
Mangasarian OL, Musicant DR (2001) Lagrangian support vector machines. JMLR 1:161–177
MathSciNet MATH Google Scholar
Mangasarian OL, Wild EW (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28(1):69–74
Article Google Scholar
Manno A, Palagi L, Sagratella S (2015) A class of parallel decomposition algorithms for SVMs training, pp 1–24
Marquardt D (1963) An algorithm for least-squares estimation of nonlinear parameters. SIAM J Appl Math 11:431–441
Article MathSciNet MATH Google Scholar
Moré JJ (1978) Numerical analysis. In: Proceedings of the biennial conference held at Dundee, June 28–July 1, 1977, Springer, Berlin, pp 105–116
Needell D, Srebro N, Ward R (2014) Stochastic gradient descent, weighted sampling, and the randomized kaczmarz algorithm. In: Proceedings of the 27th international conference on neural information processing systems, MIT Press, Cambridge, MA, NIPS’14, pp 1017–1025
Nesterov Y (2012) Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J Optim 22(2):341–362
Article MathSciNet MATH Google Scholar
Nie F, Huang Y, Wang X, Huang H (2014) New primal SVM solver with linear computational cost for big data classifications. In: Proceedings of the 31st international conference on international conference on machine learning, Vol 32, JMLR.org, ICML’14, pp II–505–II–513
Nie F, Wang X, Huang H (2017) Multiclass capped Lp-norm SVM for robust classifications. The 31st AAAI Conference on Artificial Intelligence (AAAI), San Francisco, USA
Nocedal WS (1999) Numerical optimization. Springer, New York
Book MATH Google Scholar
Parikh N, Boyd S (2014) Proximal algorithms. Found Trends Optim 1(3):127–239
Article Google Scholar
Parlett B (1998) The symmetric eigenvalue problem. Society for Industrial and Applied Mathematics, Philadelphia, PA
Peng X (2010) A nu-twin support vector machine (nu-tsvm) classifier and its geometric algorithms. Inf Sci 180(20):3863–3875
Article MATH Google Scholar
Platt J (1998) Sequential minimal optimization: a fast algorithm for training support vector machines. Microsoft Res pp MSR–TR–98–14
Platt JC, Cristianini N, Shawe-Taylor J (2000) Large margin dags for multiclass classification. Adv Neural Inf Process Syst 12:547–553
Google Scholar
Prez-Cruz F, Weston J, Herrmann D, Schölkopf B (2003) Extension of the nu-SVM range for classification. Nato Sci Series Sub Series iii Comput Syst Sci 190:179–196
Google Scholar
Ranganathan A (2004) The levenberq-marquardt algorithm
Recht B, Re C, Wright S, Niu F (2011) Hogwild: a lock-free approach to parallelizing stochastic gradient descent. In: Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ (eds) Advances in neural information processing systems 24, Curran Associates, Inc., pp 693–701
Richtárik P, Takáč M (2014) Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math Program 144(1–2):1–38
Article MathSciNet MATH Google Scholar
Richtárik P, Takáč M (2016) Parallel coordinate descent methods for big data optimization. Math Program 156(1):433–484
Article MathSciNet MATH Google Scholar
Schmidt M, Le Roux N, Bach F (2016) Minimizing finite sums with the stochastic average gradient. Math Program pp 1–30
Schölkopf B, Smola AJ, Williamson RC, Bartlett PL (2000) New support vector algorithms. Neural Comput 12(5):1207–1245
Article Google Scholar
Shalev-Shwartz S, Singer Y (2006) Online learning meets optimization in the dual, pp 423–437
Shalev-Shwartz S, Zhang T (2013a) Accelerated mini-batch stochastic dual coordinate ascent. In: NIPS’ 13 Proceedings of the 26th International Conference on Neural Information Processing Systems, Vol 1. pp 378–385
Shalev-Shwartz S, Zhang T (2013b) Stochastic dual coordinate ascent methods for regularized loss. JMLR 14(1):567–599
MathSciNet MATH Google Scholar
Shalev-Shwartz S, Singer Y, Srebro N (2007) Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of the 24th international conference on machine learning, ACM, New York, NY, ICML ’07, pp 807–814
Shawe-Taylor J, Sun S (2011) A review of optimization methodologies in support vector machines. Neurocomputing 74(17):3609–3618
Article Google Scholar
Shi Z, Liu R (2016) A better convergence analysis of the block coordinate descent method for large scale machine learning. arXiv:1608.04826
Stefano M, Mikhail B (2011) Laplacian support vector machines trained in the primal. JMLR 12:1149–1184
MathSciNet MATH Google Scholar
Steinwart I (2004) Sparseness of support vector machinessome asymptotically sharp bounds. Adv Neural Inf Process Syst 16:1069–1076
Google Scholar
Stempfel G, Ralaivola L (2009) Learning SVMs from sloppily labeled data. In: Proceedings of the 19th international conference on artificial neural networks: Part I, Springer, Berlin, ICANN ’09, pp 884–893
Sun S, Xie X (2016) Semisupervised support vector machines with tangent space intrinsic manifold regularization. IEEE Trans Neural Netw Learn Syst 27(9):1827–1839
Article MathSciNet Google Scholar
Suykens J, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
Article Google Scholar
Tak M, Bijral A, Richtrik P, Srebro N (2013) Mini-batch primal and dual methods for SVMs. In: 30th international conference on machine learning, Springer, Berlin, pp 537–552
Teo CH, Smola A, Vishwanathan SV, Le QV (2007) A scalable modular convex solver for regularized risk minimization. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’07, pp 727–736
Teo CH, Vishwanthan S, Smola AJ, Le QV (2010) Bundle methods for regularized risk minimization. JMLR 11:311–365
MathSciNet MATH Google Scholar
Tsai CH, Lin CJCY (2014) Incremental and decremental training for linear classification. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’14, pp 343–352
Tseng P (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J Optim Theory Appl 109(3):475–494
Article MathSciNet MATH Google Scholar
Tsochantaridis I, Hofmann T, Joachims T, Altun Y (2004) Support vector machine learning for interdependent and structured output spaces. In: Proceedings of the twenty-first international conference on machine learning, ACM, New York, NY, ICML ’04, p 104
Vapnik VN (1998) Statistical learning theory, 1st edn. Wiley, New York
MATH Google Scholar
Wang Z, Crammer K, Vucetic S (2012) Breaking the curse of kernelization : budgeted stochastic gradient descent for large-scale SVM training. JMLR 13(1):3103–3131
MathSciNet MATH Google Scholar
Wen T, Edelman A, Gorsich D (2003) Contemporary mathematics. American Mathematical Society, Boston, MA, chap A Fast Projected Conjugate Gradient Algorithm for Training Support Vector Machines, pp 245–263
Weston J, Watkins C (1999) Support vector machines for multi-class pattern recognition. In: ESANN
Willoughby RA (1979) Solutions of ill-posed problems (a. n. tikhonov and v. y. arsenin). SIAM Rev 21(2):266–267
Article Google Scholar
Woodsend K, Gondzio J (2009) Hybrid MPI/OpenMP parallel linear support vector machine training. JMLR 10:1937–1953
MathSciNet MATH Google Scholar
Wright SJ (2015) Coordinate descent algorithms. Math Program 151(1):3–34
Article MathSciNet MATH Google Scholar
Xiao L, Zhang T (2014) A proximal stochastic gradient method with progressive variance reduction. SIAM J Optim 24(4):2057–2075
Article MathSciNet MATH Google Scholar
Xie X, Sun S (2015) Multi-view twin support vector machines. Intell Data Anal 19(4):701–712
Article Google Scholar
Xu C, Tao D, Xu C (2013) A survey on multi-view learning, pp 1–59. arXiv:1304.5634v1
Xu Y, Yin W (2013) A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J Imaging Sci 6(3):1758–1789
Article MathSciNet MATH Google Scholar
Xu Y, Yin W (2015) Block stochastic gradient iteration for convex and nonconvex optimization. SIAM J Optim 25(3):1686–1716
Article MathSciNet MATH Google Scholar
Yang Y, Chen J, Zhu J (2016) Distributing the stochastic gradient sampler for large-scale lda. In: Proceedings of the 22Nd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’16, pp 1975–1984
Ye J, Xiong T (2007) SVM versus least squares SVM. In: JMLR W&P, pp 644–651
Yuan GX, Chang KW, Hsieh CJ, Lin CJ (2010) A comparison of optimization methods and software for large-scale L1-regularized linear classification. JMLR 11:3183–3234
MathSciNet MATH Google Scholar
Yuan GX, Ho CH, Lin CJ (2012) Recent advances of large-scale linear classification. Proc IEEE 100(9):2584–2603
Article Google Scholar
Zanghirati G, Zanni L (2003) A parallel solver for large quadratic programs in training support vector machines. Parallel Comput 29(4):535–551
Article MathSciNet MATH Google Scholar
Zanni L, Serafini T, Zanghirati G (2006) Parallel software for training large scale support vector machines on multiprocessor systems. JMLR 7:1467–1492
MathSciNet MATH Google Scholar
Zhang A, Gu Q (2016) Accelerated stochastic block coordinate descent with optimal sampling. In: Proceedings of the 22Nd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’16, pp 2035–2044
Zhang T (2004) Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the twenty-first international conference on machine learning, ACM, New York, NY, ICML ’04
Zhu J, Rosset S, Hastie T, Tibshirani R (2003) 1-norm support vector machines. In: Neural information processing systems, MIT Press, Cambridge, p 16

Download references

Acknowledgements

First author would like to thank Ministry of Human Resource Development, Government of INDIA, for providing fellowship (University Grants Commission–Senior Research Fellowship) to pursue his Ph.D. work.

Author information

Authors and Affiliations

Department of Computer Science and Applications, Panjab University, Chandigarh, India
Vinod Kumar Chauhan & Anuj Sharma
University Institute of Engineering and Technology, Panjab University, Chandigarh, India
Kalpana Dahiya

Authors

Vinod Kumar Chauhan
View author publications
You can also search for this author in PubMed Google Scholar
Kalpana Dahiya
View author publications
You can also search for this author in PubMed Google Scholar
Anuj Sharma
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Anuj Sharma.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chauhan, V.K., Dahiya, K. & Sharma, A. Problem formulations and solvers in linear SVM: a review. Artif Intell Rev 52, 803–855 (2019). https://doi.org/10.1007/s10462-018-9614-6

Download citation

Published: 16 January 2018
Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s10462-018-9614-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Problem formulations and solvers in linear SVM: a review

Abstract

Access this article

Similar content being viewed by others

Efficient Optimization of Multi-class Support Vector Machines with MSVMpack

Feature selection in SVM via polyhedral k-norm

Automated Optimization of Non-linear Support Vector Machines for Binary Classification

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Problem formulations and solvers in linear SVM: a review

Abstract

Access this article

Similar content being viewed by others

Efficient Optimization of Multi-class Support Vector Machines with MSVMpack

Feature selection in SVM via polyhedral k-norm

Automated Optimization of Non-linear Support Vector Machines for Binary Classification

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation