Support Vector Machines for Classification: A Statistical Portrait

  • Yoonkyung Lee
Part of the Methods in Molecular Biology book series (MIMB, volume 620)


The support vector machine is a supervised learning technique for classification that is increasingly used in data mining, engineering, and bioinformatics. This chapter aims to provide an introduction to the method, covering everything from the basic concept of the optimal separating hyperplane to its nonlinear generalization through kernels. A general framework of kernel methods that encompasses the support vector machine as a special case is outlined. In addition, statistical properties that illuminate both the advantages and the limitations of the method, which stem from its specific mechanism for classification, are briefly discussed. To illustrate the method and related practical issues, an application to real data with high-dimensional features is presented.
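As a rough illustration of the kernel idea summarized above, the sketch below fits a soft-margin SVM with a Gaussian kernel to the four-point XOR problem, which no separating hyperplane in the original two-dimensional space can classify. It is a minimal sketch, not the chapter's own code: the names `rbf_kernel` and `train_kernel_svm` are hypothetical, the solver is coordinate descent on the dual (the approach used for linear SVMs in solvers such as LIBLINEAR, here combined with a kernel), and the intercept is absorbed by adding 1 to the kernel instead of enforcing the usual equality constraint, which is an assumption of this sketch.

```python
import math

# Hypothetical helper names; a minimal sketch, not the chapter's code.

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def train_kernel_svm(X, y, C=1.0, gamma=1.0, n_epochs=200):
    """Fit a soft-margin kernel SVM by coordinate descent on the dual.

    Dual problem: minimize 0.5 * a^T Q a - sum(a) subject to 0 <= a_i <= C,
    where Q_ij = y_i * y_j * K(x_i, x_j).
    Assumption: the intercept is absorbed by using K(x, z) + 1 as the kernel,
    so there is no equality constraint sum(a_i * y_i) = 0 to enforce.
    """
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) + 1.0 for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(n_epochs):
        for i in range(n):
            # Current decision value f(x_i) = sum_j alpha_j * y_j * K(x_j, x_i)
            f_i = sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            # Partial derivative of the dual objective with respect to alpha_i
            grad = y[i] * f_i - 1.0
            # Exact minimization along coordinate i (Q_ii = K_ii), clipped to [0, C]
            alpha[i] = min(max(alpha[i] - grad / K[i][i], 0.0), C)

    def decision(x):
        # Representer form: f(x) = sum_j alpha_j * y_j * (K(x_j, x) + 1)
        return sum(alpha[j] * y[j] * (rbf_kernel(X[j], x, gamma) + 1.0) for j in range(n))

    return alpha, decision


# XOR-type data: no straight line separates the two classes in the plane.
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [1, 1, -1, -1]
alpha, f = train_kernel_svm(X, y, C=10.0, gamma=1.0)
print([round(a, 3) for a in alpha])  # all positive: every point is a support vector
print([1 if f(x) > 0 else -1 for x in [(0.1, 0.1), (0.9, 0.1)]])  # → [1, -1]
```

The fitted rule depends on the data only through kernel evaluations, and points with alpha equal to 0 drop out of f entirely, which is why the training points with positive alpha are called support vectors. The classifier is the sign of f; its magnitude is not a calibrated class probability. The plain double loop has no caching or shrinking, so it is only practical for small samples.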

Key words

Classification, machine learning, kernel methods, regularization, support vector machine


Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Yoonkyung Lee, Department of Statistics, The Ohio State University, Columbus, USA
