Statistical Learning and Kernel Methods

  • Bernhard Schölkopf
Part of the International Centre for Mechanical Sciences book series (CISM, volume 431)


We briefly describe the main ideas of statistical learning theory, support vector machines, and kernel feature spaces.


Support Vector Machine, Support Vector, Feature Space, Support Vector Regression, Kernel Method





Copyright information

© Springer-Verlag Wien 2001

Authors and Affiliations

  • Bernhard Schölkopf, Microsoft Research, Cambridge, UK
