A Short Review of Statistical Learning Theory

  • Massimiliano Pontil
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2486)


Statistical learning theory has emerged in the last few years as a solid and elegant framework for studying the problem of learning from examples. Unlike previous “classical” learning techniques, this theory completely characterizes the necessary and sufficient conditions for a learning algorithm to be consistent. The key quantity is the capacity of the set of hypotheses employed in the learning algorithm, and the goal is to control this capacity depending on the given examples. Structural risk minimization (SRM) is the main theoretical algorithm that implements this idea. SRM is inspired by, and closely related to, regularization theory. For practical purposes, however, SRM is a very hard problem and is impossible to implement when dealing with a large number of examples. Techniques such as support vector machines and the older regularization networks are viable ways to implement the idea of capacity control. The paper also discusses how these techniques can be formulated as a variational problem in a Hilbert space and shows how SRM can be extended in order to implement both classical regularization networks and support vector machines.
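
As a rough illustration of the variational problem mentioned above, the following sketch uses the form common in the regularization literature. The abstract itself fixes no notation, so the symbols here are assumptions: \ell is the number of examples (x_i, y_i), V is a loss function, \mathcal{H}_K is a reproducing kernel Hilbert space with kernel K, and \lambda > 0 is the regularization parameter that controls capacity.

```latex
% Regularized functional minimized over a reproducing kernel Hilbert space
% (notation assumed for illustration, not taken from the abstract)
\min_{f \in \mathcal{H}_K} \;
  \frac{1}{\ell} \sum_{i=1}^{\ell} V\bigl(y_i, f(x_i)\bigr)
  + \lambda \, \|f\|_K^2

% Two standard choices of the loss V:
%   regularization networks (square loss)
V\bigl(y, f(x)\bigr) = \bigl(y - f(x)\bigr)^2
%   support vector machine classification (hinge loss), with y \in \{-1, +1\}
V\bigl(y, f(x)\bigr) = \max\bigl(0,\; 1 - y\, f(x)\bigr)

% In both cases the minimizer is a finite kernel expansion over the examples
f(x) = \sum_{i=1}^{\ell} c_i \, K(x, x_i)
```

Under these assumptions the two techniques differ only in the choice of V, while the norm term plays the role of capacity control. Common support vector machine formulations also add a bias term b to f, which is omitted here for brevity.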


Keywords: Statistical learning theory; Structural risk minimization; Regularization





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Massimiliano Pontil (1)
  1. Dipartimento di Ingegneria dell’Informazione, Siena, Italy
