A Short Review of Statistical Learning Theory
Statistical learning theory has emerged in recent years as a solid and elegant framework for studying the problem of learning from examples. Unlike earlier “classical” learning techniques, this theory completely characterizes the necessary and sufficient conditions for a learning algorithm to be consistent. The key quantity is the capacity of the set of hypotheses employed by the learning algorithm, and the goal is to control this capacity as a function of the given examples. Structural risk minimization (SRM) is the main theoretical algorithm implementing this idea. SRM is inspired by, and closely related to, regularization theory. For practical purposes, however, SRM is a very hard problem, impossible to implement when dealing with a large number of examples. Techniques such as support vector machines and the older regularization networks are viable ways of implementing the idea of capacity control. The paper also discusses how these techniques can be formulated as variational problems in a Hilbert space, and shows how SRM can be extended to implement both classical regularization networks and support vector machines.
Keywords: Statistical learning theory, Structural risk minimization, Regularization
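The capacity-control idea described above can be illustrated with a minimal sketch, assuming a one-dimensional linear model fitted with a Tikhonov-style penalty (the function and parameter names below are illustrative, not from the paper): increasing the regularization parameter shrinks the fitted weight, restricting the effective capacity of the hypothesis set.

```python
# Minimal sketch of capacity control via regularization:
# fit a 1-D linear model y ~ w*x by minimizing the regularized
# empirical risk  (1/n) * sum_i (y_i - w*x_i)^2 + lam * w^2.
# Setting the derivative to zero gives the closed-form minimizer
#   w = sum_i(x_i*y_i) / (sum_i(x_i^2) + n*lam),
# so larger lam shrinks w toward zero.

def ridge_1d(xs, ys, lam):
    """Closed-form minimizer of the regularized empirical risk."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.0]   # roughly y = 2x plus noise

w_unreg = ridge_1d(xs, ys, 0.0)   # ordinary least-squares weight
w_reg = ridge_1d(xs, ys, 10.0)    # heavily regularized weight
```

In practice, SRM corresponds to choosing `lam` (equivalently, the capacity of the hypothesis set) based on the data, trading off empirical fit against complexity.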