# Foundations of Statistical Learning and Model Selection

## Abstract

**What the reader should know to understand this chapter**

- Basic notions of machine learning.
- Notions of calculus.
- Chapter 5.

### Keywords

Entropy Sine

## Copyright information

© Springer-Verlag London 2015