Abstract
What the reader should know to understand this chapter:
\(\bullet \) Basic notions of machine learning.
\(\bullet \) Notions of calculus.
\(\bullet \) Chapter 5.
Notes
- 1.
\(\mathrm{erf}(u)=\frac{2}{\sqrt{\pi }}\int _{0}^{u} e^{-t^{2}}\, dt\).
- 2.
Numquam ponenda sine necessitate ("never posit without necessity", W. Occam).
- 3.
\(f(\cdot )_+\) stands for the positive part of \(f(\cdot )\).
References
H. Akaike. Statistical predictor identification. Annals of the Institute of Statistical Mathematics, 21:202–217, 1970.
H. Akaike. Information theory and an extension of the maximum likelihood principle. In \(2^{nd}\) International Symposium on Information Theory, pages 267–281, 1973.
M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
S. Boucheron, G. Lugosi, and P. Massart. A sharp concentration inequality with applications. Random Structures and Algorithms, 16(3):277–292, 2000.
V. Cherkassky and F. Mulier. Learning from Data. John Wiley, 1998.
H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493–507, 1952.
P. Craven and G. Wahba. Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31(4):377–403, 1978.
L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, 1996.
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley, 2001.
B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, 1993.
R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.
K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.
S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1–58, 1992.
T. Hastie, R. J. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
F. Mosteller and J. W. Tukey. Data analysis, including statistics. In Handbook of Social Psychology, pages 80–203. Addison-Wesley, 1968.
J. Rissanen. A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11(2):416–431, 1983.
B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.
G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464, 1978.
R. Shibata. An optimal selection of regression variables. Biometrika, 68(1):45–54, 1981.
M. Stone. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, B36:111–147, 1974.
M. Stone. An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion. Journal of the Royal Statistical Society, B39:44–47, 1977.
V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, 1982.
V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
V. N. Vapnik. Statistical Learning Theory. John Wiley, 1998.
V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.
V. N. Vapnik and A. Ya. Chervonenkis. Theory of Pattern Recognition. Nauka, 1974.
Problems
7.1
Prove that the average error in the case of regression, \(\mathcal {E}[(f(\mathbf {x},\mathcal {D})-F(\mathbf {x}))^2]\), can be decomposed in the following way:
\[
\mathcal {E}[(f(\mathbf {x},\mathcal {D})-F(\mathbf {x}))^2]=\bigl (\mathcal {E}[f(\mathbf {x},\mathcal {D})]-F(\mathbf {x})\bigr )^2+\mathcal {E}\bigl [(f(\mathbf {x},\mathcal {D})-\mathcal {E}[f(\mathbf {x},\mathcal {D})])^2\bigr ],
\]
that is, into the sum of the squared bias and the variance, where the expectation \(\mathcal {E}[\cdot ]\) is taken over the training sets \(\mathcal {D}\).
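The sketch below is a numerical companion to the proof (an illustration only; the target function \(F\), the noise level, the evaluation point, and the cubic polynomial model are assumptions chosen for the demo). It estimates the three terms by Monte Carlo over many resampled training sets and checks that squared bias plus variance reproduces the average error:

```python
import numpy as np

rng = np.random.default_rng(0)
F = lambda x: np.sin(2 * np.pi * x)      # assumed target function F(x)
x0, n, trials = 0.3, 30, 2000            # evaluation point, |D|, number of D's

# Train f(x; D) on many independently drawn training sets D.
preds = np.empty(trials)
for t in range(trials):
    x = rng.uniform(0.0, 1.0, n)
    y = F(x) + rng.normal(0.0, 0.2, n)   # noisy samples of F
    coef = np.polyfit(x, y, 3)           # f(.; D): cubic polynomial fit
    preds[t] = np.polyval(coef, x0)

avg_error = np.mean((preds - F(x0)) ** 2)   # E[(f(x;D) - F(x))^2]
bias_sq = (preds.mean() - F(x0)) ** 2       # (E[f(x;D)] - F(x))^2
variance = preds.var()                      # E[(f(x;D) - E[f(x;D)])^2]
print(avg_error, bias_sq + variance)        # identical up to rounding
```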
7.2
Consider the bias-variance decomposition for classification. Show that if the classification error \(P(f(\mathbf {x},\mathcal {D})=y)\) does not coincide with the Bayes discriminant error, it is given by:
7.3
Prove that the class of functions \(\sin (\alpha x)\) (\(\alpha \in \mathbb {R}\)) has infinite VC dimension (Theorem 4). You can compare your proof with the one reported in [25].
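A numerical illustration of the classical construction (a sketch, not the proof itself; the points \(x_i = 10^{-i}\) and the formula for \(\alpha \) follow the standard argument in [25], and \(\ell \) is kept small only to stay within floating-point precision):

```python
import numpy as np
from itertools import product

ell = 8                                   # number of points to shatter
i = np.arange(1, ell + 1)
x = 10.0 ** (-i)                          # points x_i = 10^(-i)

for labels in product([0, 1], repeat=ell):    # every possible labeling
    y = np.array(labels)
    # With this alpha, sign(sin(alpha * x_i)) reproduces the labels y_i:
    # terms with index above i contribute even multiples of pi, the term
    # at i contributes (1 - y_i) * pi, and the rest is a shift below pi/2.
    alpha = np.pi * (1 + np.sum((1 - y) * 10.0 ** i))
    pred = (np.sin(alpha * x) > 0).astype(int)
    assert np.array_equal(pred, y)
print(f"all {2 ** ell} labelings of {ell} points realized")
```

Since the same recipe works for every \(\ell \) in exact arithmetic, no finite set of such points fails to be shattered, which is what infinite VC dimension means.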
7.4
For any \(\epsilon > 0\), prove that
7.5
Prove that the annealed entropy is an upper bound on the VC entropy. Hint: use Jensen's inequality [25], which states that for a concave function \(\psi \) the inequality \(\mathcal {E}[\psi (x)]\le \psi (\mathcal {E}[x])\) holds.
7.6
Prove that if a class of functions \(\mathcal {F}\) can shatter any data set of \(\ell \) samples, then the third milestone of VC theory is not fulfilled, that is, condition (7.32) does not hold.
7.7
Implement the AIC criterion. Consider the spam data that can be downloaded from ftp.ics.uci.edu/pub/machine-learning-databases/spam. Randomly divide the spam data into two subsets with the same number of samples; take the former as the training set and the latter as the test set. Select a learning algorithm for classification (e.g., K-Means or MLP), train it with several parameter values, and use the AIC criterion for model selection. Compare the performances of the selected models by means of model assessment on the test set.
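A minimal sketch of the selection step (the candidate names, log-likelihood values, and parameter counts below are hypothetical placeholders; in the actual exercise they would come from fitting, e.g., MLPs of different sizes to the spam training set):

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = -2 ln L + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical candidates: (name, maximized training log-likelihood,
# number of free parameters of the fitted model).
candidates = [
    ("mlp-5", -1210.3, 291),
    ("mlp-10", -1105.8, 581),
    ("mlp-20", -1079.1, 1161),
]

scores = {name: aic(ll, k) for name, ll, k in candidates}
best = min(scores, key=scores.get)
print(scores)
print("selected model:", best)
```

AIC trades goodness of fit against complexity: the \(2k\) term penalizes the extra parameters of the larger networks even when they fit the training set better.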
7.8
Implement the BIC criterion. Repeat Problem 7.7, using BIC for model selection. Compare its performance with AIC.
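Only the scoring function changes with respect to the AIC sketch above (again a sketch; \(n\) is the number of training samples):

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: BIC = -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)
```

Since \(\ln n > 2\) as soon as \(n > e^2 \approx 7.4\), BIC penalizes complexity more heavily than AIC and tends to select smaller models.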
7.9
Implement the cross-validation criterion. Repeat Problem 7.7 and use 5-fold cross-validation for model selection. Compare its performance with AIC and BIC.
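A generic k-fold estimator (a sketch: `fit` and `predict` stand for the training and prediction routines of whatever classifier was chosen in Problem 7.7, and `X`, `y` are NumPy arrays):

```python
import numpy as np

def kfold_cv_error(fit, predict, X, y, k=5, seed=0):
    """Average held-out error rate over k folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])          # train on the other k-1 folds
        errors.append(np.mean(predict(model, X[test]) != y[test]))
    return float(np.mean(errors))
```

The model with the lowest cross-validated error is selected; its final quality is then assessed on the untouched test set.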
7.10
Implement the leave-one-out method and test it on the Iris data [12], which can be downloaded from ftp.ics.uci.edu/pub/machine-learning-databases/iris.
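Leave-one-out is k-fold cross-validation with k equal to the number of samples. A compact sketch (the 1-nearest-neighbour rule is only an assumed example classifier; `X` would hold the 150 x 4 array of Iris features and `y` the species labels):

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    """1-nearest-neighbour rule (example classifier for the demo)."""
    return y_train[np.argmin(np.sum((X_train - x) ** 2, axis=1))]

def loo_error(X, y, classify=nn_classify):
    """Leave-one-out error: train on all samples except one, test on it."""
    n = len(X)
    mistakes = sum(
        classify(X[np.arange(n) != i], y[np.arange(n) != i], X[i]) != y[i]
        for i in range(n)
    )
    return mistakes / n
```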
Copyright information
© 2015 Springer-Verlag London
About this chapter
Cite this chapter
Camastra, F., Vinciarelli, A. (2015). Foundations of Statistical Learning and Model Selection. In: Machine Learning for Audio, Image and Video Analysis. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-6735-8_7
DOI: https://doi.org/10.1007/978-1-4471-6735-8_7
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6734-1
Online ISBN: 978-1-4471-6735-8