Abstract
PAC learning theory is the foundation of computational learning theory. VC-dimension, Rademacher complexity, and the empirical risk-minimization principle are three key concepts used to derive generalization error bounds for a trained machine. The fundamental theorem of learning theory relates PAC learnability, VC-dimension, and the empirical risk-minimization principle. Another basic result in computational learning theory is the no-free-lunch theorem. These topics are addressed in this chapter.
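To make this concrete, here is a minimal sketch of the type of generalization bound involved, stated in one standard textbook form; the constants and exact statements derived in the chapter may differ. For a hypothesis class \(\mathcal{H}\) with VC-dimension \(d\), an i.i.d. sample of size \(m\), and any \(\delta \in (0,1)\), with probability at least \(1-\delta\) every \(h \in \mathcal{H}\) satisfies
\[
  R(h) \;\le\; \widehat{R}(h) \;+\; \sqrt{\frac{8\bigl(d \ln\tfrac{2em}{d} + \ln\tfrac{4}{\delta}\bigr)}{m}},
\]
and, for a loss bounded in \([0,1]\) with Rademacher complexity \(\mathfrak{R}_m\) of the induced loss class,
\[
  R(h) \;\le\; \widehat{R}(h) \;+\; 2\,\mathfrak{R}_m \;+\; \sqrt{\frac{\ln(1/\delta)}{2m}},
\]
where \(R(h)\) denotes the true risk and \(\widehat{R}(h)\) the empirical risk of \(h\).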
Copyright information
© 2019 Springer-Verlag London Ltd., part of Springer Nature
About this chapter
Cite this chapter
Du, KL., Swamy, M.N.S. (2019). Elements of Computational Learning Theory. In: Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-7452-3_3
DOI: https://doi.org/10.1007/978-1-4471-7452-3_3
Publisher Name: Springer, London
Print ISBN: 978-1-4471-7451-6
Online ISBN: 978-1-4471-7452-3
eBook Packages: Mathematics and Statistics (R0)