
Elements of Computational Learning Theory

Chapter in Neural Networks and Statistical Learning

Abstract

PAC learning theory is the foundation of computational learning theory. The VC-dimension, Rademacher complexity, and the empirical risk-minimization principle are three concepts used to derive generalization error bounds for a trained learning machine. The fundamental theorem of learning theory relates PAC learnability, the VC-dimension, and the empirical risk-minimization principle. Another basic result in computational learning theory is the no-free-lunch theorem. These topics are addressed in this chapter.
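
For orientation only (these formulas are not quoted from the chapter), the kind of generalization error bound referred to above can be written in two standard forms: one via the VC-dimension d of the hypothesis class \mathcal{H} (the classical Vapnik-style bound) and one via its Rademacher complexity \mathfrak{R}_m(\mathcal{H}) (in the style of Bartlett and Mendelson). Here R(h) is the true risk of a hypothesis h, \widehat{R}_m(h) its empirical risk on an i.i.d. sample of size m, losses are taken in [0,1], and the bounds hold with probability at least 1-\delta; the exact constants and logarithmic factors vary across references.

\[
R(h) \;\le\; \widehat{R}_m(h) + \sqrt{\frac{d\bigl(\ln\tfrac{2m}{d} + 1\bigr) + \ln\tfrac{4}{\delta}}{m}},
\qquad
R(h) \;\le\; \widehat{R}_m(h) + 2\,\mathfrak{R}_m(\mathcal{H}) + \sqrt{\frac{\ln(1/\delta)}{2m}}.
\]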



Author information

Corresponding author: Ke-Lin Du.


Copyright information

© 2019 Springer-Verlag London Ltd., part of Springer Nature

About this chapter


Cite this chapter

Du, K.-L., & Swamy, M. N. S. (2019). Elements of Computational Learning Theory. In Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-7452-3_3

