Abstract
PAC learning theory is the foundation of computational learning theory. VC-dimension, Rademacher complexity, and the empirical risk-minimization principle are three key concepts used to derive generalization error bounds for a trained machine. The fundamental theorem of learning theory relates PAC learnability, VC-dimension, and the empirical risk-minimization principle. Another basic result in computational learning theory is the no-free-lunch theorem. These topics are addressed in this chapter.
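To make this concrete, here is a minimal sketch of the type of generalization bound involved, stated in one standard textbook form; the constants and exact statements derived in the chapter may differ. For a hypothesis class \(\mathcal{H}\) with VC-dimension \(d\), an i.i.d. sample of size \(m\), and any \(\delta \in (0,1)\), with probability at least \(1-\delta\) every \(h \in \mathcal{H}\) satisfies
\[
  R(h) \;\le\; \widehat{R}(h) \;+\; \sqrt{\frac{8\bigl(d \ln\tfrac{2em}{d} + \ln\tfrac{4}{\delta}\bigr)}{m}},
\]
and, for a loss bounded in \([0,1]\) with Rademacher complexity \(\mathfrak{R}_m\) of the induced loss class,
\[
  R(h) \;\le\; \widehat{R}(h) \;+\; 2\,\mathfrak{R}_m \;+\; \sqrt{\frac{\ln(1/\delta)}{2m}},
\]
where \(R(h)\) denotes the true risk and \(\widehat{R}(h)\) the empirical risk of \(h\).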
Copyright information
© 2019 Springer-Verlag London Ltd., part of Springer Nature
About this chapter
Cite this chapter
Du, KL., Swamy, M.N.S. (2019). Elements of Computational Learning Theory. In: Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-7452-3_3
DOI: https://doi.org/10.1007/978-1-4471-7452-3_3
Publisher Name: Springer, London
Print ISBN: 978-1-4471-7451-6
Online ISBN: 978-1-4471-7452-3
eBook Packages: Mathematics and Statistics (R0)