Mathematical Modelling of Generalization

  • Martin Anthony
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2486)


This paper surveys certain developments in the use of probabilistic techniques for the modelling of generalization. Some of the main methods and key results are discussed. Many details are omitted, the aim being to give a high-level overview of the types of approaches taken and methods used.


Probabilistic modelling of learning 





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Martin Anthony, Department of Mathematics, London School of Economics, London, UK
