Concentration Inequalities

  • Stéphane Boucheron
  • Gábor Lugosi
  • Olivier Bousquet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3176)


Concentration inequalities deal with deviations of functions of independent random variables from their expectation. Over the last decade, new tools have been introduced that make it possible to establish simple yet powerful inequalities of this kind. These inequalities lie at the heart of the mathematical analysis of various problems in machine learning and have led to the derivation of new, efficient algorithms. This text summarizes some of the basic tools.
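As a concrete illustration (not taken from the text itself), the classical Hoeffding inequality is one of the simplest concentration results: for independent random variables taking values in [0, 1], the probability that the sample mean deviates from its expectation by at least t is bounded by 2·exp(−2nt²). The sketch below checks this bound numerically for Uniform[0, 1] samples; the function names and parameter choices are our own.

```python
import math
import random

def hoeffding_bound(n, t):
    # Two-sided Hoeffding bound for n i.i.d. variables in [0, 1]:
    # P(|mean - E[X]| >= t) <= 2 * exp(-2 * n * t^2)
    return 2.0 * math.exp(-2.0 * n * t * t)

def empirical_deviation(n, t, trials=2000, seed=0):
    # Fraction of trials in which the sample mean of n Uniform[0, 1]
    # draws deviates from its expectation 1/2 by at least t.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) >= t:
            hits += 1
    return hits / trials

n, t = 100, 0.1
print(empirical_deviation(n, t), "<=", hoeffding_bound(n, t))
```

For these parameters the bound evaluates to 2·exp(−2) ≈ 0.27, while the observed deviation frequency is far smaller, reflecting the fact that Hoeffding's inequality, though simple, is not tight for sums of uniform variables.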







Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Stéphane Boucheron, Laboratoire d’Informatique, Université de Paris-Sud, Orsay, France
  • Gábor Lugosi, Department of Economics, Pompeu Fabra University, Barcelona, Spain
  • Olivier Bousquet, Max-Planck Institute for Biological Cybernetics, Tübingen, Germany
