Distribution and Density Estimation

  • L. Devroye
  • L. Györfi
Part of the International Centre for Mechanical Sciences book series (CISM, volume 434)


The classical nonparametric example is the problem of estimating a distribution function F(x) from i.i.d. samples X_1, X_2, ..., X_n taking values in R^d (d >= 1). Here, on the one hand, the construction of the empirical distribution function F_n(x) is distribution-free; on the other hand, its uniform convergence, the Glivenko-Cantelli theorem, holds for all F(x):

sup_x |F_n(x) - F(x)| -> 0 almost surely as n -> infinity.
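This uniform convergence can be illustrated numerically. The sketch below is an illustration only, not part of the text: the helper names (empirical_cdf, sup_deviation) are my own, and U(0,1) samples are used so that the true distribution function is simply F(x) = x on [0, 1].

```python
import numpy as np

def empirical_cdf(sample: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Evaluate the empirical distribution function F_n at the points x:
    F_n(x) = (#{i : X_i <= x}) / n, computed via binary search on the sorted sample."""
    sample = np.sort(sample)
    return np.searchsorted(sample, x, side="right") / len(sample)

rng = np.random.default_rng(42)
grid = np.linspace(0.0, 1.0, 1001)  # for U(0,1), the true CDF on [0,1] is F(x) = x

def sup_deviation(n: int) -> float:
    """Approximate sup_x |F_n(x) - F(x)| over a fine grid, for a fresh sample of size n."""
    sample = rng.uniform(size=n)
    return float(np.max(np.abs(empirical_cdf(sample, grid) - grid)))

d_small = sup_deviation(100)
d_large = sup_deviation(100_000)
print(d_small, d_large)  # the sup-norm error shrinks as the sample size grows
```

With the larger sample the sup-norm deviation is markedly smaller, as the Glivenko-Cantelli theorem predicts; the Dvoretzky-Kiefer-Wolfowitz inequality quantifies the rate.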


Keywords: Density Estimate; Kernel Density Estimate; Radial Basis Function Network; Kernel Estimate; Empirical Measure





Copyright information

© Springer-Verlag Wien 2002

Authors and Affiliations

  • L. Devroye (1)
  • L. Györfi (2)
  1. McGill University, Montreal, Canada
  2. Budapest University of Technology and Economics, Budapest, Hungary
