A PAC-Bayes Bound for Tailored Density Estimation

  • Matthew Higgs
  • John Shawe-Taylor
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6331)


In this paper we construct a general method for reporting on the accuracy of density estimation. Using variational methods from statistical learning theory we derive a PAC, algorithm-dependent bound on the distance between the data generating distribution and a learned approximation. The distance measure takes the role of a loss function that can be tailored to the learning problem, enabling us to control discrepancies on tasks relevant to subsequent inference. We apply the bound to an efficient mixture learning algorithm. Using the method of localisation we encode properties of both the algorithm and the data generating distribution, producing a tight, empirical, algorithm-dependent upper risk bound on the performance of the learner. We discuss other uses of the bound for arbitrary distributions and model averaging.
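To make the flavour of such a bound concrete, here is an illustrative sketch (not the paper's tailored bound) of the generic McAllester-style PAC-Bayes inequality that this line of work builds on, instantiated for a finite hypothesis class with discrete prior and posterior. The function names, the toy prior/posterior, and the choice of the classical bound form are assumptions made for illustration only.

```python
import math

def kl_divergence(q, p):
    """KL(Q || P) for discrete distributions given as probability lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def pac_bayes_bound(empirical_risk, q, p, n, delta=0.05):
    """McAllester-style PAC-Bayes upper bound on the true risk:
        R(Q) <= R_hat(Q) + sqrt((KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n)),
    holding with probability at least 1 - delta over an i.i.d. sample
    of size n. Q is the learned posterior, P the data-independent prior.
    """
    complexity = (kl_divergence(q, p) + math.log(2 * math.sqrt(n) / delta)) / (2 * n)
    return empirical_risk + math.sqrt(complexity)

# Toy example: uniform prior over 4 hypotheses, posterior concentrated on one.
prior = [0.25] * 4
posterior = [0.85, 0.05, 0.05, 0.05]
print(pac_bayes_bound(0.10, posterior, prior, n=1000))
```

The paper's contribution can be read against this template: the empirical risk is replaced by a tailored distance between the data-generating distribution and the learned density, and localisation replaces the fixed prior P with one adapted to the algorithm and the data distribution, tightening the KL complexity term.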


Keywords: tailored density estimation, PAC-Bayes bounds, localisation





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Matthew Higgs (1)
  • John Shawe-Taylor (1)
  1. Center for Computational Statistics and Machine Learning, University College London
