Distribution-Dependent PAC-Bayes Priors

  • Guy Lever
  • François Laviolette
  • John Shawe-Taylor
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6331)

Abstract

We develop the idea that the PAC-Bayes prior can be informed by the data-generating distribution. We prove sharp bounds for an existing framework, develop insights into function class complexity in this model, and suggest means of controlling it with new algorithms. In particular, we consider controlling capacity with respect to the unknown geometry of the data-generating distribution. Finally, we extend this localization to more practical learning methods.
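
For orientation, the following is a sketch of the standard setting the abstract refers to, not a statement of the paper's own theorems. The notation (H, D, S, m, δ, γ, R, P, Q) is assumed here for illustration. A common form of the PAC-Bayes bound requires the prior P to be fixed independently of the sample S, but it places no restriction on P depending on the distribution D. One distribution-dependent choice discussed in the PAC-Bayes literature is a Gibbs-type prior defined through the true risk, paired with a posterior defined through the empirical risk; whether this matches the paper's exact construction is an assumption.

```latex
% Setting (assumed notation): S is an i.i.d. sample of size m from D,
% R(h) is the true risk of h, \hat{R}_S(h) its empirical risk on S,
% R(Q) = E_{h \sim Q} R(h), and \hat{R}_S(Q) = E_{h \sim Q} \hat{R}_S(h).
%
% Standard PAC-Bayes bound: for any prior P on H chosen independently of S
% and any \delta \in (0,1), with probability at least 1-\delta over S,
% simultaneously for all posteriors Q on H,
\[
  \mathrm{kl}\bigl(\hat{R}_S(Q) \,\big\|\, R(Q)\bigr)
  \;\le\;
  \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{m+1}{\delta}}{m},
\]
% where kl(q || p) = q ln(q/p) + (1-q) ln((1-q)/(1-p)) is the binary
% relative entropy.
%
% Illustrative distribution-dependent prior and matching posterior
% (Gibbs form with inverse temperature \gamma > 0):
\[
  P_\gamma(h) \;\propto\; e^{-\gamma R(h)},
  \qquad
  Q_\gamma(h) \;\propto\; e^{-\gamma \hat{R}_S(h)}.
\]
```

Since P_γ depends on D only through the true risk, it is independent of S and the bound applies. The practical difficulty is that KL(Q_γ ‖ P_γ) cannot be computed directly and must itself be bounded.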

Keywords

Empirical Risk; Structural Risk Minimization; Machine Learn Research; Intrinsic Geometry; Empirical Counterpart
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Guy Lever (1)
  • François Laviolette (2)
  • John Shawe-Taylor (1)
  1. University College London
  2. Université Laval