PAC-Bayes Theory

Part of the book series: Modeling and Optimization in Science and Technologies (MOST, volume 15)

Abstract

It is well known that combining the outputs of several rules often performs much better than using any one of them alone. In fact, many state-of-the-art algorithms search for a weighted combination of simpler rules [1]: bagging [2, 3], boosting [4, 5], Bayesian approaches [6], and even kernel methods [7] and neural networks [8].

Notes

  1. In the following we will sometimes write \(\mathtt {KL} = \mathtt {KL}[\mathsf {Q}||\mathsf {P}]\) for brevity.
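
The abbreviation in this note refers to the Kullback-Leibler divergence. As a minimal sketch, assuming Q and P are discrete distributions over a finite set of rules (in PAC-Bayes analyses Q is usually the posterior and P the prior, which this page does not spell out), it could be computed as below. The function name `kl_divergence` and the example weights are hypothetical, chosen only for illustration.

```python
import math


def kl_divergence(q, p):
    """Return KL[Q||P] in nats for two discrete distributions.

    q and p are equal-length sequences of probabilities over the same
    finite set of rules (an assumption made for this sketch).
    """
    if len(q) != len(p):
        raise ValueError("q and p must have the same length")
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue  # the term 0 * ln(0 / p) is taken as 0 by convention
        if pi == 0.0:
            return math.inf  # Q puts mass where P has none
        total += qi * math.log(qi / pi)
    return total


# Hypothetical example: a uniform prior over four rules and a posterior
# that concentrates on the first rule.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.7, 0.1, 0.1, 0.1]
kl = kl_divergence(posterior, prior)
print(f"KL[Q||P] = {kl:.4f} nats")
```

The divergence is zero when Q equals P and grows as Q moves away from P.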

References

  1. Germain P, Lacasse A, Laviolette F, Marchand M, Roy JF (2015) Risk bounds for the majority vote: From a PAC-Bayesian analysis to a learning algorithm. J Mach Learn Res 16(4):787–860

  2. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  3. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  4. Schapire RE, Freund Y, Bartlett P, Lee WS (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat 26(5):1651–1686

  5. Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336

  6. Gelman A, Carlin JB, Stern HS, Rubin DB (2014) Bayesian data analysis, vol 2. Taylor & Francis

  7. Vapnik VN (1998) Statistical learning theory. Wiley, New York

  8. Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press

  9. Nitzan S, Paroush J (1982) Optimal decision rules in uncertain dichotomous choice situations. Int Econ Rev 23(2):289–297

  10. Catoni O (2007) PAC-Bayesian supervised classification. Institute of Mathematical Statistics

  11. Lever G, Laviolette F, Shawe-Taylor J (2010) Distribution-dependent PAC-Bayes priors. In: Algorithmic learning theory

  12. Parrado-Hernández E, Ambroladze A, Shawe-Taylor J, Sun S (2012) PAC-Bayes bounds with data dependent priors. J Mach Learn Res 13(1):3507–3531

  13. Lever G, Laviolette F, Shawe-Taylor J (2013) Tighter PAC-Bayes bounds through distribution-dependent priors. Theor Comput Sci 473:4–28

  14. Berend D, Kontorovich A (2014) Consistency of weighted majority votes. In: Neural information processing systems

  15. Donsker MD, Varadhan SRS (1975) Asymptotic evaluation of certain Markov process expectations for large time, I. Commun Pure Appl Math 28(1):1–47

  16. Shawe-Taylor J, Williamson RC (1997) A PAC analysis of a Bayesian estimator. In: Computational learning theory

  17. McAllester DA (1998) Some PAC-Bayesian theorems. In: Computational learning theory

  18. McAllester DA (2003) PAC-Bayesian stochastic model selection. Mach Learn 51(1):5–21

  19. Langford J, Seeger M (2001) Bounds for averaging classifiers. Technical report, Carnegie Mellon, Department of Computer Science

  20. McAllester DA (2003) Simplified PAC-Bayesian margin bounds. In: Learning theory and kernel machines

  21. Laviolette F, Marchand M (2005) PAC-Bayes risk bounds for sample-compressed Gibbs classifiers. In: International conference on machine learning

  22. Lacasse A, Laviolette F, Marchand M, Germain P, Usunier N (2006) PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In: Neural information processing systems

  23. Laviolette F, Marchand M (2007) PAC-Bayes risk bounds for stochastic averages and majority votes of sample-compressed classifiers. J Mach Learn Res 8(7):1461–1487

  24. Germain P, Lacasse A, Laviolette F, Marchand M (2009) PAC-Bayesian learning of linear classifiers. In: International conference on machine learning

  25. Tolstikhin IO, Seldin Y (2013) PAC-Bayes-empirical-Bernstein inequality. In: Neural information processing systems

  26. Van Erven T (2014) PAC-Bayes mini-tutorial: a continuous union bound. arXiv preprint arXiv:1405.1580

  27. London B, Huang B, Taskar B, Getoor L (2014) PAC-Bayesian collective stability. In: Artificial intelligence and statistics

  28. Shawe-Taylor J, Langford J (2002) PAC-Bayes & margins. In: Neural information processing systems

  29. Seeger M (2002) PAC-Bayesian generalisation error bounds for Gaussian process classification. J Mach Learn Res 3:233–269

  30. Seeger M (2003) Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations. PhD thesis, University of Edinburgh

  31. Audibert JY, Bousquet O (2003) PAC-Bayesian generic chaining. In: Neural information processing systems

  32. Seldin Y, Tishby N (2009) PAC-Bayesian generalization bound for density estimation with application to co-clustering. In: International conference on artificial intelligence and statistics

  33. Ralaivola L, Szafranski M, Stempfel G (2010) Chromatic PAC-Bayes bounds for non-iid data: applications to ranking and stationary \(\beta \)-mixing processes. J Mach Learn Res 11:1927–1956

  34. Seldin Y, Tishby N (2010) PAC-Bayesian analysis of co-clustering and beyond. J Mach Learn Res 11:3595–3646

  35. Audibert JY (2010) PAC-Bayesian aggregation and multi-armed bandits. arXiv preprint arXiv:1011.3396

  36. Roy JF, Marchand M, Laviolette F (2011) From PAC-Bayes bounds to quadratic programs for majority votes. In: International conference on machine learning

  37. Seldin Y, Auer P, Shawe-Taylor JS, Ortner R, Laviolette F (2011) PAC-Bayesian analysis of contextual bandits. In: Neural information processing systems

  38. Germain P, Lacoste A, Marchand M, Shanian S, Laviolette F (2011) A PAC-Bayes sample-compression approach to kernel methods. In: International conference on machine learning

  39. Seldin Y, Laviolette F, Cesa-Bianchi N, Shawe-Taylor J, Auer P (2012) PAC-Bayesian inequalities for martingales. IEEE Trans Inf Theory 58(12):7086–7093

  40. Morvant E (2013) Apprentissage de vote de majorité pour la classification supervisée et l’adaptation de domaine: approches PAC-Bayésiennes et combinaison de similarités. Aix-Marseille Université

  41. Bégin L, Germain P, Laviolette F, Roy JF (2014) PAC-Bayesian theory for transductive learning. In: International conference on artificial intelligence and statistics

  42. Langford J (2005) Tutorial on practical prediction theory for classification. J Mach Learn Res 6:273–306

  43. Oneto L, Anguita D, Ridella S (2016) PAC-Bayesian analysis of distribution dependent priors: tighter risk bounds and stability analysis. Pattern Recogn Lett 80:200–207

  44. Ambroladze A, Parrado-Hernández E, Shawe-Taylor J (2006) Tighter PAC-Bayes bounds. In: Advances in neural information processing systems

  45. Bousquet O, Elisseeff A (2002) Stability and generalization. J Mach Learn Res 2:499–526

  46. Tsybakov AB (2008) Introduction to nonparametric estimation. Springer Science & Business Media

  47. Maurer A (2004) A note on the PAC Bayesian theorem. arXiv preprint cs/0411099

  48. Bégin L, Germain P, Laviolette F, Roy JF (2016) PAC-Bayesian bounds based on the Rényi divergence. In: International conference on artificial intelligence and statistics

  49. Younsi M (2012) Proof of a combinatorial conjecture coming from the PAC-Bayesian machine learning theory. arXiv preprint arXiv:1209.0824

  50. Clopper CJ, Pearson ES (1934) The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26(4):404–413

  51. Anguita D, Ghio A, Oneto L, Ridella S (2012) In-sample model selection for trimmed hinge loss support vector machine. Neural Process Lett 36(3):275–283

  52. Bartlett PL, Mendelson S (2003) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463–482

  53. Rosasco L, De Vito E, Caponnetto A, Piana M, Verri A (2004) Are loss functions all the same? Neural Comput 16(5):1063–1076

  54. Oneto L, Ridella S, Anguita D (2017) Differential privacy and generalization: sharper bounds with applications. Pattern Recogn Lett 89:31–38

  55. Oneto L, Ridella S, Anguita D (2017) Generalization performances of randomized classifiers and algorithms built on data dependent distributions. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning

  56. Bonferroni CE (1936) Teoria statistica delle classi e calcolo delle probabilita. Libreria internazionale Seeber

Author information

Correspondence to Luca Oneto.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Oneto, L. (2020). PAC-Bayes Theory. In: Model Selection and Error Estimation in a Nutshell. Modeling and Optimization in Science and Technologies, vol 15. Springer, Cham. https://doi.org/10.1007/978-3-030-24359-3_8
