PAC-Bayes Theory

Oneto, Luca

doi:10.1007/978-3-030-24359-3_8

Chapter
First Online: 18 July 2019

Part of the book series: Modeling and Optimization in Science and Technologies (MOST, volume 15)

Abstract

It is well known that combining the outputs of several rules often yields much better performance than using any one of them alone. In fact, many state-of-the-art algorithms search for a weighted combination of simpler rules [1]: Bagging [2, 3], Boosting [4, 5], Bayesian approaches [6], and even Kernel methods [7] and Neural Networks [8].
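
As a rough illustration of what a weighted combination of simpler rules can look like, the sketch below implements a weighted majority vote over binary rules. It is a minimal example under stated assumptions, not the chapter's method: the function name `weighted_majority_vote`, the three threshold rules, and the weights are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

# A "rule" maps an input to a label in {-1, +1}.
Rule = Callable[[float], int]


def weighted_majority_vote(
    rules: Sequence[Rule], weights: Sequence[float], x: float
) -> int:
    """Return the sign of the weighted sum of the rules' votes on x."""
    score = sum(w * rule(x) for rule, w in zip(rules, weights))
    # Ties (score == 0) are broken in favour of +1 by convention here.
    return 1 if score >= 0 else -1


# Three hypothetical threshold rules, each fairly crude on its own.
rules = [
    lambda x: 1 if x > 0.0 else -1,
    lambda x: 1 if x > 1.0 else -1,
    lambda x: 1 if x > 2.0 else -1,
]
weights = [0.4, 0.35, 0.25]  # illustrative weights summing to one

print(weighted_majority_vote(rules, weights, 0.5))  # first rule is outvoted
print(weighted_majority_vote(rules, weights, 1.5))  # first two rules agree
```

The weights play the role of a distribution over the rules; how to choose them, and what can be guaranteed about the resulting vote, is the kind of question the cited works address.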

Notes

1. In the following we will sometimes write \(\mathtt {KL} = \mathtt {KL}[\mathsf {Q}||\mathsf {P}]\) for brevity.
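
For reference, the shorthand can be unpacked with the standard definition of the Kullback-Leibler divergence. This is a sketch under assumptions not stated in this excerpt: \(\mathsf {Q}\) and \(\mathsf {P}\) are taken to be a posterior and a prior distribution over a hypothesis set, here called \(\mathcal {F}\) purely for illustration, both admitting densities and with \(\mathsf {Q}\) absolutely continuous with respect to \(\mathsf {P}\):

\[
\mathtt {KL}[\mathsf {Q}||\mathsf {P}]
= \mathbb {E}_{f \sim \mathsf {Q}} \left[ \ln \frac{\mathsf {Q}(f)}{\mathsf {P}(f)} \right]
= \int _{\mathcal {F}} \mathsf {Q}(f) \ln \frac{\mathsf {Q}(f)}{\mathsf {P}(f)} \, df \;\ge \; 0 ,
\]

with equality if and only if \(\mathsf {Q} = \mathsf {P}\) almost everywhere.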

References

1. Germain P, Lacasse A, Laviolette F, Marchand M, Roy JF (2015) Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm. J Mach Learn Res 16(4):787–860
2. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
3. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
4. Schapire RE, Freund Y, Bartlett P, Lee WS (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat 26(5):1651–1686
5. Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336
6. Gelman A, Carlin JB, Stern HS, Rubin DB (2014) Bayesian data analysis, vol 2. Taylor & Francis
7. Vapnik VN (1998) Statistical learning theory. Wiley, New York
8. Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press
9. Nitzan S, Paroush J (1982) Optimal decision rules in uncertain dichotomous choice situations. Int Econ Rev 23(2):289–297
10. Catoni O (2007) PAC-Bayesian supervised classification. Institute of Mathematical Statistics
11. Lever G, Laviolette F, Shawe-Taylor J (2010) Distribution-dependent PAC-Bayes priors. In: Algorithmic learning theory
12. Parrado-Hernández E, Ambroladze A, Shawe-Taylor J, Sun S (2012) PAC-Bayes bounds with data dependent priors. J Mach Learn Res 13(1):3507–3531
13. Lever G, Laviolette F, Shawe-Taylor J (2013) Tighter PAC-Bayes bounds through distribution-dependent priors. Theor Comput Sci 473:4–28
14. Berend D, Kontorovich A (2014) Consistency of weighted majority votes. In: Neural information processing systems
15. Donsker MD, Varadhan SRS (1975) Asymptotic evaluation of certain Markov process expectations for large time, I. Commun Pure Appl Math 28(1):1–47
16. Shawe-Taylor J, Williamson RC (1997) A PAC analysis of a Bayesian estimator. In: Computational learning theory
17. McAllester DA (1998) Some PAC-Bayesian theorems. In: Computational learning theory
18. McAllester DA (2003) PAC-Bayesian stochastic model selection. Mach Learn 51(1):5–21
19. Langford J, Seeger M (2001) Bounds for averaging classifiers. Technical report, Carnegie Mellon, Department of Computer Science
20. McAllester DA (2003) Simplified PAC-Bayesian margin bounds. In: Learning theory and kernel machines
21. Laviolette F, Marchand M (2005) PAC-Bayes risk bounds for sample-compressed Gibbs classifiers. In: International conference on machine learning
22. Lacasse A, Laviolette F, Marchand M, Germain P, Usunier N (2006) PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In: Neural information processing systems
23. Laviolette F, Marchand M (2007) PAC-Bayes risk bounds for stochastic averages and majority votes of sample-compressed classifiers. J Mach Learn Res 8(7):1461–1487
24. Germain P, Lacasse A, Laviolette F, Marchand M (2009) PAC-Bayesian learning of linear classifiers. In: International conference on machine learning
25. Tolstikhin IO, Seldin Y (2013) PAC-Bayes-empirical-Bernstein inequality. In: Neural information processing systems
26. Van Erven T (2014) PAC-Bayes mini-tutorial: a continuous union bound. arXiv preprint arXiv:1405.1580
27. London B, Huang B, Taskar B, Getoor L, Cruz S (2014) PAC-Bayesian collective stability. In: Artificial intelligence and statistics
28. Shawe-Taylor J, Langford J (2002) PAC-Bayes & margins. In: Neural information processing systems
29. Seeger M (2002) PAC-Bayesian generalisation error bounds for Gaussian process classification. J Mach Learn Res 3:233–269
30. Seeger M (2003) Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations. PhD thesis, University of Edinburgh
31. Audibert JY, Bousquet O (2003) PAC-Bayesian generic chaining. In: Neural information processing systems
32. Seldin Y, Tishby N (2009) PAC-Bayesian generalization bound for density estimation with application to co-clustering. In: International conference on artificial intelligence and statistics
33. Ralaivola L, Szafranski M, Stempfel G (2010) Chromatic PAC-Bayes bounds for non-iid data: applications to ranking and stationary \(\beta \)-mixing processes. J Mach Learn Res 11:1927–1956
34. Seldin Y, Tishby N (2010) PAC-Bayesian analysis of co-clustering and beyond. J Mach Learn Res 11:3595–3646
35. Audibert JY (2010) PAC-Bayesian aggregation and multi-armed bandits. arXiv preprint arXiv:1011.3396
36. Roy JF, Marchand M, Laviolette F (2011) From PAC-Bayes bounds to quadratic programs for majority votes. In: International conference on machine learning
37. Seldin Y, Auer P, Shawe-Taylor JS, Ortner R, Laviolette F (2011) PAC-Bayesian analysis of contextual bandits. In: Neural information processing systems
38. Germain P, Lacoste A, Marchand M, Shanian S, Laviolette F (2011) A PAC-Bayes sample-compression approach to kernel methods. In: International conference on machine learning
39. Seldin Y, Laviolette F, Cesa-Bianchi N, Shawe-Taylor J, Auer P (2012) PAC-Bayesian inequalities for martingales. IEEE Trans Inf Theory 58(12):7086–7093
40. Morvant E (2013) Apprentissage de vote de majorité pour la classification supervisée et l’adaptation de domaine: approches PAC-Bayésiennes et combinaison de similarités. Aix-Marseille Université
41. Bégin L, Germain P, Laviolette F, Roy JF (2014) PAC-Bayesian theory for transductive learning. In: International conference on artificial intelligence and statistics
42. Langford J (2005) Tutorial on practical prediction theory for classification. J Mach Learn Res 6:273–306
43. Oneto L, Anguita D, Ridella S (2016) PAC-Bayesian analysis of distribution dependent priors: tighter risk bounds and stability analysis. Pattern Recogn Lett 80:200–207
44. Ambroladze A, Parrado-Hernández E, Shawe-Taylor J (2006) Tighter PAC-Bayes bounds. In: Advances in neural information processing systems
45. Bousquet O, Elisseeff A (2002) Stability and generalization. J Mach Learn Res 2:499–526
46. Tsybakov AB (2008) Introduction to nonparametric estimation. Springer Science & Business Media
47. Maurer A (2004) A note on the PAC Bayesian theorem. arXiv preprint cs/0411099
48. Bégin L, Germain P, Laviolette F, Roy JF (2016) PAC-Bayesian bounds based on the Rényi divergence. In: International conference on artificial intelligence and statistics
49. Younsi M (2012) Proof of a combinatorial conjecture coming from the PAC-Bayesian machine learning theory. arXiv preprint arXiv:1209.0824
50. Clopper CJ, Pearson ES (1934) The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26(4):404–413
51. Anguita D, Ghio A, Oneto L, Ridella S (2012) In-sample model selection for trimmed hinge loss support vector machine. Neural Process Lett 36(3):275–283
52. Bartlett PL, Mendelson S (2003) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463–482
53. Rosasco L, De Vito E, Caponnetto A, Piana M, Verri A (2004) Are loss functions all the same? Neural Comput 16(5):1063–1076
54. Oneto L, Ridella S, Anguita D (2017) Differential privacy and generalization: sharper bounds with applications. Pattern Recogn Lett 89:31–38
55. Oneto L, Ridella S, Anguita D (2017) Generalization performances of randomized classifiers and algorithms built on data dependent distributions. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
56. Bonferroni CE (1936) Teoria statistica delle classi e calcolo delle probabilità. Libreria internazionale Seeber

Author information

Authors and Affiliations

DIBRIS, Università degli Studi di Genova, Genoa, Italy
Luca Oneto

Corresponding author

Correspondence to Luca Oneto.

About this chapter

Cite this chapter

Oneto, L. (2020). PAC-Bayes Theory. In: Model Selection and Error Estimation in a Nutshell. Modeling and Optimization in Science and Technologies, vol 15. Springer, Cham. https://doi.org/10.1007/978-3-030-24359-3_8

DOI: https://doi.org/10.1007/978-3-030-24359-3_8
Published: 18 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24358-6
Online ISBN: 978-3-030-24359-3
eBook Packages: Intelligent Technologies and Robotics (R0)
