Abstract
It is well known that combining the outputs of several rules yields much better performance than using any single one alone. In fact, many state-of-the-art algorithms search for a weighted combination of simpler rules [1]: Bagging [2, 3], Boosting [4, 5], and Bayesian approaches [6], as well as Kernel methods [7] and Neural Networks [8].
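The weighted combination of simpler rules mentioned above can be illustrated with a toy weighted majority vote; the rules, weights, and inputs below are purely hypothetical and not taken from the chapter.

```python
def majority_vote(rules, weights, x):
    """Return the sign of the weighted sum of the rules' predictions."""
    score = sum(w * h(x) for h, w in zip(rules, weights))
    return 1 if score >= 0 else -1

# Three simple binary "rules" (thresholds) returning labels in {-1, +1}.
rules = [
    lambda x: 1 if x > 0 else -1,   # threshold at 0
    lambda x: 1 if x > 1 else -1,   # threshold at 1
    lambda x: 1 if x > -1 else -1,  # threshold at -1
]
weights = [0.5, 0.25, 0.25]         # a distribution over the rules

print(majority_vote(rules, weights, 0.5))   # votes +1, -1, +1 -> +1
print(majority_vote(rules, weights, -2.0))  # votes -1, -1, -1 -> -1
```

Even when each individual rule is weak, the weighted vote can correct the mistakes of a minority of the rules, which is the intuition behind the combination methods cited above.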
Notes
1. In the following we will sometimes write \(\mathtt {KL} = \mathtt {KL}[\mathsf {Q}||\mathsf {P}]\) for brevity.
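For discrete distributions, the quantity abbreviated in this note is the Kullback–Leibler divergence \(\mathtt {KL}[\mathsf {Q}||\mathsf {P}] = \sum _h \mathsf {Q}(h) \ln (\mathsf {Q}(h)/\mathsf {P}(h))\); a minimal sketch, with a hypothetical posterior and a uniform prior over three hypotheses:

```python
import math

def kl_divergence(q, p):
    """KL[Q||P] = sum_h Q(h) * ln(Q(h) / P(h)) for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

q = [0.7, 0.2, 0.1]        # hypothetical posterior Q
p = [1/3, 1/3, 1/3]        # uniform prior P

print(kl_divergence(q, p))  # strictly positive; zero only when Q == P
```

The divergence is zero exactly when the posterior coincides with the prior, which is why it appears as a complexity term in PAC-Bayes bounds.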
References
Germain P, Lacasse A, Laviolette F, Marchand M, Roy JF (2015) Risk bounds for the majority vote: From a PAC-Bayesian analysis to a learning algorithm. J Mach Learn Res 16(4):787–860
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Schapire RE, Freund Y, Bartlett P, Lee WS (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat 26(5):1651–1686
Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336
Gelman A, Carlin JB, Stern HS, Rubin DB (2014) Bayesian data analysis, vol 2. Taylor & Francis
Vapnik VN (1998) Statistical learning theory. Wiley, New York
Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press
Nitzan S, Paroush J (1982) Optimal decision rules in uncertain dichotomous choice situations. Int Econ Rev 23(2):289–297
Catoni O (2007) PAC-Bayesian supervised classification. Institute of Mathematical Statistics
Lever G, Laviolette F, Shawe-Taylor J (2010) Distribution-dependent PAC-Bayes priors. In Algorithmic learning theory
Parrado-Hernández E, Ambroladze A, Shawe-Taylor J, Sun S (2012) PAC-Bayes bounds with data dependent priors. J Mach Learn Res 13(1):3507–3531
Lever G, Laviolette F, Shawe-Taylor J (2013) Tighter PAC-Bayes bounds through distribution-dependent priors. Theor Comput Sci 473:4–28
Berend D, Kontorovich A (2014) Consistency of weighted majority votes. In: Neural information processing systems
Donsker MD, Varadhan SRS (1975) Asymptotic evaluation of certain Markov process expectations for large time, I. Commun Pure Appl Math 28(1):1–47
Shawe-Taylor J, Williamson RC (1997) A PAC analysis of a Bayesian estimator. In: Computational learning theory
McAllester DA (1998) Some PAC-Bayesian theorems. In: Computational learning theory
McAllester DA (2003) PAC-Bayesian stochastic model selection. Mach Learn 51(1):5–21
Langford J, Seeger M (2001) Bounds for averaging classifiers. Technical report, Carnegie Mellon, Department of Computer Science
McAllester DA (2003) Simplified PAC-Bayesian margin bounds. In: Learning theory and kernel machines
Laviolette F, Marchand M (2005) PAC-Bayes risk bounds for sample-compressed Gibbs classifiers. In: International conference on machine learning
Lacasse A, Laviolette F, Marchand M, Germain P, Usunier N (2006) PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In: Neural information processing systems
Laviolette F, Marchand M (2007) PAC-Bayes risk bounds for stochastic averages and majority votes of sample-compressed classifiers. J Mach Learn Res 8(7):1461–1487
Germain P, Lacasse A, Laviolette F, Marchand M (2009) PAC-Bayesian learning of linear classifiers. In: International conference on machine learning
Tolstikhin IO, Seldin Y (2013) PAC-Bayes-Empirical-Bernstein inequality. In: Neural information processing systems
Van Erven T (2014) PAC-Bayes mini-tutorial: a continuous union bound. arXiv preprint arXiv:1405.1580
London B, Huang B, Taskar B, Getoor L, Cruz S (2014) PAC-Bayesian collective stability. In: Artificial intelligence and statistics
Shawe-Taylor J, Langford J (2002) PAC-Bayes & margins. In: Neural information processing systems
Seeger M (2002) PAC-Bayesian generalisation error bounds for Gaussian process classification. J Mach Learn Res 3:233–269
Seeger M (2003) Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations. PhD thesis, University of Edinburgh
Audibert JY, Bousquet O (2003) PAC-Bayesian generic chaining. In: Neural information processing systems
Seldin Y, Tishby N (2009) PAC-Bayesian generalization bound for density estimation with application to co-clustering. In: International conference on artificial intelligence and statistics
Ralaivola L, Szafranski M, Stempfel G (2010) Chromatic PAC-Bayes bounds for non-iid data: applications to ranking and stationary \(\beta \)-mixing processes. J Mach Learn Res 11:1927–1956
Seldin Y, Tishby N (2010) PAC-Bayesian analysis of co-clustering and beyond. J Mach Learn Res 11:3595–3646
Audibert JY (2010) PAC-Bayesian aggregation and multi-armed bandits. arXiv preprint arXiv:1011.3396
Roy JF, Marchand M, Laviolette F (2011) From PAC-Bayes bounds to quadratic programs for majority votes. In: International conference on machine learning
Seldin Y, Auer P, Shawe-Taylor JS, Ortner R, Laviolette F (2011) PAC-Bayesian analysis of contextual bandits. In: Neural information processing systems
Germain P, Lacoste A, Marchand M, Shanian S, Laviolette F (2011) A PAC-Bayes sample-compression approach to kernel methods. In: International conference on machine learning
Seldin Y, Laviolette F, Cesa-Bianchi N, Shawe-Taylor J, Auer P (2012) PAC-Bayesian inequalities for martingales. IEEE Trans Inf Theory 58(12):7086–7093
Morvant E (2013) Apprentissage de vote de majorité pour la classification supervisée et l’adaptation de domaine: approches PAC-Bayésiennes et combinaison de similarités. PhD thesis, Aix-Marseille Université
Bégin L, Germain P, Laviolette F, Roy JF (2014) PAC-Bayesian theory for transductive learning. In: International conference on artificial intelligence and statistics
Langford J (2005) Tutorial on practical prediction theory for classification. J Mach Learn Res 6:273–306
Oneto L, Anguita D, Ridella S (2016) PAC-Bayesian analysis of distribution dependent priors: tighter risk bounds and stability analysis. Pattern Recogn Lett 80:200–207
Ambroladze A, Parrado-Hernández E, Shawe-Taylor J (2006) Tighter PAC-Bayes bounds. In: Advances in neural information processing systems
Bousquet O, Elisseeff A (2002) Stability and generalization. J Mach Learn Res 2:499–526
Tsybakov AB (2008) Introduction to nonparametric estimation. Springer Science & Business Media
Maurer A (2004) A note on the PAC Bayesian theorem. arXiv preprint cs/0411099
Bégin L, Germain P, Laviolette F, Roy JF (2016) PAC-Bayesian bounds based on the Rényi divergence. In: International conference on artificial intelligence and statistics
Younsi M (2012) Proof of a combinatorial conjecture coming from the PAC-Bayesian machine learning theory. arXiv preprint arXiv:1209.0824
Clopper CJ, Pearson ES (1934) The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26(4):404–413
Anguita D, Ghio A, Oneto L, Ridella S (2012) In-sample model selection for trimmed hinge loss support vector machine. Neural Process Lett 36(3):275–283
Bartlett PL, Mendelson S (2003) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463–482
Rosasco L, De Vito E, Caponnetto A, Piana M, Verri A (2004) Are loss functions all the same? Neural Comput 16(5):1063–1076
Oneto L, Ridella S, Anguita D (2017) Differential privacy and generalization: sharper bounds with applications. Pattern Recogn Lett 89:31–38
Oneto L, Ridella S, Anguita D (2017) Generalization performances of randomized classifiers and algorithms built on data dependent distributions. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
Bonferroni CE (1936) Teoria statistica delle classi e calcolo delle probabilità. Libreria internazionale Seeber
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this chapter
Oneto, L. (2020). PAC-Bayes Theory. In: Model Selection and Error Estimation in a Nutshell. Modeling and Optimization in Science and Technologies, vol 15. Springer, Cham. https://doi.org/10.1007/978-3-030-24359-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24358-6
Online ISBN: 978-3-030-24359-3
eBook Packages: Intelligent Technologies and Robotics