Abstract
Standard Bayesian inference can behave suboptimally if the model is wrong. We present a modification of Bayesian inference that continues to achieve good rates even with wrong models. Our method adapts the Bayesian learning rate to the data, selecting the rate that minimizes the cumulative loss of sequential prediction by posterior randomization. Our results can also be used to adapt the learning rate in a PAC-Bayesian context. The results are based on an extension of an inequality due to T. Zhang and others to dependent random variables.
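The rate-selection idea in the abstract can be sketched in code. The following is a hypothetical toy illustration, not the paper's exact algorithm: a small class of Bernoulli models with a uniform prior, a grid of candidate learning rates η, and, for each η, the cumulative expected log-loss of sequential prediction by posterior randomization (draw a model from the current η-generalized posterior, predict the next outcome with it). The selected rate is the one minimizing that cumulative loss. All names and the specific setup here are my own assumptions for illustration.

```python
import math

def cumulative_randomized_loss(data, thetas, eta):
    """Cumulative expected log-loss of sequential prediction by posterior
    randomization, using the eta-generalized (tempered) posterior.
    Toy sketch: Bernoulli models with success probabilities `thetas`,
    uniform prior, binary data in {0, 1}."""
    log_w = [0.0] * len(thetas)  # log of unnormalized posterior weights
    total = 0.0
    for x in data:
        # Normalize the current posterior (stably, via max-shift).
        m = max(log_w)
        ws = [math.exp(lw - m) for lw in log_w]
        z = sum(ws)
        post = [w / z for w in ws]
        # Expected log-loss of predicting x with theta drawn from the posterior.
        total += sum(p * -math.log(th if x == 1 else 1.0 - th)
                     for p, th in zip(post, thetas))
        # eta-generalized Bayesian update: likelihood raised to the power eta.
        for k, th in enumerate(thetas):
            log_w[k] += eta * math.log(th if x == 1 else 1.0 - th)
    return total

def select_learning_rate(data, thetas, etas):
    """Pick the learning rate minimizing the cumulative randomized loss."""
    return min(etas, key=lambda eta: cumulative_randomized_loss(data, thetas, eta))

# Toy run: exponentially spaced candidate rates 1, 1/2, ..., 1/32.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
etas = [2.0 ** -k for k in range(6)]
best = select_learning_rate(data, thetas, etas)
print("selected learning rate:", best)
```

The sequential (prequential) character of the loss is what matters: each outcome is predicted before it is used to update the posterior, so the selected η reflects out-of-sample predictive performance rather than fit.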
References
Audibert, J.Y.: PAC-Bayesian statistical learning theory. PhD thesis, Université Paris VI (2004)
Barron, A.R., Cover, T.M.: Minimum complexity density estimation. IEEE Transactions on Information Theory 37(4), 1034–1054 (1991)
Catoni, O.: PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning. IMS Lecture Notes Monograph Series. Institute of Mathematical Statistics (2007)
Chaudhuri, K., Freund, Y., Hsu, D.: A parameter-free hedging algorithm. In: NIPS 2009, pp. 297–305 (2009)
Dawid, A.P.: Present position and potential developments: Some personal views, statistical theory, the prequential approach. J. R. Stat. Soc. Ser. A 147(2), 278–292 (1984)
Doob, J.L.: Application of the theory of martingales. In: Le Calcul de Probabilités et ses Applications. Colloques Internationaux du CNRS, pp. 23–27 (1949)
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997)
Grünwald, P.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)
Grünwald, P.: Safe learning: bridging the gap between Bayes, MDL and statistical learning theory via empirical convexity. In: Proc. COLT 2011, pp. 551–573 (2011)
Grünwald, P., Langford, J.: Suboptimal behavior of Bayes and MDL in classification under misspecification. Machine Learning 66(2-3), 119–149 (2007)
Kleijn, B., van der Vaart, A.: Misspecification in infinite-dimensional Bayesian statistics. Ann. Stat. 34(2) (2006)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 76–86 (1951)
Li, J.Q.: Estimation of Mixture Models. PhD thesis, Yale, New Haven, CT (1999)
McAllester, D.: PAC-Bayesian stochastic model selection. Mach. Learn. 51(1), 5–21 (2003)
Bartlett, P.L., Jordan, M.I., McAuliffe, J.D.: Convexity, classification, and risk bounds. J. Am. Stat. Assoc. 101(473), 138–156 (2006)
Seeger, M.: PAC-Bayesian generalization error bounds for Gaussian process classification. J. Mach. Learn. Res. 3, 233–269 (2002)
Shalizi, C.: Dynamics of Bayesian updating with dependent data and misspecified models. Electronic Journal of Statistics 3, 1039–1074 (2009)
Takeuchi, J., Barron, A.R.: Robustly minimax codes for universal data compression. In: Proc. ISITA 1998, Japan (1998)
van der Vaart, A.: Asymptotic Statistics. Cambridge University Press (1998)
Vovk, V.: Competitive on-line statistics. Intern. Stat. Rev. 69, 213–248 (2001)
Vovk, V.: Aggregating strategies. In: Proc. COLT 1990, pp. 371–383 (1990)
Zhang, T.: From ε-entropy to KL entropy: analysis of minimum information complexity density estimation. Ann. Stat. 34(5), 2180–2210 (2006a)
Zhang, T.: Information theoretical upper and lower bounds for statistical estimation. IEEE T. Inform. Theory 52(4), 1307–1321 (2006b)
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Grünwald, P. (2012). The Safe Bayesian. In: Bshouty, N.H., Stoltz, G., Vayatis, N., Zeugmann, T. (eds) Algorithmic Learning Theory. ALT 2012. Lecture Notes in Computer Science(), vol 7568. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34106-9_16
DOI: https://doi.org/10.1007/978-3-642-34106-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34105-2
Online ISBN: 978-3-642-34106-9