The Safe Bayesian

Learning the Learning Rate via the Mixability Gap

Conference paper
Algorithmic Learning Theory (ALT 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7568)

Abstract

Standard Bayesian inference can behave suboptimally if the model is wrong. We present a modification of Bayesian inference which continues to achieve good rates with wrong models. Our method adapts the Bayesian learning rate to the data, picking the rate minimizing the cumulative loss of sequential prediction by posterior randomization. Our results can also be used to adapt the learning rate in a PAC-Bayesian context. The results are based on an extension of an inequality due to T. Zhang and others to dependent random variables.
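
For a finite set of candidate densities, the procedure described above can be phrased concretely: for each candidate learning rate η, run η-generalized Bayesian updating (posterior weights proportional to the prior times the likelihood raised to the power η), score η by the cumulative log-loss of sequential prediction by posterior randomization (predicting with a candidate drawn from the current generalized posterior), and keep the η with the smallest score. The Python sketch below is a minimal illustration under these assumptions; the function name, array layout, and geometric grid of candidate rates are invented for the example, not taken from the paper.

```python
import numpy as np

def safe_bayes_eta(neg_log_liks, prior, etas):
    """Pick the learning rate minimizing cumulative sequential log-loss
    under posterior randomization (illustrative sketch, not the paper's code).

    neg_log_liks : (n, k) array; entry (i, j) is -log p_j(x_i), the
                   log-loss of candidate density j on outcome x_i.
    prior        : (k,) array of prior weights over the k candidates.
    etas         : candidate learning rates, e.g. a geometric grid in (0, 1].
    """
    best_eta, best_loss = None, np.inf
    for eta in etas:
        log_w = np.log(prior)        # log-weights of the eta-posterior
        cum_loss = 0.0
        for losses in neg_log_liks:  # one sequential prediction round
            post = np.exp(log_w - log_w.max())
            post /= post.sum()
            # Posterior randomization: expected log-loss of a candidate
            # drawn from the current generalized posterior.
            cum_loss += post @ losses
            # eta-generalized Bayesian update: weight *= likelihood**eta.
            log_w = log_w - eta * losses
        if cum_loss < best_loss:
            best_eta, best_loss = eta, cum_loss
    return best_eta

# Illustrative usage with a geometric grid of learning rates:
# eta_hat = safe_bayes_eta(neg_log_liks, prior, [2.0**(-j) for j in range(8)])
```

Predicting with the expected loss of a single posterior draw, rather than with the Bayes mixture, is what the abstract calls posterior randomization; setting η = 1 recovers standard Bayesian updating.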

References

  • Audibert, J.Y.: PAC-Bayesian statistical learning theory. PhD thesis, Université Paris VI (2004)
  • Barron, A.R., Cover, T.M.: Minimum complexity density estimation. IEEE Transactions on Information Theory 37(4), 1034–1054 (1991)
  • Catoni, O.: PAC-Bayesian Supervised Classification. Lecture Notes IMS (2007)
  • Chaudhuri, K., Freund, Y., Hsu, D.: A parameter-free hedging algorithm. In: NIPS 2009, pp. 297–305 (2009)
  • Dawid, A.P.: Present position and potential developments: Some personal views, statistical theory, the prequential approach. J. R. Stat. Soc. Ser. A-G 147(2), 278–292 (1984)
  • Doob, J.L.: Application of the theory of martingales. In: Le Calcul de Probabilités et ses Applications. Colloques Internationaux du CNRS, pp. 23–27 (1949)
  • Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997)
  • Grünwald, P.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)
  • Grünwald, P.: Safe learning: bridging the gap between Bayes, MDL and statistical learning theory via empirical convexity. In: Proc. COLT 2011, pp. 551–573 (2011)
  • Grünwald, P., Langford, J.: Suboptimal behavior of Bayes and MDL in classification under misspecification. Machine Learning 66(2-3), 119–149 (2007)
  • Kleijn, B., van der Vaart, A.: Misspecification in infinite-dimensional Bayesian statistics. Ann. Stat. 34(2) (2006)
  • Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 76–86 (1951)
  • Li, J.Q.: Estimation of Mixture Models. PhD thesis, Yale, New Haven, CT (1999)
  • McAllester, D.: PAC-Bayesian stochastic model selection. Mach. Learn. 51(1), 5–21 (2003)
  • Bartlett, P.L., Jordan, M.I., McAuliffe, J.D.: Convexity, classification and risk bounds. J. Am. Stat. Assoc. 101(473), 138–156 (2006)
  • Seeger, M.: PAC-Bayesian generalization error bounds for Gaussian process classification. J. Mach. Learn. Res. 3, 233–269 (2002)
  • Shalizi, C.: Dynamics of Bayesian updating with dependent data and misspecified models. Electronic Journal of Statistics 3, 1039–1074 (2009)
  • Takeuchi, J., Barron, A.R.: Robustly minimax codes for universal data compression. In: Proc. ISITA 1998, Japan (1998)
  • van der Vaart, A.: Asymptotic Statistics. Cambridge University Press (1998)
  • Vovk, V.: Aggregating strategies. In: Proc. COLT 1990, pp. 371–383 (1990)
  • Vovk, V.: Competitive on-line statistics. Intern. Stat. Rev. 69, 213–248 (2001)
  • Zhang, T.: From ε-entropy to KL entropy: analysis of minimum information complexity density estimation. Ann. Stat. 34(5), 2180–2210 (2006a)
  • Zhang, T.: Information theoretical upper and lower bounds for statistical estimation. IEEE T. Inform. Theory 52(4), 1307–1321 (2006b)


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grünwald, P. (2012). The Safe Bayesian. In: Bshouty, N.H., Stoltz, G., Vayatis, N., Zeugmann, T. (eds) Algorithmic Learning Theory. ALT 2012. Lecture Notes in Computer Science, vol 7568. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34106-9_16

  • DOI: https://doi.org/10.1007/978-3-642-34106-9_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34105-2

  • Online ISBN: 978-3-642-34106-9

  • eBook Packages: Computer Science, Computer Science (R0)
