Laplace’s Rule of Succession in Information Geometry

  • Yann Ollivier
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9389)

When observing data \(x_1,\ldots ,x_t\)


Keywords: Vector Field, Exponential Family, Fisher Information Matrix, Natural Parametrization, Statistical Manifold



I would like to thank Peter Grünwald and the referees for valuable comments and suggestions on this text.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. CNRS & LRI, Université Paris-Saclay & INRIA-TAO, Paris, France
