Stochastic Complexity and the Maximum Entropy Principle

  • Jorma Rissanen
Part of the Fundamental Theories of Physics book series (FTPH, volume 31-32)


This is an outline of a modeling principle based on the search for the shortest code length of the data, which is defined to be the stochastic complexity. The principle is generally applicable to statistical problems. When restricted to the special exponential family that arises in the maximum entropy formalism with a set of moment constraints, it provides a generalization that permits the set of constraints, or their number, to be optimized as well.
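As a rough illustration of the idea in the abstract, the following sketch writes out the exponential family produced by maximum entropy under k moment constraints, together with a commonly used asymptotic code-length criterion for choosing k. The symbols f_i, \lambda_i, \mu_i and Z_k, the assumption of independent observations, and the (k/2) log n penalty are assumptions made here for illustration; the chapter itself may define the stochastic complexity through a different (for example predictive or mixture) code.

```latex
% Maximum entropy density under k moment constraints E[f_i(X)] = \mu_i, i = 1, ..., k
p_k(x \mid \lambda)
  = \frac{1}{Z_k(\lambda)} \exp\Big( -\sum_{i=1}^{k} \lambda_i f_i(x) \Big),
\qquad
Z_k(\lambda) = \int \exp\Big( -\sum_{i=1}^{k} \lambda_i f_i(x) \Big)\, dx

% Assumed: independent observations x^n = x_1, ..., x_n
p_k(x^n \mid \lambda) = \prod_{t=1}^{n} p_k(x_t \mid \lambda)

% Assumed asymptotic two-part code length, with \hat\lambda the maximum likelihood estimate
L_k(x^n) \approx -\log p_k\big(x^n \mid \hat\lambda(x^n)\big) + \frac{k}{2} \log n,
\qquad
\hat{k} = \arg\min_{k} L_k(x^n)
```

Under these assumptions, adding a constraint lowers the first term (better fit) but raises the second (more parameters to encode), so minimizing the total code length selects the number of constraints rather than fixing it in advance.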


Keywords: Prediction Error; Maximum Entropy; Nuisance Parameter; Code Length; Minimum Description Length





Copyright information

© Kluwer Academic Publishers 1988

Authors and Affiliations

  • Jorma Rissanen, IBM Almaden Research Center, San Jose, USA
