Abstract
This is a mathematically oriented survey of the method of maximum entropy, or minimum I-divergence, with a critical treatment of its various justifications and of its relation to Bayesian statistics. Information-theoretic ideas receive substantial attention, including “information geometry”. The axiomatic approach is regarded as the best justification of maxent, and likewise of alternative methods that minimize some Bregman distance or f-divergence other than I-divergence. The possible interpretation of such alternative methods within the original maxent paradigm is also considered.
This work was supported by the Hungarian National Foundation for Scientific Research, Grant T016386.
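For orientation, the key quantities named in the abstract admit the following standard forms; these are textbook definitions supplied for the reader's convenience, not formulas quoted from the paper itself:

```latex
% I-divergence (relative entropy) of P from Q on a finite set:
D(P \,\|\, Q) = \sum_i p_i \log \frac{p_i}{q_i},
% so that maximizing the entropy H(P) = -\sum_i p_i \log p_i under linear
% constraints is the special case of minimizing D(P\|Q) with Q uniform.
% An f-divergence replaces the logarithm by any convex f with f(1) = 0:
D_f(P \,\|\, Q) = \sum_i q_i \, f\!\left(\frac{p_i}{q_i}\right),
% while a Bregman distance is built from a strictly convex function F:
B_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle.
```

I-divergence arises as both an f-divergence (with f(t) = t log t) and a Bregman distance (with F the negative entropy), which is why the alternative minimization methods mentioned in the abstract generalize maxent in two distinct directions.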
Copyright information
© 1996 Springer Science+Business Media Dordrecht
Cite this paper
Csiszár, I. (1996). Maxent, Mathematics, and Information Theory. In: Hanson, K.M., Silver, R.N. (eds) Maximum Entropy and Bayesian Methods. Fundamental Theories of Physics, vol 79. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5430-7_5
Print ISBN: 978-94-010-6284-8
Online ISBN: 978-94-011-5430-7