Cramér–Rao lower bounds arising from generalized Csiszár divergences

Abstract

We study the geometry of probability distributions with respect to a generalized family of Csiszár f-divergences. One member of this family is the relative \(\alpha \)-entropy, which is a Rényi analog of relative entropy in information theory and is known as the logarithmic or projective power divergence in statistics. We apply Eguchi’s theory to derive the Fisher information metric and the dual affine connections arising from these generalized divergence functions. This leads to a more widely applicable version of the Cramér–Rao inequality, which provides a lower bound for the variance of an estimator for an escort of the underlying parametric probability distribution. We then extend Amari and Nagaoka’s dually flat structure of the exponential and mixture models to other distributions with respect to the aforementioned generalized metric. We show that these formulations yield unbiased and efficient estimators for the escort model. Finally, we compare our work with prior results on generalized Cramér–Rao inequalities that were derived from non-information-geometric frameworks.


Notes

  1. A divergence function is a function D on \(S\times S\) satisfying \(D(p,q) \ge 0\), with equality if and only if \(p=q\).
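As an illustration of this definition, the following Python sketch evaluates the relative \(\alpha \)-entropy named in the abstract on a finite alphabet. It assumes the standard finite-alphabet form \(I_\alpha (p,q) = \frac{\alpha }{1-\alpha }\log \sum _x p(x)q(x)^{\alpha -1} - \frac{1}{1-\alpha }\log \sum _x p(x)^{\alpha } + \log \sum _x q(x)^{\alpha }\) for \(\alpha > 0\), \(\alpha \ne 1\), and checks numerically that it vanishes at \(p=q\), is positive otherwise, and equals the Rényi divergence of order \(1/\alpha \) between the \(\alpha \)-escorts \(p^{(\alpha )}(x) = p(x)^\alpha /\sum _y p(y)^\alpha \). The helper names and the random test distributions are hypothetical.

```python
import numpy as np


def escort(p, alpha):
    """alpha-escort distribution p(x)^alpha / sum_y p(y)^alpha."""
    w = p**alpha
    return w / w.sum()


def relative_alpha_entropy(p, q, alpha):
    """I_alpha(p, q) on a finite alphabet, alpha > 0, alpha != 1 (assumed standard form)."""
    return (
        alpha / (1 - alpha) * np.log(np.sum(p * q ** (alpha - 1)))
        - 1 / (1 - alpha) * np.log(np.sum(p**alpha))
        + np.log(np.sum(q**alpha))
    )


def renyi_divergence(P, Q, beta):
    """Renyi divergence of order beta: log(sum P^beta Q^(1-beta)) / (beta - 1)."""
    return np.log(np.sum(P**beta * Q ** (1 - beta))) / (beta - 1)


rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(5))
q = rng.dirichlet(np.ones(5))
alpha = 1.5

i_pq = relative_alpha_entropy(p, q, alpha)
d_escort = renyi_divergence(escort(p, alpha), escort(q, alpha), 1 / alpha)
print(i_pq, d_escort)                       # equal up to rounding, and positive
print(relative_alpha_entropy(p, p, alpha))  # ~0, the equality case p = q
```

The escort identity is one concrete sense in which the relative \(\alpha \)-entropy is a Rényi analog of relative entropy.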

References

  1. Amari, S.: Information geometry and its applications. Springer, New York (2016)
  2. Amari, S., Cichocki, A.: Information geometry of divergence functions. Bull. Polish Acad. Sci. Tech. Sci. 58(1), 183–195 (2010)
  3. Amari, S., Nagaoka, H.: Methods of information geometry. Oxford University Press, Oxford (2000)
  4. Arıkan, E.: An inequality on guessing and its application to sequential decoding. IEEE Trans. Inf. Theory 42(1), 99–105 (1996)
  5. Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.: Information geometry. Springer, New York (2017)
  6. Basu, A., Shioya, H., Park, C.: Statistical inference: The minimum distance approach. In: Monographs on Statistics and Applied Probability. Chapman & Hall/CRC Press, London (2011)
  7. Bercher, J.F.: On a (\(\beta \), q)-generalized Fisher information and inequalities involving q-Gaussian distributions. J. Math. Phys. 53(063303), 1–12 (2012)
  8. Bercher, J.F.: On generalized Cramér-Rao inequalities, generalized Fisher information and characterizations of generalized q-Gaussian distributions. J. Phys. A Math. Theor. 45(25), 255303 (2012)
  9. Blumer, A.C., McEliece, R.J.: The Rényi redundancy of generalized Huffman codes. IEEE Trans. Inf. Theory 34(5), 1242–1249 (1988)
  10. Bunte, C., Lapidoth, A.: Codes for tasks and Rényi entropy. IEEE Trans. Inf. Theory 60(9), 5065–5076 (2014)
  11. Campbell, L.L.: A coding theorem and Rényi’s entropy. Inf. Control 8, 423–429 (1965)
  12. Cichocki, A., Amari, S.: Families of alpha- beta- and gamma- divergences: Flexible and robust measures of similarities. Entropy 12, 1532–1568 (2010)
  13. Cover, T.M., Thomas, J.A.: Elements of information theory. Wiley, Hoboken (2012)
  14. Csiszár, I.: Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Stat. 19(4), 2032–2066 (1991)
  15. Eguchi, S.: Geometry of minimum contrast. Hiroshima Math. J. 22(3), 631–647 (1992)
  16. Eguchi, S., Kato, S.: Entropy and divergence associated with power function and the statistical application. Entropy 12(2), 262–274 (2010)
  17. Eguchi, S., Komori, O., Kato, S.: Projective power entropy and maximum Tsallis entropy distributions. Entropy 13(10), 1746–1764 (2011)
  18. van Erven, T., Harremoës, P.: Rényi divergence and Kullback–Leibler divergence. IEEE Trans. Inf. Theory 60(7), 3797–3820 (2014)
  19. Fujisawa, H., Eguchi, S.: Robust parameter estimation with a small bias against heavy contamination. J. Multivar. Anal. 99, 2053–2081 (2008)
  20. Furuichi, S.: On the maximum entropy principle and the minimization of the Fisher information in Tsallis statistics. J. Math. Phys. 50(013303), 1–12 (2009)
  21. Huleihel, W., Salamatian, S., Médard, M.: Guessing with limited memory. In: IEEE International Symposium on Information Theory, pp. 2253–2257 (2017)
  22. Jones, M.C., Hjort, N.L., Harris, I.R., Basu, A.: A comparison of related density based minimum divergence estimators. Biometrika 88(3), 865–873 (2001)
  23. Karthik, P.N., Sundaresan, R.: On the equivalence of projections in relative \(\alpha \)-entropy and Rényi divergence. In: National Conference on Communication, pp. 1–6 (2018)
  24. Kumar, M.A., Mishra, K.V.: Information geometric approach to Bayesian lower error bounds. In: IEEE International Symposium on Information Theory, pp. 746–750 (2018)
  25. Kumar, M.A., Sason, I.: Projection theorems for the Rényi divergence on alpha-convex sets. IEEE Trans. Inf. Theory 62(9), 4924–4935 (2016)
  26. Kumar, M.A., Sundaresan, R.: Minimization problems based on relative \(\alpha \)-entropy I: Forward projection. IEEE Trans. Inf. Theory 61(9), 5063–5080 (2015)
  27. Kumar, M.A., Sundaresan, R.: Minimization problems based on relative \(\alpha \)-entropy II: Reverse projection. IEEE Trans. Inf. Theory 61(9), 5081–5095 (2015)
  28. Lutwak, E., Yang, D., Lv, S., Zhang, G.: Extensions of Fisher information and Stam’s inequality. IEEE Trans. Inf. Theory 58(3), 1319–1327 (2012)
  29. Lutwak, E., Yang, D., Zhang, G.: Cramér-Rao and moment-entropy inequalities for Rényi entropy and generalized Fisher information. IEEE Trans. Inf. Theory 51(1), 473–478 (2005)
  30. Mishra, K.V., Kumar, M.A.: Generalized Bayesian Cramér-Rao inequality via information geometry of relative \(\alpha \)-entropy. In: IEEE Annual Conference on Information Science and Systems, pp. 1–6 (2020)
  31. Naudts, J.: Estimators, escort probabilities, and \(\phi \)-exponential families in statistical physics. J. Inequal. Pure Appl. Math. 5(4), 1–15 (2004)
  32. Naudts, J.: Generalised thermostatistics. Springer, New York (2011)
  33. Notsu, A., Komori, O., Eguchi, S.: Spontaneous clustering via minimum gamma-divergence. Neural Comput. 26(2), 421–448 (2014)
  34. Rényi, A., et al.: On measures of entropy and information. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, pp. 547–561 (1961)
  35. Sundaresan, R.: Guessing under source uncertainty. IEEE Trans. Inf. Theory 53(1), 269–287 (2007)
  36. Tsallis, C., Mendes, R.S., Plastino, A.R.: The role of constraints within generalized nonextensive statistics. Phys. A 261, 534–554 (1998)
  37. Zhang, J.: Divergence function, duality, and convex analysis. Neural Comput. 16, 159–195 (2004)


Acknowledgements

The authors are indebted to Prof. Rajesh Sundaresan of the Indian Institute of Science, Bengaluru, for helpful suggestions and discussions that substantially improved the presentation of this material. We also sincerely thank the anonymous reviewers for their constructive suggestions, which significantly improved the manuscript.

Author information

Corresponding author

Correspondence to M. Ashok Kumar.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Proof of Theorem 3

Taking logarithms on both sides of (48), we obtain

$$\begin{aligned} \log p_{\theta }^{(\alpha )}(x)&= -\log M(\theta ) + {\frac{1}{1-\frac{1}{\alpha }}}\log \Big [ {q^{(\alpha )}(x)}^{1-\frac{1}{\alpha }} + \sum \limits _{i=1}^k \theta _i h_i(x) \Big ]. \end{aligned}$$
(62)

Differentiating with respect to \(\theta _i\) gives

$$\begin{aligned} \partial _i \log p_{\theta }^{(\alpha )}(x)&= -\partial _i\log M(\theta ) + {\frac{1}{1-\frac{1}{\alpha }}}\frac{h_i(x)}{ \Big [ {q^{(\alpha )}(x)}^{1-\frac{1}{\alpha }} +\sum \limits _{l=1}^k \theta _l h_l(x) \Big ]}, \end{aligned}$$
(63)

or

$$\begin{aligned} \partial _i \log p_{\theta }^{(\alpha )}(x)&= -\partial _i\log M(\theta ) + \frac{\alpha }{\alpha -1} \frac{f_i(x)}{ \Big [q(x)^{\alpha -1} + \sum \limits _{l=1}^k\theta _l f_l(x) \Big ]}. \end{aligned}$$
(64)

Taking expectations of both sides of (64) with respect to \(p_{\theta }^{(\alpha )}\), we obtain

$$\begin{aligned} E_{\theta ^{(\alpha )}} \left[ \partial _i \log p_{\theta }^{(\alpha )}(X)\right]&= -\partial _i\log M(\theta ) + \frac{\alpha }{\alpha -1} E_{\theta ^{(\alpha )}} \Bigg [\frac{f_i(X)}{ q(X)^{\alpha -1} + \sum \limits _{l=1}^k\theta _l f_l(X)}\Bigg ]. \end{aligned}$$
(65)

Since the expected value of the score function (the left-hand side of (65)) vanishes, we have

$$\begin{aligned} \partial _i\log M(\theta ) = \frac{\alpha }{\alpha -1} E_{\theta ^{(\alpha )}} \Bigg [\frac{f_i(X)}{ q(X)^{\alpha -1} + \sum \limits _{l=1}^k\theta _l f_l(X)}\Bigg ]. \end{aligned}$$
(66)
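The steps from (64) to (66) can be checked numerically. The Python sketch below assumes a hypothetical finite-alphabet instance in which, in the parametrization of (64), \(p_{\theta }^{(\alpha )}(x) = R_{\theta }(x)^{\alpha /(\alpha -1)}/M(\theta )\) with \(R_{\theta }(x) = q(x)^{\alpha -1} + \sum _l \theta _l f_l(x)\) and \(M(\theta )\) the normalizer; \(q\), the \(f_l\), \(\theta \) and \(\alpha = 1.5\) are arbitrary illustrative choices, not values from the paper. It compares a finite-difference gradient of \(\log M(\theta )\) with the right-hand side of (66) and confirms that the score has zero mean.

```python
import numpy as np

# Hypothetical toy instance (not from the paper): alphabet of size n, k parameters.
rng = np.random.default_rng(1)
n, k, alpha = 6, 2, 1.5
a = alpha / (alpha - 1)              # the factor alpha/(alpha-1) in (64)
q = rng.dirichlet(np.ones(n))        # base pmf q(x) > 0
F = rng.uniform(size=(k, n))         # row l holds f_l(x); nonnegative here
theta = np.array([0.3, 0.2])         # positive, so R_theta(x) > 0


def R(theta):
    """R_theta(x) = q(x)^(alpha-1) + sum_l theta_l f_l(x)."""
    return q ** (alpha - 1) + theta @ F


def log_M(theta):
    """Log-normalizer of the assumed model p proportional to R_theta^(alpha/(alpha-1))."""
    return np.log(np.sum(R(theta) ** a))


def p(theta):
    w = R(theta) ** a
    return w / w.sum()


assert np.all(R(theta) > 0)

# Right-hand side of (66): alpha/(alpha-1) * E[f_i(X) / R_theta(X)]
eta = a * (F / R(theta)) @ p(theta)

# Central finite differences of log M(theta)
h = 1e-6
grad_log_M = np.array(
    [(log_M(theta + h * e) - log_M(theta - h * e)) / (2 * h) for e in np.eye(k)]
)

# Score (64) with d_i log M(theta) plugged in; its mean under p_theta is (65)
score = -grad_log_M[:, None] + a * F / R(theta)
print(grad_log_M, eta)   # agree up to finite-difference error
print(score @ p(theta))  # approximately [0, 0]
```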

Substituting (66) into (64), we get

$$\begin{aligned} \partial _i \log p_{\theta }^{(\alpha )}(x)= & {} \frac{\alpha }{\alpha -1} \frac{f_i(x)}{ \Big [ q(x)^{\alpha -1} + \sum \limits _{l=1}^k\theta _l f_l(x) \Big ]} \nonumber \\&- \frac{\alpha }{\alpha -1} E_{\theta ^{(\alpha )}} \Bigg [\frac{f_i(X)}{ q(X)^{\alpha -1} + \sum \limits _{l=1}^k\theta _l f_l(X) }\Bigg ] \nonumber \\=: & {} \widehat{\eta }_i(x) - \eta _i, \end{aligned}$$
(67)

where

$$\begin{aligned} \widehat{\eta }_i(x) := \frac{\alpha }{\alpha -1}\frac{ f_i(x)}{ \Big [q(x)^{\alpha -1} + \sum \limits _{l=1}^k\theta _l f_l(x) \Big ]} \text { and } \eta _i := E_{\theta ^{(\alpha )}}[\widehat{\eta }_i(X)]. \end{aligned}$$

Moreover, (66) states that \(\partial _i\log M(\theta ) = \eta _i\), which suggests that \(\log M(\theta )\) is the potential function (if one exists).

Using (67), the Riemannian metric becomes

$$\begin{aligned} g_{ij}^{(\alpha )}(\theta ) = \frac{1}{\alpha ^2}E_{\theta ^{(\alpha )}}[(\widehat{\eta }_i(X) - \eta _i)(\widehat{\eta }_j(X) - \eta _j)]. \end{aligned}$$
(68)
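Under the same hypothetical toy model \(p_{\theta }^{(\alpha )} \propto R_{\theta }^{\alpha /(\alpha -1)}\) (an assumption for illustration, not the general setting of Theorem 3), the sketch below evaluates the metric (68) as the scaled covariance of the \(\widehat{\eta }_i\) and compares it with \((1/\alpha ^2)E_{\theta ^{(\alpha )}}[\partial _i \log p_{\theta }^{(\alpha )}\,\partial _j \log p_{\theta }^{(\alpha )}]\) computed from finite-difference scores. The two agree because of (67).

```python
import numpy as np

# Hypothetical toy model (assumption): p_theta(x) proportional to R_theta(x)^(alpha/(alpha-1)).
rng = np.random.default_rng(2)
n, k, alpha = 6, 2, 1.5
a = alpha / (alpha - 1)
q = rng.dirichlet(np.ones(n))
F = rng.uniform(size=(k, n))          # row l holds f_l(x)
theta = np.array([0.3, 0.2])          # positive, so R_theta(x) > 0


def log_p(theta):
    r = q ** (alpha - 1) + theta @ F  # R_theta(x)
    w = r**a
    return np.log(w / w.sum())


R = q ** (alpha - 1) + theta @ F
p = np.exp(log_p(theta))
eta_hat = a * F / R                   # eta_hat_i(x), shape (k, n)
eta = eta_hat @ p                     # eta_i = E[eta_hat_i(X)]
centered = eta_hat - eta[:, None]     # eta_hat_i(x) - eta_i
g = (centered * p) @ centered.T / alpha**2  # metric (68)

# The same matrix from finite-difference scores d_i log p_theta(x), using (67)
h = 1e-6
score = np.array(
    [(log_p(theta + h * e) - log_p(theta - h * e)) / (2 * h) for e in np.eye(k)]
)
g_fd = (score * p) @ score.T / alpha**2
print(g)
print(np.linalg.eigvalsh(g))  # nonnegative; positive unless the eta_hat_i are degenerate
```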

This further suggests that the \(\eta _i\)’s are the dual parameters of the \(\theta _i\)’s. Surprisingly, this is not the case, as we now show. We have

$$\begin{aligned} \eta _j= & {} \frac{\alpha }{\alpha -1}E_{\theta ^{(\alpha )}}\Bigg [ \frac{f_j(X)}{ q(X)^{\alpha -1} + \sum \limits _{l=1}^k\theta _l f_l(X) }\Bigg ]\nonumber \\= & {} \frac{\alpha }{\alpha -1}\sum \limits _x \Bigg [\frac{f_j(x)}{ q(x)^{\alpha -1} + \sum \limits _{l=1}^k\theta _l f_l(x)}\Bigg ] p_{\theta }^{(\alpha )}(x). \end{aligned}$$
(69)

Let \(R_{\theta }(x) = q(x)^{\alpha -1} + \sum \limits _{l=1}^k\theta _l f_l(x)\), so that \(\partial _i R_{\theta }(x) = f_i(x)\). Differentiating (69) with respect to \(\theta _i\) gives

$$\begin{aligned} \frac{\partial \eta _j}{\partial \theta _i}= & {} \frac{\alpha }{\alpha -1}\frac{\partial }{\partial \theta _i}\left( \sum _x \frac{f_j(x)p_{\theta }^{(\alpha )}(x)}{R_{\theta }(x)}\right) \nonumber \\= & {} \frac{\alpha }{\alpha -1}\sum _x \frac{R_{\theta }(x)f_j(x) \partial _i p_{\theta }^{(\alpha )}(x) - f_j(x) p_{\theta }^{(\alpha )}(x) \partial _i(R_{\theta }(x))}{(R_{\theta }(x))^2}\nonumber \\= & {} \frac{\alpha }{\alpha -1}\sum _x \left[ \frac{f_j(x)}{R_{\theta }(x)}\partial _i p_{\theta }^{(\alpha )}(x) - p_{\theta }^{(\alpha )}(x)\frac{f_j(x)}{R_{\theta }(x)} \frac{f_i(x)}{R_{\theta }(x)} \right] . \end{aligned}$$
(70)
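The quotient-rule step (70) can be sanity-checked the same way. Assuming again the hypothetical model \(p_{\theta }^{(\alpha )} \propto R_{\theta }^{\alpha /(\alpha -1)}\) with arbitrary \(q\), \(f_l\) and \(\theta \), the sketch compares a finite-difference Jacobian of \(\eta _j\) with the right-hand side of (70), where \(\partial _i p_{\theta }^{(\alpha )}(x)\) is also approximated by central differences.

```python
import numpy as np

# Hypothetical toy model (assumption): p_theta(x) proportional to R_theta(x)^(alpha/(alpha-1)).
rng = np.random.default_rng(3)
n, k, alpha = 6, 2, 1.5
a = alpha / (alpha - 1)
q = rng.dirichlet(np.ones(n))
F = rng.uniform(size=(k, n))          # row l holds f_l(x)
theta = np.array([0.3, 0.2])          # positive, so R_theta(x) > 0


def R(theta):
    return q ** (alpha - 1) + theta @ F


def p(theta):
    w = R(theta) ** a
    return w / w.sum()


def eta(theta):
    """eta_j = alpha/(alpha-1) * sum_x f_j(x) p_theta(x) / R_theta(x), as in (69)."""
    return a * (F / R(theta)) @ p(theta)


h = 1e-6
E = np.eye(k)
# J_fd[i, j] = d eta_j / d theta_i by central differences
J_fd = np.array([(eta(theta + h * e) - eta(theta - h * e)) / (2 * h) for e in E])

# Right-hand side of (70); dp[i, x] approximates d_i p_theta(x)
dp = np.array([(p(theta + h * e) - p(theta - h * e)) / (2 * h) for e in E])
FR = F / R(theta)                     # FR[j, x] = f_j(x) / R_theta(x)
J_70 = a * (dp @ FR.T - (FR * p(theta)) @ FR.T)
print(np.max(np.abs(J_fd - J_70)))    # small, on the order of finite-difference error
```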

From (64)–(67), we have

$$\begin{aligned} \frac{\alpha }{\alpha -1}\frac{f_i(x)}{ R_{\theta }(x)}&= \partial _i \log p_{\theta }^{(\alpha )}(x) + \eta _i. \end{aligned}$$
(71)

Substituting (71), together with its analogue for the index \(j\), into (70) gives

$$\begin{aligned} \frac{\partial \eta _j}{\partial \theta _i}&= \frac{\alpha }{\alpha -1}\sum _x \bigg [ \frac{\alpha -1}{\alpha }\, \partial _i p_{\theta }^{(\alpha )}(x) \big (\partial _j \log p_{\theta }^{(\alpha )}(x) + \eta _j\big ) \nonumber \\&\quad - \left( \frac{\alpha -1}{\alpha }\right) ^2 p_{\theta }^{(\alpha )}(x) \big (\partial _i \log p_{\theta }^{(\alpha )}(x) + \eta _i\big ) \big (\partial _j \log p_{\theta }^{(\alpha )}(x) + \eta _j\big ) \bigg ]\nonumber \\&= \sum _x \bigg [\partial _i p_{\theta }^{(\alpha )}(x) \big (\partial _j \log p_{\theta }^{(\alpha )}(x) + \eta _j\big ) \nonumber \\&\quad - \left( \frac{\alpha -1}{\alpha }\right) p_{\theta }^{(\alpha )}(x) \big (\partial _i \log p_{\theta }^{(\alpha )}(x) + \eta _i\big ) \big (\partial _j \log p_{\theta }^{(\alpha )}(x) + \eta _j\big )\bigg ]\nonumber \\&= \sum _x p_{\theta }^{(\alpha )}(x)\, \partial _i \log p_{\theta }^{(\alpha )}(x)\, \partial _j \log p_{\theta }^{(\alpha )}(x) + \eta _j \sum _x \partial _i p_{\theta }^{(\alpha )}(x) \nonumber \\&\quad - \left( \frac{\alpha -1}{\alpha }\right) \sum _x p_{\theta }^{(\alpha )}(x) \Big (\partial _i \log p_{\theta }^{(\alpha )}(x)\, \partial _j \log p_{\theta }^{(\alpha )}(x) + \eta _i\, \partial _j \log p_{\theta }^{(\alpha )}(x) + \eta _j\, \partial _i \log p_{\theta }^{(\alpha )}(x) + \eta _i \eta _j \Big ) \nonumber \\&= \frac{1}{\alpha }\sum _x p_{\theta }^{(\alpha )}(x)\, \partial _i \log p_{\theta }^{(\alpha )}(x)\, \partial _j \log p_{\theta }^{(\alpha )}(x) - \left( \frac{\alpha -1}{\alpha }\right) \eta _i \eta _j \nonumber \\&= \alpha g_{ij}^{(\alpha )}(\theta ) - \left( \frac{\alpha -1}{\alpha }\right) \eta _i \eta _j. \end{aligned}$$
(72)

Here the third equality uses \(\partial _i p_{\theta }^{(\alpha )}(x) = p_{\theta }^{(\alpha )}(x)\, \partial _i \log p_{\theta }^{(\alpha )}(x)\); the fourth uses \(\sum _x \partial _i p_{\theta }^{(\alpha )}(x) = \partial _i \sum _x p_{\theta }^{(\alpha )}(x) = 0\), \(\sum _x p_{\theta }^{(\alpha )}(x)\, \partial _j \log p_{\theta }^{(\alpha )}(x) = 0\) and \(\sum _x p_{\theta }^{(\alpha )}(x) = 1\); and the last follows from (67) and (68).

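For the \(\eta _i\)’s to be dual coordinates of the \(\theta _i\)’s, one would need \(\partial \eta _j/\partial \theta _i = g_{ij}^{(\alpha )}(\theta )\). By (72), this fails in general for \(\alpha \ne 1\), so the \(\eta _i\)’s cannot be the dual parameters of the \(\theta _i\)’s for the statistical model \(\mathbb {M}^{(\alpha )}\). This completes the proof.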
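A numerical sketch of (72) and of this conclusion, again under the hypothetical toy model \(p_{\theta }^{(\alpha )} \propto R_{\theta }^{\alpha /(\alpha -1)}\) with arbitrary \(q\), \(f_l\), \(\theta \) and \(\alpha = 1.5\): the finite-difference Jacobian \(\partial \eta _j/\partial \theta _i\) matches \(\alpha g_{ij}^{(\alpha )}(\theta ) - \frac{\alpha -1}{\alpha }\eta _i\eta _j\), and it differs visibly from \(g_{ij}^{(\alpha )}(\theta )\), which is what duality would require. A single random instance illustrates the claim; it does not prove it.

```python
import numpy as np

# Hypothetical toy model (assumption): p_theta(x) proportional to R_theta(x)^(alpha/(alpha-1)).
rng = np.random.default_rng(4)
n, k, alpha = 6, 2, 1.5
a = alpha / (alpha - 1)
q = rng.dirichlet(np.ones(n))
F = rng.uniform(size=(k, n))          # row l holds f_l(x)
theta = np.array([0.3, 0.2])          # positive, so R_theta(x) > 0


def R(theta):
    return q ** (alpha - 1) + theta @ F


def p(theta):
    w = R(theta) ** a
    return w / w.sum()


def eta_hat(theta):
    return a * F / R(theta)


def eta(theta):
    return eta_hat(theta) @ p(theta)


def metric(theta):
    """g^(alpha)_ij(theta) from (68)."""
    c = eta_hat(theta) - eta(theta)[:, None]
    return (c * p(theta)) @ c.T / alpha**2


h = 1e-6
J = np.array(
    [(eta(theta + h * e) - eta(theta - h * e)) / (2 * h) for e in np.eye(k)]
)  # J[i, j] = d eta_j / d theta_i
g = metric(theta)
eta_val = eta(theta)
rhs_72 = alpha * g - (alpha - 1) / alpha * np.outer(eta_val, eta_val)

print(np.max(np.abs(J - rhs_72)))  # ~0: (72) holds
print(np.max(np.abs(J - g)))       # clearly nonzero: eta is not dual to theta
```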

Cite this article

Ashok Kumar, M., Vijay Mishra, K. Cramér–Rao lower bounds arising from generalized Csiszár divergences. Info. Geo. 3, 33–59 (2020). https://doi.org/10.1007/s41884-020-00029-z

Keywords

  • Cramér–Rao lower bound
  • Csiszár f-divergence
  • Fisher information metric
  • Escort distribution
  • Relative entropy