Improved Cramér–Rao Type Integral Inequalities or Bayesian Cramér–Rao Bounds

Research Article

Abstract

New lower bounds on the mean square error for estimators of a random parameter are obtained as applications of the improved Cauchy–Schwarz inequality due to Walker (Stat Probab Lett 122:86–90, 2017).

Keywords

Bayesian Cramér–Rao bound · Cauchy–Schwarz inequality · Cramér–Rao type integral inequality · Walker's inequality

1 Introduction

The Cramér–Rao lower bound for the variance of an unbiased estimator of a parameter is widely used in the statistical literature. There has been a large amount of work on Cramér–Rao type integral inequalities leading to lower bounds for the risks associated with Bayesian estimators. Earlier results in this direction are due to Schutzenberger (1957) and Gart (1959). Other works in this direction in the statistical literature are due to Borovkov and Sakhanenko (1980), Targhetta (1984, 1988, 1990), Shemyakin (1987), Bobrovsky et al. (1987), Brown and Gajek (1990), Prakasa Rao (1992), Ghosh (1993) and Gill and Levit (1995). In the engineering literature, this problem is studied under the heading of "random parameter estimation". Significant results in this area are due to van Trees (1968), Ziv and Zakai (1969), Chazan et al. (1975), Miller and Chang (1978), Weinstein and Weiss (1985), Weiss and Weinstein (1985) and Brown and Liu (1993), among others. Prakasa Rao (1991) gives a comprehensive survey of results obtained in this area up to about 1990. Related results on Cramér–Rao type integral inequalities were obtained in Prakasa Rao (1996, 2000, 2001). In a voluminous work, van Trees and Bell (2007) give a survey of Bayesian bounds for parameter estimation and nonlinear filtering/tracking in an edited volume containing selected papers dealing with Bayesian Cramér–Rao bounds, global Bayesian bounds, hybrid Bayesian bounds, constrained Cramér–Rao bounds and their applications to nonlinear dynamic systems.

It is well known that the Cramér–Rao inequality, which gives a lower bound for the quadratic risk of an estimator, and the Bayesian versions of the Cramér–Rao inequality obtained by several authors are all consequences of the Cauchy–Schwarz inequality applied to suitable functions of the observations and the parameter. In a recent paper, Walker (2017) obtained an improved Cauchy–Schwarz inequality and, as an application, a generalized Cramér–Rao inequality. Our aim in this short note is to obtain some Bayesian Cramér–Rao bounds as applications of this improved version of the Cauchy–Schwarz inequality.

2 Main Results

Walker (2017) obtained an improved version of the Cauchy–Schwarz inequality which implies the following probabilistic version.

Theorem 2.1

(Walker's inequality) If X and Y are random variables defined on a probability space \((\Omega , \mathcal{F},P)\) with finite second moments, then
$$\begin{aligned} |E(XY)|^2\le E(X^2)E(Y^2)-\left(|E(X)|\sqrt{\mathrm{Var}(Y)}-|E(Y)|\sqrt{\mathrm{Var}(X)}\right)^2. \end{aligned}$$
(2.1)
As pointed out by Walker (2017), the inequality (2.1) is a strict improvement over the Cauchy–Schwarz inequality, as the following example from the same paper shows. Suppose Y is a random variable with mean zero and variance 1 and X is a random variable with mean \(\mu \) and finite variance \(\sigma ^2.\) Then the Cauchy–Schwarz inequality implies
$$\begin{aligned} [E(XY)]^2\le E(X^2) E(Y^2)= \sigma ^2+\mu ^2, \end{aligned}$$
(2.2)
whereas Theorem 2.1 implies that
$$\begin{aligned} [E(XY)]^2 \le \mathrm{Var}(X)\, E(Y^2)= \sigma ^2. \end{aligned}$$
(2.3)
It is obvious that the upper bound in (2.3) is better (smaller) than the upper bound in (2.2) whenever \(\mu \ne 0.\)
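As a numerical check (not part of the original argument; the discrete joint distribution below is an arbitrary choice), the following Python sketch computes all moments exactly and compares the two upper bounds.

```python
import numpy as np

# Exact moment computation for a small discrete joint distribution of (X, Y).
x_vals = np.array([-1.0, 0.5, 2.0])
y_vals = np.array([-1.0, 1.0])
p = np.array([[0.10, 0.20],
              [0.25, 0.15],
              [0.10, 0.20]])        # p[i, j] = P(X = x_i, Y = y_j)
assert np.isclose(p.sum(), 1.0)

X, Y = np.meshgrid(x_vals, y_vals, indexing="ij")
EX, EY = (X * p).sum(), (Y * p).sum()
EX2, EY2 = (X**2 * p).sum(), (Y**2 * p).sum()
EXY = (X * Y * p).sum()
var_x, var_y = EX2 - EX**2, EY2 - EY**2

cs_bound = EX2 * EY2                                        # right side of (2.2)
walker_bound = EX2 * EY2 - (abs(EX) * np.sqrt(var_y)
                            - abs(EY) * np.sqrt(var_x))**2  # right side of (2.1)

print(f"|E(XY)|^2            = {EXY**2:.4f}")
print(f"Walker bound (2.1)   = {walker_bound:.4f}")
print(f"Cauchy-Schwarz bound = {cs_bound:.4f}")
assert EXY**2 <= walker_bound <= cs_bound
```

For this distribution the Cauchy–Schwarz bound equals 1.6 while Walker's bound is about 1.455; both dominate \(|E(XY)|^2=0.0025.\)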
Suppose a random variable Y has mean zero but positive variance and X is another random variable with finite variance. Then it follows that
$$\begin{aligned} |E(XY)|^2&\le E(X^2)E(Y^2)-|E(X)|^2 E(Y^2)\\ &= \mathrm{Var}(X)\, E(Y^2) \end{aligned}$$
(2.4)
by Theorem 2.1. Hence
$$\begin{aligned} E(X^2) \ge (E(X))^2+ \frac{|E(XY)|^2}{E(Y^2)} \end{aligned}$$
and we have the following corollary.

Corollary 2.1

If X and Y are random variables defined on a probability space \((\Omega , \mathcal{F},P)\) with finite second moments, and if \(E(Y)=0\) and \(E(Y^2)>0,\) then
$$\begin{aligned} E(X^2) \ge (E(X))^2+\frac{|E(XY)|^2}{E(Y^2)}. \end{aligned}$$
(2.5)
Note that a direct application of the Cauchy–Schwarz inequality implies that
$$\begin{aligned} E(X^2) \ge \frac{|E(XY)|^2}{E(Y^2)} \end{aligned}$$
which gives a weaker lower bound for \(E(X^2)\) when \(E(Y)=0\) and \(E(Y^2)>0.\) We now discuss some applications of the inequality derived in Corollary 2.1.
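As a quick sanity check of Corollary 2.1 (a Monte Carlo sketch with an arbitrary choice of X and Y built from a single standard normal variable W; the exact values are \(E(X^2)=12,\) bound (2.5) \(=10,\) Cauchy–Schwarz bound \(=1\)):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1_000_000)

y = w                 # E(Y) = 0 exactly
x = 2.0 + w + w**2    # a correlated variable; E(X) = 3, E(X^2) = 12

walker_lb = np.mean(x)**2 + np.mean(x * y)**2 / np.mean(y**2)  # bound (2.5)
cs_lb = np.mean(x * y)**2 / np.mean(y**2)                      # Cauchy-Schwarz

print(f"E(X^2)               ≈ {np.mean(x**2):.3f}")   # exact value: 12
print(f"bound (2.5)          ≈ {walker_lb:.3f}")       # exact value: 10
print(f"Cauchy-Schwarz bound ≈ {cs_lb:.3f}")           # exact value: 1
```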
Let Z be a random variable defined on a probability space \((\Omega , \mathcal{F}, P_\theta )\) where \(\theta \in \Theta \subset R.\) Suppose that the parameter \(\theta \) has a prior density \(\lambda (\theta )\) with respect to the Lebesgue measure on R and that \(f(z,\theta )\) is the probability density function of the random variable Z given the parameter \(\theta .\) Then the joint density of the random vector \((Z,\theta )\) is \(g(z,\theta )= f(z,\theta ) \lambda (\theta ).\) Consider a function \(\psi (z,\theta )\) such that \(E[\psi (Z,\theta )|Z]=0,\) where \(E(\psi (Z,\theta )|Z)\) denotes the expectation of the random variable \(\psi (Z,\theta )\) with respect to the posterior distribution of \(\theta \) given Z, and let \(E(\psi (Z,\theta ))\) denote the expectation of \(\psi (Z,\theta )\) with respect to the joint distribution of the random vector \((Z, \theta ).\) Then, for any random variable \(\ell (Z)\) with \(E[|\ell (Z)\psi (Z,\theta )|]< \infty ,\)
$$\begin{aligned} E(\ell (Z)\psi (Z,\theta ))&= E[E(\ell (Z)\psi (Z,\theta )|Z)]\\ &= E[\ell (Z)\,E(\psi (Z,\theta )|Z)]\\ &= 0 \end{aligned}$$
(2.6)
and hence
$$\begin{aligned} E((\theta -\ell (Z))\psi (Z,\theta ))= E(\theta \psi (Z,\theta )). \end{aligned}$$
(2.7)
Applying Walker's inequality as given in Corollary 2.1 to the random variables \(X= \theta -\ell (Z)\) and \(Y=\psi (Z, \theta ),\) which has conditional mean zero given Z (and hence mean zero) and finite second moment, we obtain that
$$\begin{aligned} E([\theta -\ell (Z)]^2)\ge (E[\theta -\ell (Z)])^2+ \frac{(E[\theta \psi (Z,\theta )])^2 }{E([\psi (Z,\theta )]^2)}. \end{aligned}$$
(2.8)
A standard application of Cauchy–Schwarz inequality shows that
$$\begin{aligned} E([\theta -\ell (Z)]^2)\ge \frac{(E[\theta \psi (Z,\theta )])^2 }{E([\psi (Z,\theta )]^2)} \end{aligned}$$
which gives a weaker lower bound for \(E([\theta -\ell (Z)]^2).\) We now derive a few special cases of the inequality (2.8) obtained above from Walker's inequality.
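The choice of \(\psi \) is free subject to \(E[\psi (Z,\theta )|Z]=0,\) so (2.8) is really a family of lower bounds. As a numerical illustration (a minimal Monte Carlo sketch, assuming for this purpose the conjugate normal model \(\theta \sim N(0,\tau ^2),\) \(Z|\theta \sim N(\theta ,\sigma ^2),\) which is not discussed in the paper), one may take \(\psi (z,\theta )=(\theta -E(\theta |z))^3\): its posterior mean vanishes by the symmetry of the normal posterior, so (2.8) yields a valid, though not the sharpest, bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000
tau, sigma = 1.0, 0.5                 # prior sd and noise sd (assumed values)

theta = tau * rng.standard_normal(n)             # theta ~ N(0, tau^2)
z = theta + sigma * rng.standard_normal(n)       # Z | theta ~ N(theta, sigma^2)

m = tau**2 / (tau**2 + sigma**2) * z             # posterior mean E(theta | Z)
psi = (theta - m)**3      # E(psi | Z) = 0 by symmetry of the normal posterior
ell = 0.8 * m + 0.1       # an arbitrary biased estimator of theta

mse = np.mean((theta - ell)**2)
bound = np.mean(theta - ell)**2 + np.mean(theta * psi)**2 / np.mean(psi**2)

print(f"E[(theta - l(Z))^2] ≈ {mse:.4f}")      # about 0.242 here
print(f"bound (2.8)         ≈ {bound:.4f}")    # about 0.130 here
```

In this model the bound evaluates to \((E[\theta -\ell (Z)])^2+\tfrac{3}{5}\varepsilon ^2\) with \(\varepsilon ^2=\tau ^2\sigma ^2/(\tau ^2+\sigma ^2),\) weaker than the bound of special case (i) below, which replaces \(\tfrac{3}{5}\varepsilon ^2\) by \(\varepsilon ^2.\)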
Special Cases
(i) Suppose we choose \(\psi (Z,\theta )= \theta -E(\theta |Z).\) It is obvious that \(E[\psi (Z,\theta )|Z]=0.\) Applying the inequality (2.8), we get that
$$\begin{aligned} E([\theta -\ell (Z)]^2)&\ge (E[\theta -\ell (Z)])^2+ \frac{(E[\theta \psi (Z,\theta )])^2 }{E([\psi (Z,\theta )]^2)}\\ &= (E[\theta -\ell (Z)])^2+ \frac{(E[\theta (\theta -E(\theta |Z))])^2 }{E([\theta -E(\theta |Z)]^2)}\\ &= (E[\theta -\ell (Z)])^2+ \frac{(E[(\theta -E(\theta |Z))^2])^2 }{E([\theta -E(\theta |Z)]^2)}\\ &= (E[\theta -\ell (Z)])^2+ E([\theta -E(\theta |Z)]^2). \end{aligned}$$
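Thus the mean square error of any estimator \(\ell (Z)\) is bounded below by its squared bias plus the Bayes risk \(E([\theta -E(\theta |Z)]^2)\) of the posterior mean. A minimal sketch (assuming, purely for illustration, the conjugate normal model of the previous sketch, where \(E(\theta |Z)\) and the posterior variance \(\varepsilon ^2\) are available in closed form) confirms that the bound is attained by the posterior mean itself and remains valid for a biased competitor:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000_000
tau, sigma = 1.0, 0.5
eps2 = tau**2 * sigma**2 / (tau**2 + sigma**2)   # posterior variance

theta = tau * rng.standard_normal(n)
z = theta + sigma * rng.standard_normal(n)
m = tau**2 / (tau**2 + sigma**2) * z             # posterior mean E(theta | Z)

for name, ell in [("posterior mean", m), ("biased estimator", 0.8 * m + 0.1)]:
    mse = np.mean((theta - ell)**2)
    bound = np.mean(theta - ell)**2 + eps2       # case (i) lower bound
    print(f"{name:17s}: MSE ≈ {mse:.4f}, bound ≈ {bound:.4f}")
```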
     
(ii) Let \(\pi (\theta |z)\) denote the posterior density function of the parameter \(\theta \) given the observation z, and let \(I(\theta )\) denote the Fisher information in the observation Z given the parameter \(\theta .\) Suppose we choose
    $$\begin{aligned} \psi (z,\theta )= \frac{\partial \log (\pi (\theta |z))}{\partial \theta }. \end{aligned}$$
    Observe that \(E[\psi (Z,\theta )|Z]=0\) and it is easy to check that
$$\begin{aligned} E([\theta -\ell (Z)]^2)\ge (E[\theta -\ell (Z)])^2+\frac{(E[(\theta -\ell (Z))\,\psi (Z,\theta )])^2}{E([\psi (Z,\theta )]^2)}. \end{aligned}$$
    (2.9)
    Let
    $$\begin{aligned} I(\lambda )= E\left[ \left( \frac{\partial \log \lambda (\theta )}{\partial \theta }\right) ^2\right] \end{aligned}$$
    and
    $$\begin{aligned} I(\theta )= E\left[ \left( \frac{\partial \log f(Z,\theta )}{\partial \theta }\right) ^2|\theta \right] . \end{aligned}$$
Observe that \(\pi (\theta |z)\propto f(z,\theta )\lambda (\theta )\) as a function of \(\theta ,\) so that \(\psi (z,\theta )= \partial \log (f(z,\theta )\lambda (\theta ))/\partial \theta ,\) and the cross term \(E[(\partial \log f(Z,\theta )/\partial \theta )(\partial \log \lambda (\theta )/\partial \theta )]\) vanishes since the conditional score has mean zero given \(\theta ;\) hence \(E([\psi (Z,\theta )]^2)= E(I(\theta ))+I(\lambda ).\) Applying the inequality given by Corollary 2.1, we get that
$$\begin{aligned} E([\theta -\ell (Z)]^2)&\ge (E[\theta -\ell (Z)])^2+\frac{(E[(\theta -\ell (Z))\, \psi (Z,\theta )])^2 }{E([\psi (Z,\theta )]^2)}\\ &= (E[\theta -\ell (Z)])^2+ \frac{(E[\theta \psi (Z,\theta )])^2 }{E([\psi (Z,\theta )]^2)}\\ &= (E[\theta -\ell (Z)])^2+ \frac{(E[\theta \psi (Z,\theta )])^2 }{E(I(\theta ))+I(\lambda )}. \end{aligned}$$
     
(iii) We will now obtain an improved version of the van Trees inequality (cf. van Trees 1968; Gill and Levit 1995). Let
    $$\begin{aligned} \psi (z,\theta )= \frac{\partial \log (f(z,\theta )\lambda (\theta ))}{\partial \theta }. \end{aligned}$$
Assuming that the prior density \(\lambda (\theta )\) converges to zero as \(\theta \) tends to the boundary of the set \(\Theta ,\) it follows that
    $$\begin{aligned} \int _{\Theta }\frac{d[f(z,\theta ) \lambda (\theta )]}{d\theta } d\theta = [f(z,\theta )\lambda (\theta )]_{\partial \Theta }=0 \end{aligned}$$
    (2.10)
    and
$$\begin{aligned} \int _{\Theta } \theta \frac{d[f(z,\theta ) \lambda (\theta )]}{d\theta }\, d\theta &= [\theta f(z,\theta )\lambda (\theta )]_{\partial \Theta }-\int _\Theta f(z,\theta ) \lambda (\theta )\,d\theta \\ &= -\int _\Theta f(z,\theta )\lambda (\theta )\,d\theta . \end{aligned}$$
Using the above equations, it follows that
$$\begin{aligned} \int _{-\infty }^{\infty }\int _\Theta (\theta -\ell (z))\frac{d[f(z,\theta ) \lambda (\theta )]}{d\theta }\,d\theta \, dz &= -\int _{-\infty }^{\infty }\int _\Theta f(z,\theta )\lambda (\theta )\,d\theta \, dz\\ &= -1, \end{aligned}$$
so that \(E[(\theta -\ell (Z))\psi (Z,\theta )]=E[\theta \psi (Z,\theta )]=-1\) and hence \((E[\theta \psi (Z,\theta )])^2=1.\)
    Observe that \(E[\psi (Z,\theta )|Z]=0.\) Applying Corollary 2.1, we get that
$$\begin{aligned} E([\theta -\ell (Z)]^2)&\ge (E[\theta -\ell (Z)])^2+\frac{(E[(\theta -\ell (Z))\, \psi (Z,\theta )])^2 }{E([\psi (Z,\theta )]^2)}\\ &= (E[\theta -\ell (Z)])^2+ \frac{(E[\theta \psi (Z,\theta )])^2 }{E(I(\theta ))+I(\lambda )}\\ &= (E[\theta -\ell (Z)])^2+ \frac{1 }{E(I(\theta ))+I(\lambda )}, \end{aligned}$$
which improves on the classical van Trees inequality \(E([\theta -\ell (Z)]^2)\ge 1/(E(I(\theta ))+I(\lambda ))\) by the addition of the squared bias term.
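Since cases (ii) and (iii) lead to the same final expression, one numerical sketch covers both. Assuming once more the conjugate normal model (an illustrative choice, under which \(E(I(\theta ))=1/\sigma ^2\) and \(I(\lambda )=1/\tau ^2\)), the following compares the classical van Trees bound with the improved bound containing the squared bias term:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000_000
tau, sigma = 1.0, 0.5

theta = tau * rng.standard_normal(n)
z = theta + sigma * rng.standard_normal(n)
ell = 0.8 * (tau**2 / (tau**2 + sigma**2)) * z + 0.1   # a biased estimator

# In this model E(I(theta)) = 1/sigma^2 and I(lambda) = 1/tau^2.
info = 1 / sigma**2 + 1 / tau**2
van_trees = 1 / info                                   # classical van Trees bound
improved = np.mean(theta - ell)**2 + 1 / info          # bound of case (iii)

print(f"MSE       ≈ {np.mean((theta - ell)**2):.4f}")  # about 0.242
print(f"van Trees ≈ {van_trees:.4f}")                  # 0.2
print(f"improved  ≈ {improved:.4f}")                   # about 0.21
```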
     
(iv) Define the likelihood ratio
    $$\begin{aligned} L(z;\theta _1,\theta _2)= \frac{g(z,\theta _1)}{g(z,\theta _2)}. \end{aligned}$$
     
For any fixed \(h \ne 0\) and \(0<s<1,\) define
$$\begin{aligned} \psi (z,\theta )= L^s(z;\theta + h,\theta )-L^{1-s}(z;\theta - h,\theta ). \end{aligned}$$
Following Weiss and Weinstein (1985), one can check that
$$\begin{aligned} E[\ell (Z) \psi (Z,\theta )]= 0 \end{aligned}$$
and
$$\begin{aligned} E[\theta \psi (Z,\theta )]= -hE[L^{1-s}(Z; \theta -h,\theta )]. \end{aligned}$$
As an application of Corollary 2.1, we get that
$$\begin{aligned} E([\theta -\ell (Z)]^2)\ge (E[\theta -\ell (Z)])^2+ \frac{h^2 (E[L^{1-s}(Z; \theta -h,\theta )])^2}{E[\psi (Z,\theta )^2]}. \end{aligned}$$
(2.11)
Following arguments given in Weiss and Weinstein (1985), it follows that
$$\begin{aligned} E([\theta -\ell (Z)]^2)\ge (E[\theta -\ell (Z)])^2+\frac{h^2e^{2\mu (s,h)}}{e^{\mu (2s,h)}+e^{\mu (2s-1,h)}-2e^{\mu (s,2h)}}, \end{aligned}$$
(2.12)
where
$$\begin{aligned} \mu (s,h)&= \log E[L^s(Z;\theta + h,\theta )]\\ &= \log \left[ \int _{-\infty }^{\infty }\int _{\Theta } [g(z,\theta +h)]^s\, [g(z,\theta )]^{1-s}\, d\theta \, dz\right] . \end{aligned}$$
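To see what (2.12) gives in a concrete case, assume yet again the conjugate normal model (an illustrative choice; the Gaussian evaluation of \(\mu (s,h)\) below is a standard computation, made here for this sketch). Writing \(\varepsilon ^2=\tau ^2\sigma ^2/(\tau ^2+\sigma ^2)\) for the posterior variance, one finds \(\mu (s,h)=-s(1-s)h^2/(2\varepsilon ^2).\) The sketch evaluates the bound over a grid of h at \(s=1/2;\) the supremum of the Weiss–Weinstein term recovers the Bayes risk \(\varepsilon ^2\) as \(h \rightarrow 0,\) to which the squared bias is added:

```python
import numpy as np

tau, sigma = 1.0, 0.5
eps2 = tau**2 * sigma**2 / (tau**2 + sigma**2)   # posterior variance
bias = -0.1     # E[theta - l(Z)] for the biased estimator of the earlier sketches

def mu(s, h):
    # Closed form of mu(s, h) in the conjugate normal model:
    # mu(s, h) = -s (1 - s) h^2 / (2 eps2).
    return -s * (1 - s) * h**2 / (2 * eps2)

s = 0.5
h = np.linspace(1e-3, 3.0, 1000)
term = h**2 * np.exp(2 * mu(s, h)) / (
    np.exp(mu(2 * s, h)) + np.exp(mu(2 * s - 1, h)) - 2 * np.exp(mu(s, 2 * h)))

print(f"sup over h of the Weiss-Weinstein term ≈ {term.max():.4f}")  # -> eps2 = 0.2
print(f"improved bound (2.12) ≈ {bias**2 + term.max():.4f}")         # about 0.21
```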

Remark

In a similar fashion, it is possible to improve other lower bounds for the risk of Bayesian estimators using Corollary 2.1, as applications of the improved Cauchy–Schwarz inequality due to Walker (2017), and also to obtain similar Bayesian bounds for functions of a parameter. Note that the bounds obtained by using Walker's inequality are sharper than those derived using the Cauchy–Schwarz inequality, as illustrated by (2.2) and (2.3). Sudheesh and Dewan (2016) obtained a Bayesian lower bound in the Gaussian case as an application of the generalized moment identity derived by them. The class of lower bounds derived above is sharper than the class derived in Weiss and Weinstein (1985). As can be seen from the computations made in the special case (iii) discussed above, the lower bound obtained here for the risk of an estimator of the random parameter \(\theta \) is tighter than the lower bounds obtained earlier in the literature.

References

  1. Bobrovsky BZ, Mayer-Wolf E, Zakai M (1987) Some classes of global Cramér–Rao bounds. Ann Stat 15:1421–1438
  2. Borovkov AA, Sakhanenko AI (1980) On estimates for the average quadratic risk. Probab Math Stat 1:185–195 (in Russian)
  3. Brown LD, Gajek L (1990) Information inequalities for the Bayes risk. Ann Stat 18:1578–1594
  4. Brown LD, Liu RC (1993) Bounds on the Bayes and minimax risk for signal parameter estimation. IEEE Trans Inf Theory 39:1386–1394
  5. Chazan D, Ziv J, Zakai M (1975) Improved lower bounds on signal parameter estimation. IEEE Trans Inf Theory 21:90–93
  6. Gart JJ (1959) An extension of the Cramér–Rao inequality. Ann Math Stat 30:367–380
  7. Ghosh M (1993) Cramér–Rao bounds for posterior variances. Stat Probab Lett 17:173–178
  8. Gill RD, Levit BY (1995) Applications of the van Trees inequality: a Bayesian Cramér–Rao bound. Bernoulli 1:59–79
  9. Miller R, Chang C (1978) A modified Cramér–Rao bound and its applications. IEEE Trans Inf Theory 24:398–400
  10. Prakasa Rao BLS (1991) On Cramér–Rao type integral inequalities. Calcutta Stat Assoc Bull 40:183–205. Reprinted in: van Trees HL, Bell KL (eds) Bayesian bounds for parameter estimation and nonlinear filtering/tracking. IEEE Press, Wiley, New York, pp 900–922
  11. Prakasa Rao BLS (1992) Cramér–Rao type integral inequalities for functions of multidimensional parameter. Sankhya Ser A 54:53–73
  12. Prakasa Rao BLS (1996) Remarks on Cramér–Rao type integral inequalities for randomly censored data. In: Koul HL, Deshpande JV (eds) Analysis of censored data. IMS Lecture Notes No. 27. Institute of Mathematical Statistics, pp 160–176
  13. Prakasa Rao BLS (2000) Cramér–Rao type integral inequalities in Banach spaces. In: Basu AK, Ghosh JK, Sen PK, Sinha BK (eds) Perspectives in statistical sciences. Oxford University Press, New Delhi, pp 245–260
  14. Prakasa Rao BLS (2001) Cramér–Rao type integral inequalities for general loss functions. TEST 10:105–120
  15. Schutzenberger MP (1957) A generalization of the Fréchet–Cramér inequality to the case of Bayes estimation. Bull Am Math Soc 63:142
  16. Shemyakin ML (1987) Rao–Cramér type integral inequalities for estimates of a vector parameter. Theory Probab Appl 32:426–434
  17. Sudheesh K, Dewan I (2016) On generalized moment identity and its application: a unified approach. Statistics 50:1149–1160
  18. Targhetta M (1984) On Bayesian analogues to Bhattacharya's lower bounds. Arab Gulf J Sci Res 2:583–590
  19. Targhetta M (1988) On the attainment of a lower bound for the Bayes risk in estimating a parametric function. Statistics 19:233–239
  20. Targhetta M (1990) A note on the mixing problem and the Schutzenberger inequality. Metrika 37:155–161
  21. van Trees HL (1968) Detection, estimation and modulation theory, part 1. Wiley, New York
  22. van Trees HL, Bell KL (2007) Bayesian bounds for parameter estimation and nonlinear filtering/tracking. IEEE Press, Wiley, New York
  23. Walker SG (2017) A self-improvement to the Cauchy–Schwarz inequality. Stat Probab Lett 122:86–90
  24. Weinstein E, Weiss A (1985) Lower bounds on the mean square estimation error. Proc IEEE 73:1433–1434
  25. Weiss A, Weinstein E (1985) A lower bound on the mean square error in random parameter estimation. IEEE Trans Inf Theory 31:680–682
  26. Ziv J, Zakai M (1969) Some lower bounds on signal parameter estimation. IEEE Trans Inf Theory 15:386–391

Copyright information

© The Indian Society for Probability and Statistics (ISPS) 2017

Authors and Affiliations

  1. CR Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, India
