On the efficiency of Gini’s mean difference

Gerstenberger, Carina; Vogel, Daniel

doi:10.1007/s10260-015-0315-x

Published: 07 May 2015

Volume 24, pages 569–596, (2015)

Statistical Methods & Applications

Carina Gerstenberger¹ &
Daniel Vogel²

Abstract

The asymptotic relative efficiency of the mean deviation with respect to the standard deviation is 88 % at the normal distribution. In his seminal 1960 paper A survey of sampling from contaminated distributions, J. W. Tukey points out that, if the normal distribution is contaminated by a small $\epsilon $-fraction of a normal distribution with three times the standard deviation, the mean deviation is more efficient than the standard deviation—already for $\epsilon < 1\,\%$. In the present article, we examine the efficiency of Gini’s mean difference (the mean of all pairwise distances). Our results may be summarized by saying Gini’s mean difference combines the advantages of the mean deviation and the standard deviation. In particular, an analytic expression for the finite-sample variance of Gini’s mean difference at the normal mixture model is derived by means of the residue theorem, which is then used to determine the contamination fraction in Tukey’s 1:3 normal mixture distribution that renders Gini’s mean difference and the standard deviation equally efficient. We further compute the influence function of Gini’s mean difference, and carry out extensive finite-sample simulations.
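As a concrete reading of "the mean of all pairwise distances", the following minimal sketch computes the sample version of Gini's mean difference in two ways. The function names and the sample values are illustrative assumptions and do not come from the article; the second function uses the well-known representation through order statistics.

```python
from itertools import combinations


def gini_mean_difference(xs):
    """Mean of |x_i - x_j| over all pairs i < j (needs at least two points)."""
    pairs = list(combinations(xs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def gini_mean_difference_sorted(xs):
    """Same quantity in O(n log n), written as a linear combination of order statistics."""
    s = sorted(xs)
    n = len(s)
    # The i-th order statistic (i = 1..n) enters with weight 2i - n - 1.
    total = sum((2 * i - n - 1) * x for i, x in enumerate(s, start=1))
    return 2.0 * total / (n * (n - 1))


sample = [1.0, 2.0, 4.0, 7.0]  # made-up data for illustration
print(gini_mean_difference(sample), gini_mean_difference_sorted(sample))
```

The pairwise version mirrors the definition directly, while the sorted version is the one to prefer for larger samples.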

Notes

Here, the choice of the location estimator is unambiguous: high breakdown point robustness is the main selling feature of the MAD.
For simplicity, we define the $p$-quantile of distribution $F$ as the value of the quantile function $F^{-1}(p) = \inf \{ x |\, F(x) \ge p\}$. For all population distributions we consider, there is no ambiguity, but note that $\hat{F}_n^{-1}(1/2)$ and the sample median $md(\hat{F}_n)$ as defined above are generally different.
https://stat.ethz.ch/pipermail/r-help/2003-April/032820.html.

References

Ahlfors LV (1966) Complex analysis, 2nd edn. McGraw-Hill, New York
Babu GJ, Rao CR (1992) Expansions for statistics involving the mean absolute deviations. Ann Inst Stat Math 44(2):387–403
Bickel PJ, Lehmann EL (1976) Descriptive statistics for nonparametric models, III. Dispersion. Ann Stat 6:1139–1148
Gorard S (2005) Revisiting a 90-year-old debate: the advantages of the mean deviation. Br J Educ Stud 4:417–430
Hall P, Welsh A (1985) Limit theorems for the median deviation. Ann Inst Stat Math 37(1):27–36
Hampel FR (1974) The influence curve and its role in robust estimation. J Am Stat Assoc 69:383–393
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (1986) Robust statistics. The approach based on influence functions. Wiley series in probability and mathematical statistics. Wiley, New York
Hoeffding W (1948) A class of statistics with asymptotically normal distribution. Ann Math Stat 19:293–325
Hojo T (1931) Distribution of the median, quartiles and interquartile distance in samples from a normal population. Biometrika 23(3–4):315–360
Huber PJ, Ronchetti EM (2009) Robust statistics. Wiley series in probability and statistics, 2nd edn. Wiley, Hoboken
Kenney F, Keeping E (1952) Mathematics of statistics. Part two. D. Van Nostrand Company, Inc., Princeton
Lax DA (1985) Robust estimators of scale: finite-sample performance in long-tailed symmetric distributions. J Am Stat Assoc 80(391):736–741
Lomnicki ZA (1952) The standard error of Gini’s mean difference. Ann Math Stat 23(4):635–637
Nair US (1936) The standard error of Gini’s mean difference. Biometrika 28:428–436
Pham-Gia T, Hung T (2001) The mean and median absolute deviations. Math Comput Model 34(7):921–936
R Development Core Team (2010) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org/. ISBN:3-900051-07-0
Rousseeuw P, Croux C, Todorov V, Ruckstuhl A, Salibian-Barrera M, Verbeke T, Koller M, Maechler M (2014) robustbase: basic robust statistics. R package version 0.91-1. http://CRAN.R-project.org/package=robustbase
Rousseeuw PJ, Croux C (1993) Alternatives to the median absolute deviation. J Am Stat Assoc 88(424):1273–1283
Tukey JW (1960) A survey of sampling from contaminated distributions. In: Olkin I et al (eds) Contributions to probability and statistics. Essays in honor of Harold Hotelling. Stanford University Press, Stanford, pp 448–485
Yitzhaki S (2003) Gini’s mean difference: a superior measure of variability for non-normal distributions. Metron 61(2):285–316

Acknowledgments

We are indebted to Herold Dehling for introducing us to the theory of U-statistics, to Roland Fried for introducing us to robust statistics, and to Alexander Dürre, who has demonstrated the benefit of complex analysis for solving statistical problems. Both authors were supported in part by the Collaborative Research Centre 823 Statistical modelling of nonlinear dynamic processes.

Author information

Authors and Affiliations

Fakultät für Mathematik, Ruhr-Universität Bochum, 44780, Bochum, Germany
Carina Gerstenberger
Institute for Complex Systems and Mathematical Biology, University of Aberdeen, Aberdeen, AB24 3UE, UK
Daniel Vogel

Corresponding author

Correspondence to Carina Gerstenberger.

Appendices

Appendix 1: Proofs

Towards the proof of Theorem 1, we spare a few words about the derivation of the corresponding result for the normal distribution. When evaluating the integral $J$, cf. (9), for the standard normal distribution, one encounters the integral

$$\begin{aligned} I_1 = \int _{-\infty }^{\infty } x^2 \phi (x) \varPhi (x)^2 dx, \end{aligned}$$

where $\phi $ and $\varPhi $ denote the density and the cdf of the standard normal distribution, respectively. Nair (1936) gives the value $I_1 = 1/3 + 1/(2\pi \sqrt{3})$, resulting in $J = \sqrt{3}/(2 \pi ) - 1/6$, but does not provide a proof. The author refers to the derivation of a similar integral (integral 8 in Table I, Nair 1936, p. 433), where we find the result as well as the derivation doubtful, and to an article by Hojo (1931), which gives numerical values for several integrals, but does not contain an explanation for the value of $I_1$ either. We therefore include a proof here. Writing each factor $\varPhi (x)$ as the integral of its density, $\varPhi (x) = \int _{-\infty }^0 \phi (y+x)\, dy$, and changing the order of integration in the three-dimensional integral thus obtained yields

$$\begin{aligned} I_1 = (2\pi )^{-3/2} \int _{y=-\infty }^0 \int _{z=-\infty }^0 \int _{x=-\infty }^\infty x^2 e^{-x^2/2} e^{-(y+x)^2/2} e^{-(z+x)^2/2} \, d x\, d z\, d y. \end{aligned}$$

Solving the inner integral, we obtain

$$\begin{aligned} I_1 = (18\pi \sqrt{3})^{-1} \int _{y=0}^\infty \int _{z=0}^\infty [ (y+z)^2 + 3 ] \exp \left\{ - \frac{1}{3} \left[ y^2 + z^2 - y z \right] \right\} \, d z\, d y. \end{aligned}$$

Introducing polar coordinates $\alpha , r$ such that $y = r \cos \alpha $, $z = r \sin \alpha $, and solving the integral with respect to $r$, we arrive at

$$\begin{aligned} I_1 = \frac{1}{4\pi \sqrt{3}} \int _{\alpha =0}^\pi \frac{4+\sin \alpha }{(2-\sin \alpha )^2} \, d \alpha . \end{aligned}$$

This remaining integral may be solved by means of the residue theorem (e.g. Ahlfors 1966, p. 149). Substituting $\gamma = e^{i \alpha }$ and using $\sin \alpha = (e^{i \alpha } - e^{- i \alpha })/(2i)$, we transform $I_1$ into the following line integral in the complex plane,

$$\begin{aligned} I_1 = \frac{1}{2\pi \sqrt{3}} \int _{\varGamma _0} \frac{\gamma ^2 + 8 i \gamma -1 }{(\gamma ^2- 4 i \gamma -1 )^2} \, d \gamma , \end{aligned}$$

(10)

where $\varGamma _0$ is the upper unit half circle in the complex plane, cf. Fig. 4. Let $h$ denote the integrand in (10). Its poles (both of order two) are $\gamma _{1/2} = (2\pm \sqrt{3})i$, so that only $\gamma _2 = (2-\sqrt{3})i$ lies within the closed upper half unit circle $\varGamma $. The residue of $h$ in $\gamma _2$ is $-\sqrt{3} i /2$. Integrating $h$ along $\varGamma _1$, i.e. the real line from $-1$ to 1, cf. Fig. 4, and applying the residue theorem to the closed line integral along $\varGamma $ completes the derivation.
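The chain of steps above can be cross-checked numerically. The sketch below is only a sanity check under our own choices (Simpson's rule, integration range $[-12,12]$, all function names); it is not part of the original derivation. It compares Nair's value $1/3 + 1/(2\pi \sqrt{3})$ with the defining integral of $I_1$ and with the angular integral, and it checks the residue computation. Since $d\alpha = d\gamma /(i\gamma )$, the angular integral over $[0,\pi ]$ equals twice the line integral of $h$ over the arc $\varGamma _0$.

```python
import cmath
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with n (even) subintervals; also works for complex-valued f."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def h(g):
    """Integrand of the line integral (10)."""
    return (g * g + 8j * g - 1) / (g * g - 4j * g - 1) ** 2


closed_form = 1 / 3 + 1 / (2 * math.pi * math.sqrt(3))

# (a) the defining integral of I_1, truncated to [-12, 12]
direct = simpson(lambda x: x * x * phi(x) * Phi(x) ** 2, -12, 12)

# (b) the angular integral obtained after the polar-coordinate step
angular_integral = simpson(lambda a: (4 + math.sin(a)) / (2 - math.sin(a)) ** 2, 0, math.pi)
angular = angular_integral / (4 * math.pi * math.sqrt(3))

# (c) the contour: arc Gamma_0 (gamma = exp(i alpha)) plus the segment Gamma_1 from -1 to 1
arc = simpson(lambda a: h(cmath.exp(1j * a)) * 1j * cmath.exp(1j * a), 0, math.pi)
segment = simpson(h, -1, 1)
closed_loop = arc + segment                      # residue theorem: 2*pi*i * (-sqrt(3)*i/2)

print(closed_form, direct, angular)
print(closed_loop, math.pi * math.sqrt(3))
```

All three values of $I_1$ should agree to high accuracy, and the closed line integral should match $2\pi i \cdot (-\sqrt{3} i/2) = \pi \sqrt{3}$.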

Proof

(Proof of Theorem 1)

Evaluating the integral $J$ for the normal mixture distribution, we arrive after lengthy calculations at

$$\begin{aligned} J= & {} \Big [ \epsilon ^3 \lambda ^2 + (1-\epsilon )^3 \Big ] \Big [ 2 A(1) + C(1) + E(1) \Big ] \ - \ (\epsilon \lambda ^2 + 1 - \epsilon )B \\&\quad + \ \epsilon ^2(1-\epsilon ) \Big [ 2 (2+ \lambda ^2) A(1/\lambda ) + C(\lambda ) + 2\lambda ^2 D(1/\lambda )+ \lambda (2+ \lambda ^2)E(1/\lambda ) \Big ] \\&\quad + \ \epsilon (1-\epsilon )^2 \Big [ 2 (2\lambda ^2+ 1 ) A(\lambda ) + \lambda ^2C(1/\lambda ) + 2 D(\lambda )+ (\lambda ^{-1} + 2 \lambda )E(\lambda ) \Big ], \end{aligned}$$

where

$$\begin{aligned} A(\lambda )= & {} \int _{-\infty }^\infty x \phi ^2(x) \varPhi (x/\lambda ) dx \ = \ \frac{1}{4 \pi \sqrt{ 1 + 2 \lambda ^2 }}, \quad \\ B= & {} \int _{-\infty }^\infty x^2 \phi (x) \varPhi (x) dx \ = \ \frac{1}{2}, \\ C(\lambda )= & {} \int _{-\infty }^\infty x^2 \phi (x) \varPhi ^2(x/\lambda ) dx \ = \ \frac{1}{4} + \frac{\lambda }{\pi (1 + \lambda ^2)\sqrt{ 2 + \lambda ^2 }}\\&+ \frac{1}{2 \pi } \arctan \left( \frac{1}{\lambda \sqrt{2+\lambda ^2}}\right) , \\ D(\lambda )= & {} \int _{-\infty }^\infty x^2 \phi (x) \varPhi (x) \varPhi (x/\lambda ) dx \ = \ \frac{1}{4} + \frac{3\lambda ^2 + 1}{4 \pi (1 + \lambda ^2)\sqrt{ 2 \lambda ^2 + 1 }}\\&+ \frac{1}{2 \pi } \arctan \left( \frac{1}{\sqrt{2\lambda ^2+ 1}}\right) , \\ E(\lambda )= & {} \int _{-\infty }^\infty \phi ^2(x) \phi (x/\lambda ) dx \ = \ \frac{1}{2 \pi \sqrt{ 1 + 2 \lambda ^2 }}, \end{aligned}$$

for all $\lambda > 0$. As before, $\phi $ and $\varPhi $ denote the density and the cdf of the standard normal distribution. The tricky integrals are $C(\lambda )$ and $D(\lambda )$, which, for $\lambda = 1$, both reduce to the integral $I_1$ above. Proceeding as before for the integral $I_1$, solving the respective two inner integrals yields

$$\begin{aligned}&C(\lambda ) = \frac{\lambda ^3}{2 \pi \sqrt{2+\lambda ^2}} \int _0^{\pi /2} \frac{ 3 + \lambda ^2 + \sin (2\alpha )}{ \{1 + \lambda ^2 - \sin (2 \alpha )\}^2} d\alpha , \\&D(\lambda ) = \frac{1}{2 \pi \sqrt{1+2\lambda ^2}} \int _0^{\pi /2} \frac{ 2 + \lambda ^2 (2+ \sin (2\alpha )) + (3\lambda ^4 - \lambda ^2 - 2) \sin ^2(\alpha )}{ \{ 2 - \sin (2\alpha ) + (\lambda ^2-1) \sin ^2(\alpha ) \}^2 } d \alpha . \end{aligned}$$

These integrals are again solved by the residue theorem, which completes the proof. $\square $
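For readers who want to double-check the two tricky integrals, the following sketch compares the stated closed forms of $C(\lambda )$ and $D(\lambda )$ with a plain numerical quadrature of their defining integrals, and $C(\lambda )$ additionally with its angular representation. The quadrature rule, the truncation to $[-12,12]$, the test values of $\lambda $ and all function names are our own assumptions.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def C_closed(lam):
    r = math.sqrt(2 + lam ** 2)
    return 0.25 + lam / (math.pi * (1 + lam ** 2) * r) + math.atan(1 / (lam * r)) / (2 * math.pi)


def D_closed(lam):
    r = math.sqrt(2 * lam ** 2 + 1)
    return 0.25 + (3 * lam ** 2 + 1) / (4 * math.pi * (1 + lam ** 2) * r) + math.atan(1 / r) / (2 * math.pi)


def C_numeric(lam):
    return simpson(lambda x: x * x * phi(x) * Phi(x / lam) ** 2, -12, 12)


def D_numeric(lam):
    return simpson(lambda x: x * x * phi(x) * Phi(x) * Phi(x / lam), -12, 12)


def C_angular(lam):
    """C(lambda) via the one-dimensional angular integral over [0, pi/2]."""
    def integrand(a):
        s = math.sin(2 * a)
        return (3 + lam ** 2 + s) / (1 + lam ** 2 - s) ** 2
    return lam ** 3 / (2 * math.pi * math.sqrt(2 + lam ** 2)) * simpson(integrand, 0, math.pi / 2)


for lam in (1 / 3, 1.0, 3.0):  # 3 corresponds to Tukey's 1:3 mixture
    print(lam, C_closed(lam), C_numeric(lam), C_angular(lam), D_closed(lam), D_numeric(lam))
```

For $\lambda = 1$ both closed forms reduce to $I_1 = 1/3 + 1/(2\pi \sqrt{3})$, which gives a further consistency check.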

For the proof of Theorem 2, the following identities are helpful:

$$\begin{aligned}&\textstyle \int x \left( 1 + \frac{x^2}{\beta } \right) ^\alpha dx \ = \ \frac{\beta }{2(\alpha +1)} \left( 1 + \frac{x^2}{\beta }\right) ^{\alpha +1}, \quad \alpha \ne -1, \ \beta \ne 0. \end{aligned}$$

(11)

$$\begin{aligned}&\textstyle \int _{-\infty }^\infty \left( 1 + \frac{x^2}{\nu }\right) ^{-\nu } dx \ = \ \frac{1}{c_{2\nu -1}} \sqrt{\frac{\nu }{2\nu -1}}, \quad \nu > 0, \end{aligned}$$

(12)

$$\begin{aligned}&\textstyle \int _{-\infty }^\infty \left( 1 + \frac{x^2}{\nu }\right) ^{-\frac{3\nu -1}{2}} dx \ = \ \frac{1}{c_{3\nu -2}} \sqrt{\frac{\nu }{3\nu -2}}, \quad \nu > 0, \end{aligned}$$

(13)

where $c_{\nu }$ is the scaling factor of the $t_\nu $ density, cf. Table 1. The identities (12) and (13) can be obtained by transforming the respective integrands into $t$ densities (with $2\nu -1$ and $3\nu -2$ degrees of freedom) by substituting $y = ((2\nu -1)/\nu )^{1/2} \, x$ and $y = ((3\nu -2)/\nu )^{1/2}\, x$, respectively.
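The identities (12) and (13) are easy to confirm numerically. The sketch below assumes that $c_\nu $ is the usual normalizing constant of the $t_\nu $ density, $c_\nu = \varGamma ((\nu +1)/2)/(\sqrt{\nu \pi }\, \varGamma (\nu /2))$ (Table 1 is not reproduced here), and integrates over the real line by a midpoint rule after the substitution $x = \tan \theta $. The function names and the tested values of $\nu $ are illustrative choices.

```python
import math


def c(nu):
    """Normalizing constant of the t_nu density (standard form, assumed here)."""
    return math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))


def integrate_real_line(f, n=20000):
    """Midpoint rule for the integral of f over the real line, using x = tan(theta)."""
    step = math.pi / n
    total = 0.0
    for k in range(n):
        x = math.tan(-math.pi / 2 + (k + 0.5) * step)
        total += f(x) * (1 + x * x)   # dx = (1 + x^2) d(theta)
    return total * step


def lhs_12(nu):
    return integrate_real_line(lambda x: (1 + x * x / nu) ** (-nu))


def rhs_12(nu):
    return math.sqrt(nu / (2 * nu - 1)) / c(2 * nu - 1)


def lhs_13(nu):
    return integrate_real_line(lambda x: (1 + x * x / nu) ** (-(3 * nu - 1) / 2))


def rhs_13(nu):
    return math.sqrt(nu / (3 * nu - 2)) / c(3 * nu - 2)


for nu in (2.5, 3, 10):
    print(nu, lhs_12(nu), rhs_12(nu), lhs_13(nu), rhs_13(nu))
```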

Proof

(Proof of Theorem 2) For computing $g$, we evaluate (7), successively making use of (11) and (12), and obtain

$$\begin{aligned} g \ = \ 4 \frac{ \nu \, c_\nu ^2}{\nu -1} \int _{-\infty }^{\infty } \Big ( 1 + \frac{x^2}{\nu } \Big )^{-\nu } \, dx \ = \ \frac{4 \, \nu ^{3/2} \, c_{\nu }^2}{(\nu -1)\, \sqrt{2\nu -1}\, c_{2\nu -1}}, \end{aligned}$$

which can be written as in Theorem 2 by using $B(x,y) = \varGamma (x)\varGamma (y)/\varGamma (x+y)$. For evaluating $J$, we write $J$ as $J = \int _\mathbb {R}A(x) f_{\nu }(x)\, dx$ with $f_\nu $ being the $t_\nu $ density and

$$\begin{aligned} A(x)= & {} \int _{-\infty }^x \int _x^{\infty } x z f_\nu (z) f_\nu (y) \,dz\, dy \ - \ \int _{-\infty }^x \int _x^{\infty } y z f_\nu (z) f_\nu (y) \,dz\, dy \\&- \int _{-\infty }^x \int _x^{\infty } x^2 f_\nu (z) f_\nu (y) \,dz\, dy \ + \ \int _{-\infty }^x \int _x^{\infty } x y f_\nu (z) f_\nu (y) \,dz\, dy \\= & {} A_1(x) - A_2(x) - A_3(x) + A_4(x). \end{aligned}$$

Using (11), we obtain

$$\begin{aligned} A_1(x) + A_4(x) \ = \ \frac{c_\nu \, \nu \, x}{\nu -1}\left( 1 + \frac{x^2}{\nu } \right) ^{-\frac{\nu -1}{2}} \, \int _{-x}^x f_\nu (y) \, dy, \end{aligned}$$

and

$$\begin{aligned} - A_2(x) \ = \ \left( \frac{c_\nu \,\nu }{\nu -1} \right) ^2 \, \left( 1+ \frac{x^2}{\nu }\right) ^{-\nu +1}. \end{aligned}$$

Hence, $J = B_1 + B_2 - B_3$ with

$$\begin{aligned} \textstyle&B_1 \ = \ \int _{-\infty }^{\infty } \frac{c_\nu \, \nu \, x}{\nu -1} \left( 1 + \frac{x^2}{\nu } \right) ^{-\frac{\nu -1}{2}} f_\nu (x) \, \int _{-x}^x f_\nu (y) \, dy \, dx, \\&\textstyle B_2 \ = \ \int _{-\infty }^{\infty } \left( \frac{c_\nu \,\nu }{\nu -1} \right) ^2 \, \left( 1+ \frac{x^2}{\nu }\right) ^{-\nu +1} \! f_\nu (x) \, dx, \\&B_3 \ = \ \int _{-\infty }^{\infty } x^2 F_\nu (x) \left( 1 - F_\nu (x) \right) f_\nu (x) \, dx \ = \ \frac{\nu }{2(\nu -2)} - \int _{-\infty }^{\infty } x^2 f_\nu (x) F_\nu ^2(x) \, dx, \end{aligned}$$

where $F_\nu $ is the cdf of the $t_\nu $ distribution. By employing (11) and (13), we find

$$\begin{aligned} B_1 \ = \ B_2 \ = \ \frac{c_\nu }{c_{3\nu -2}}\, \left( \frac{c_\nu \,\nu }{\nu -1} \right) ^2 \, \sqrt{\frac{\nu }{3\nu -2}} \end{aligned}$$

and arrive, again by employing $B(x,y) = \varGamma (x)\varGamma (y)/\varGamma (x+y)$, at the expression for $J$ given in Theorem 2. $\square $

The remaining integral

$$\begin{aligned} K_\nu \ = \int _{-\infty }^{\infty } x^2 f_\nu (x) F_\nu ^2(x) \, dx \end{aligned}$$

cannot be solved by the same means as the analogous integral $I_1$ for the normal distribution, and we state this as an open problem. However, this one-dimensional integral can easily be approximated numerically, and the expression is quickly entered into mathematical software such as R (R Development Core Team 2010).
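A minimal sketch of such a numerical approximation is given below, in Python rather than R and using only the standard library. It is one possible scheme under our own assumptions, not the authors' code: the $t_\nu $ cdf is accumulated on the fly with a midpoint rule after the substitution $x = \tan \theta $, the function names are hypothetical, and $\nu > 2$ is required for the integral to exist (the simple rule is most reliable for $\nu \ge 3$). The routine also returns $\int x^2 f_\nu (x) F_\nu (x)\, dx = \nu /(2(\nu -2))$, which serves as a built-in accuracy check.

```python
import math


def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    c_nu = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c_nu * (1 + x * x / nu) ** (-(nu + 1) / 2)


def k_and_check(nu, n=40000):
    """Return (K_nu, M_nu) with K = int x^2 f F^2 dx and M = int x^2 f F dx."""
    step = math.pi / n
    cdf = 0.0          # probability mass accumulated to the left of the current cell
    k_val = 0.0
    m_val = 0.0
    for i in range(n):
        x = math.tan(-math.pi / 2 + (i + 0.5) * step)
        mass = t_pdf(x, nu) * (1 + x * x) * step   # probability mass of the current cell
        f_mid = cdf + 0.5 * mass                   # cdf evaluated at the cell midpoint
        k_val += x * x * f_mid ** 2 * mass
        m_val += x * x * f_mid * mass
        cdf += mass
    return k_val, m_val


nu = 5
K, M = k_and_check(nu)
print(K, M, nu / (2 * (nu - 2)))
```

Because $x^2$ and $F_\nu ^2(x)$ are positively correlated under $F_\nu $, the value of $K_\nu $ must lie strictly between $\nu /(3(\nu -2))$ and $\nu /(2(\nu -2))$, which gives a quick plausibility check on the output.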

Proof

(Proof of Proposition 1) We have

$$\begin{aligned}&g(F_{\epsilon ,x}) = 2 \int _{-\infty }^{\infty } \int _{y}^{\infty } (z - y)\, d F_{\epsilon ,x} (z)\, d F_{\epsilon ,x}(y) \\&\quad = (1-\epsilon )^2 g(F) + 2 \epsilon (1-\epsilon ) \int _{-\infty }^{\infty } (x-y) \left\{ 1\!\!1_{ (-\infty ,x] } (y) - 1\!\!1_{ [x,\infty ) } (y) \right\} \, d F(y) \end{aligned}$$

and hence

$$\begin{aligned}&I\!F(x,g(\cdot ); F) = \lim _{\epsilon \searrow 0} \frac{1}{\epsilon } \{ g(F_{\epsilon ,x}) - g(F) \} \\&\quad = \ - 2 g(F) + 2 \left\{ x [ F(x) + F(x-) - 1 ] + E[X 1\!\!1_{ \{ X\ge x \} } ] - E[X 1\!\!1_{ \{ X \le x \} } ] \right\} , \end{aligned}$$

which completes the proof. $\square $

With the influence function known, it is also possible to use the relationship

$$\begin{aligned} ASV(s_n;F) = \int _\mathbb {R}I\!F(x,s,F)^2 F(dx) \end{aligned}$$

instead of referring to the terms given in Sect. 2 to compute the asymptotic variance of the estimators. This leads to the same integrals.
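To illustrate this relationship, the sketch below evaluates the influence function of Gini's mean difference at the standard normal distribution and integrates its square. It relies on facts not spelled out in this appendix and therefore stated here as assumptions: for the standard normal, $g(F) = 2/\sqrt{\pi }$, $E[X 1\!\!1_{\{X \ge x\}}] = \phi (x)$ and $E[X 1\!\!1_{\{X \le x\}}] = -\phi (x)$, and the classical asymptotic variance is $4\{1/3 + (2\sqrt{3}-4)/\pi \}$ (Nair 1936). The quadrature rule and the function names are our own choices.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


G_NORMAL = 2 / math.sqrt(math.pi)   # g(F) = E|X - Y| at the standard normal


def influence_gmd_normal(x):
    """Influence function of Gini's mean difference at N(0,1); F is continuous, so F(x-) = F(x)."""
    return -2 * G_NORMAL + 2 * (x * (2 * Phi(x) - 1) + 2 * phi(x))


mean_if = simpson(lambda x: influence_gmd_normal(x) * phi(x), -12, 12)
asv = simpson(lambda x: influence_gmd_normal(x) ** 2 * phi(x), -12, 12)
asv_closed = 4 * (1 / 3 + (2 * math.sqrt(3) - 4) / math.pi)

print(mean_if, asv, asv_closed)
```

The influence function integrates to zero under $F$, and the integral of its square reproduces the closed-form asymptotic variance, in line with the remark that both routes lead to the same integrals.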

Appendix 2: Miscellaneous

Lemma 1

For $X_1,\ldots , X_n$ being independent and $U(a,b)$ distributed for $a,b \in \mathbb {R}$, $a < b$, we have for the sample mean deviation (about the median)

$$\begin{aligned} E(d_n) = {\left\{ \begin{array}{ll} (b-a)/4 &{} \quad \hbox {for odd } n \, (n \ge 3), \\ \displaystyle \frac{b-a}{4}\frac{n^2}{n^2-1} &{} \quad \hbox { for even } n. \end{array}\right. } \end{aligned}$$

Proof

For notational convenience we restrict our attention to the case $a=0$, $b=1$. Let $X_{(i)}$ denote the $i$th order statistic, $1 \le i \le n$. The random variable $X_{(i)}$ has a Beta$(\alpha ,\beta )$ distribution with parameters $\alpha = i$ and $\beta = n+1-i$, and hence $E(X_{(i)}) = i/(n+1)$. If $n$ is odd, we write $d_n$ as $d_n = (n-1)^{-1} \sum _{i=1}^{\lfloor n/2 \rfloor } (X_{(n+1-i)} - X_{(i)})$ and obtain

$$\begin{aligned} E(d_n) = \frac{1}{n-1} \sum _{i=1}^{\lfloor n/2 \rfloor } \left( \frac{n+1-i}{n+1} - \frac{i}{n+1} \right) = \frac{1}{4}. \end{aligned}$$

If $n$ is even, we have $d_n = (n-1)^{-1} \sum _{i=1}^{n/2} (X_{(n+1-i)} - X_{(i)})$, and hence

$$\begin{aligned} E(d_n) = \frac{1}{n-1} \sum _{i=1}^{n/2} \left( \frac{n+1-i}{n+1} - \frac{i}{n+1} \right) = \frac{n^2}{4(n^2-1)}, \end{aligned}$$

which completes the proof. $\square $
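The two sums in the proof can be checked exactly with rational arithmetic. The sketch below assumes, as in the proof, that $d_n = (n-1)^{-1} \sum _i |X_i - md|$ and restricts to $a=0$, $b=1$; the function names are illustrative. It also confirms on a small sample that the order-statistic representation of $d_n$ used in the proof agrees with this definition.

```python
from fractions import Fraction
from statistics import median


def expected_dn_uniform(n):
    """E(d_n) for U(0,1), summing E(X_(n+1-i)) - E(X_(i)) = (n+1-2i)/(n+1) as in the proof."""
    total = sum(Fraction(n + 1 - 2 * i, n + 1) for i in range(1, n // 2 + 1))
    return total / (n - 1)


def lemma_value(n):
    """Right-hand side of Lemma 1 for a = 0, b = 1."""
    quarter = Fraction(1, 4)
    return quarter if n % 2 else quarter * Fraction(n * n, n * n - 1)


def mean_deviation(xs):
    """d_n with the (n-1) normalization taken from the proof."""
    m = median(xs)
    return sum(abs(x - m) for x in xs) / (len(xs) - 1)


def mean_deviation_order_stats(xs):
    """Same statistic written through the order statistics, as in the proof."""
    s = sorted(xs)
    n = len(s)
    # X_(n+1-i) is s[n-i] and X_(i) is s[i-1] with zero-based indexing.
    return sum(s[n - i] - s[i - 1] for i in range(1, n // 2 + 1)) / (n - 1)


print(all(expected_dn_uniform(n) == lemma_value(n) for n in range(2, 200)))
print(mean_deviation([3, 1, 4, 1, 5]), mean_deviation_order_stats([3, 1, 4, 1, 5]))
```

Using `Fraction` keeps the comparison exact, so the check does not depend on a floating-point tolerance.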

About this article

Cite this article

Gerstenberger, C., Vogel, D. On the efficiency of Gini’s mean difference. Stat Methods Appl 24, 569–596 (2015). https://doi.org/10.1007/s10260-015-0315-x

Accepted: 23 April 2015
Published: 07 May 2015
Issue Date: November 2015
DOI: https://doi.org/10.1007/s10260-015-0315-x

Jel Classification

C13
