On the estimation of the Lorenz curve under complex sampling designs

  • Original Paper
  • Statistical Methods & Applications

Abstract

This paper focuses on the estimation of the concentration curve of a finite population, when data are collected according to a complex sampling design with different inclusion probabilities. A (design-based) Hájek type estimator for the Lorenz curve is proposed, and its asymptotic properties are studied. Then, a resampling scheme able to approximate the asymptotic law of the Lorenz curve estimator is constructed. Applications are given to the construction of (i) a confidence band for the Lorenz curve, (ii) confidence intervals for the Gini concentration ratio, and (iii) a test for Lorenz dominance. The merits of the proposed resampling procedure are evaluated through a simulation study.
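
As a rough illustration of what a design-based, Hájek-type Lorenz curve estimate can look like, the sketch below weights each sampled unit by the reciprocal of its inclusion probability and normalises by the sum of the weights. The paper's exact estimator is not reproduced on this page, so the function names (`hajek_lorenz`, `gini_from_lorenz`), the linear interpolation between vertices, and the trapezoidal Gini formula are all assumptions made for illustration only.

```python
def hajek_lorenz(y, pi):
    """Vertices (p, L) of a weighted Lorenz curve estimate.

    y  : sampled values (e.g. incomes), assumed non-negative with positive total
    pi : first-order inclusion probabilities of the sampled units
    Assumption: weights are w_i = 1 / pi_i, normalised by their sum (Hajek-type).
    """
    if len(y) != len(pi) or len(y) == 0:
        raise ValueError("y and pi must be non-empty and of equal length")
    # Sort units by value, carrying the design weight along with each value.
    pairs = sorted((float(v), 1.0 / float(q)) for v, q in zip(y, pi))
    total_w = sum(w for _, w in pairs)
    total_wy = sum(w * v for v, w in pairs)
    p, L = [0.0], [0.0]
    cum_w = cum_wy = 0.0
    for v, w in pairs:
        cum_w += w
        cum_wy += w * v
        p.append(cum_w / total_w)    # estimated population share
        L.append(cum_wy / total_wy)  # estimated share of the total of y
    return p, L


def gini_from_lorenz(p, L):
    """Gini ratio as 1 - 2 * (area under the piecewise-linear Lorenz curve)."""
    area = sum((p[i + 1] - p[i]) * (L[i + 1] + L[i]) / 2.0
               for i in range(len(p) - 1))
    return 1.0 - 2.0 * area


if __name__ == "__main__":
    # Hypothetical sample: four units, the richer ones sampled with higher probability.
    p, L = hajek_lorenz([10.0, 20.0, 40.0, 80.0], [0.1, 0.1, 0.2, 0.4])
    print(list(zip(p, L)))
    print(gini_from_lorenz(p, L))
```

With equal inclusion probabilities the weights cancel and the sketch reduces to the usual empirical Lorenz curve; unequal probabilities shift both the population shares and the income shares.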

References

  • Anderson G (1996) Nonparametric tests of stochastic dominance in income distribution. Econometrica 64:1183–1193

  • Antal E, Tillé Y (2011) A direct bootstrap method for complex sampling designs from a finite population. J Am Stat Assoc 106(494):534–543

  • Barabesi L, Diana G, Perri PF (2016) Linearization of inequality indices in the design-based framework. Statistics 50:1161–1172

  • Barrett GF, Donald SG, Bhattacharya D (2014) Consistent nonparametric tests for Lorenz dominance. J Bus Econ Stat 32:1–13

  • Bhattacharya D (2005) Asymptotic inference from multi-stage samples. J Econom 126:145–171

  • Bhattacharya D (2007) Inference on inequality from household survey data. J Econom 137:674–707

  • Bickel PJ, Freedman D (1981) Some asymptotic theory for the bootstrap. Ann Stat 9:1196–1216

  • Boistard H, Lopuhaä R, Ruiz-Gazen A (2017) Functional central limit theorems for single-stage sampling designs. Ann Stat 45:1728–1758

  • Chauvet G (2007) Méthodes de bootstrap en population finie. Ph.D. Dissertation, Laboratoire de statistique d’enquêtes, CREST-ENSAI, Université de Rennes 2

  • Conti PL, Di Iorio A (2018) Analytic inference in finite populations via resampling, with applications to confidence intervals and testing for independence. Preprint arXiv:1809.08035 (submitted for publication)

  • Conti PL, Marella D (2015) Inference for quantiles of a finite population: asymptotic vs. resampling results. Scand J Stat 42:545–561

  • Csörgő M, Csörgő S, Horváth L (1986) An asymptotic theory for empirical reliability and concentration processes. Springer, Berlin

  • Davidson R (2009) Reliable inference for the Gini index. J Econom 150:30–40

  • Gastwirth JL (1972) The estimation of the Lorenz curve and Gini index. Rev Econ Stat 54:306–316

  • Giorgi GM (1999) Income inequality measurement: the statistical approach. In: Silber J (ed) Handbook of income inequality measurement. Kluwer Academic Publishers, Boston

  • Giorgi GM, Gigliarano C (2017) The Gini concentration index: a review of the inference literature. J Econ Surv 31:1130–1148

  • Goldie CM (1977) Convergence theorems for empirical Lorenz curves and their inverses. Adv Appl Probab 9:765–791

  • Hájek J (1964) Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann Math Stat 35:1491–1523

  • Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685

  • Langel M, Tillé Y (2013) Variance estimation of the Gini index: revisiting a result several times published. J R Stat Soc Ser A 176:521–540

  • Leadbetter MR, Weissner JH (1969) On continuity and other analytic properties of stochastic process sample functions. Proc Am Math Soc 22:291–294

  • Lifshits MA (1982) On the absolute continuity of distributions of functionals of random processes. Theory Probab Appl 27:600–607

  • Marella D, Vicard P (2018) PC complex: PC algorithm for complex survey data. Working Paper n. 240, Dipartimento di Economia - Università Roma Tre. ISSN: 2279-6916 (submitted for publication)

  • Massart P (1990) The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. Ann Probab 18:1269–1283

  • Pfeffermann D (1993) The role of sampling weights when modeling survey data. Int Stat Rev 61:317–337

  • Pfeffermann D, Sverchkov M (2004) Prediction of finite population totals based on the sample distribution. Surv Methodol 30:79–92

  • Sen PK, Singer J (1993) Large sample methods in statistics. Chapman & Hall, London

  • Tillé Y (2006) Sampling algorithms. Springer, New York

  • van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, Cambridge

  • Zheng B (2002) Testing Lorenz curves with non-simple random samples. Econometrica 70:1235–1243

Acknowledgements

Funding was provided by Sapienza Università di Roma (C26A144TFX - Nuove metodologie di ricampionamento per indagini complesse con applicazioni alla stima di misure di disuguaglianza; C26A15W8EK - Un nuovo approccio all’imputazione singola e multipla).

Author information

Corresponding author

Correspondence to Pier Luigi Conti.

Appendix

Proof of Proposition 2

Suppose that \(y < t\). From (25) it is not difficult to see that

$$\begin{aligned} E[ ( {\mathcal {W}}^H (t) - {\mathcal {W}}^H (y) )^2 ]= & {} C_1 (y, \, y) + C_1 (t, \, t) - 2 C_1 (t, \, y) \nonumber \\&+ f ( C_2 (y, \, y) + C_2 (t, \, t) - 2 C_2 (t, \, y) ) \nonumber \\= & {} {\mathbb {E}} [ X_1 ] \left( T_{-1} (t) - T_{-1} (y) \right) + f \left( F(t) - F(y) \right) \nonumber \\&- \frac{f^{3}}{d} \left\{ \left( \frac{T_{1} (y)}{{\mathbb {E}} [ X_{1} ]} - F(y) \right) \left( \frac{T_{1} (t)}{{\mathbb {E}} [ X_{1} ]} - F(t) \right) \right\} ^2 \nonumber \\&- 2 {\mathbb {E}} [ X_1 ] ( T_{-1} (y) - T_{-1} (t) ) ( F(y) - F(t)) \nonumber \\&+ 2 {\mathbb {E}} [ X_1 ] \left( {\mathbb {E}} [ X_1^{-1} ] +1 \right) ( F(t) - F( y) )^2 \nonumber \\&+ f \{ (F(t) - F(y) ) - (F(t) - F(y) )^2 \} . \end{aligned}$$
(71)

Assumption C1 implies that

$$\begin{aligned} \left| T_{\alpha } (t) - T_{\alpha } (y) \right| \le M_{\alpha } \left| F (t) - F(y) \right| \end{aligned}$$

so that from (71) it is not difficult to see that

$$\begin{aligned} E[ ( {\mathcal {W}}^H (t) - {\mathcal {W}}^H (y) )^2 ] \le C \left| F(t) - F(y) \right| . \end{aligned}$$
(72)

C being an appropriate constant. Inequality (72) also holds when \(y>t\). Hence, in terms of the process \({\mathcal {B}}^H\) introduced above we may write

$$\begin{aligned} {\mathbb {E}} \left[ \left( {\mathcal {B}}^H (t) - {\mathcal {B}}^H (y) \right) ^2 \right] \le C \vert t-y \vert \quad \forall \, y, \, t \in [0, \, 1] . \end{aligned}$$
(73)

Inequality (73) and the Gaussianity of \({\mathcal {B}}^H (t) - {\mathcal {B}}^H (y)\), in turn, imply that

$$\begin{aligned} {\mathbb {E}} \left[ \left( {\mathcal {B}}^H (t) - {\mathcal {B}}^H (y) \right) ^2 \right] \le \frac{C}{\left| \log \vert t-y \vert \right| ^{\beta }} \quad \forall \, \beta > 1 , \quad \forall \, y, \, t \in [0, \, 1] . \end{aligned}$$
(74)
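
One elementary way to see how a linear bound such as (73) yields a logarithmic bound such as (74) is sketched below. The shorthand \(x = \vert t-y \vert \) and the function \(g_{\beta }\) are introduced here only for illustration and do not appear in the original proof. For fixed \(\beta > 0\), the function \(g_{\beta } (x) = x \vert \log x \vert ^{\beta }\) on \((0, \, 1]\) has derivative \(\vert \log x \vert ^{\beta -1} ( \vert \log x \vert - \beta )\), which vanishes at \(x = e^{-\beta }\), so that

$$\begin{aligned} \sup _{0 < x \le 1} x \left| \log x \right| ^{\beta }= & {} e^{-\beta } \beta ^{\beta } = \left( \frac{\beta }{e} \right) ^{\beta } , \nonumber \\ C \, x\le & {} \frac{C \, ( \beta / e )^{\beta }}{\left| \log x \right| ^{\beta }} \quad \forall \, x \in (0, \, 1) . \end{aligned}$$

Under this reading, the constant appearing in (74) may depend on \(\beta \), and for \(\vert t-y \vert = 1\) the right-hand side of (74) is infinite, so the bound holds trivially.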

Observing that \(P ( {\mathcal {B}}^H (0) =0) = P ({\mathcal {B}}^H (1) =0) =1\), Proposition 2 now follows from (74) and Leadbetter and Weissner (1969). \(\square \)

Proof of Proposition 6

Let

$$\begin{aligned} R^{*}_{n} (z) = P_{P^{*}} \left( \left. Z^{*}_{n,m} \le z \, \right| {\varvec{Y}}_N, \, {\varvec{X}}_N, \, {\varvec{D}}_N, \, N^*_1 , \ldots N^*_M \right) \end{aligned}$$

be the (resampling) d.f. of \(Z^{*}_{n,m} \) defined in (48). By the Dvoretzky–Kiefer–Wolfowitz inequality (cf. Massart 1990), we first have

$$\begin{aligned} P \left( \left. \sup _{z} \left| {\widehat{R}}^{*}_{n,M} (z) - R^{*}_{n} (z) \right| > \epsilon \, \right| {\varvec{Y}}_N, \, {\varvec{X}}_N, \, {\varvec{D}}_N, \, N^*_1 , \ldots N^*_M \right) \le 2 \exp \left\{ -2 M \epsilon ^2 \right\} . \end{aligned}$$
(75)

Using the first Borel–Cantelli lemma, and taking into account that \(R^{*}_{n} (z)\) converges uniformly to \(P ( \sup _p \vert {\mathcal {L}}^H (p) \vert \le z )\), (51) immediately follows. Statement (52) follows from (51) and the absolute continuity of the distribution of \(\sup _p \vert {\mathcal {L}}^H (p) \vert \) (cf. Lifshits 1982). \(\square \)
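
The bound in (75) decays geometrically in the number of replicates M, so it is summable over M, which is the condition the first Borel–Cantelli lemma requires. The following sketch evaluates that bound, inverts it to suggest how many Monte Carlo replicates give a sup-distance below ε with probability at least 1 − δ, and checks the sup-distance on a toy sample. The function names, the tolerance `delta`, and the uniform toy distribution are assumptions made for illustration; they are not part of the paper's procedure.

```python
import math
import random


def dkw_bound(M, eps):
    """Right-hand side of the DKW inequality, 2 * exp(-2 * M * eps^2), capped at 1."""
    return min(1.0, 2.0 * math.exp(-2.0 * M * eps ** 2))


def dkw_replicates(eps, delta):
    """Smallest M with 2 * exp(-2 * M * eps^2) <= delta (delta is a hypothetical tolerance)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def sup_ecdf_distance(sample, cdf):
    """Sup-distance between the empirical d.f. of `sample` and a continuous d.f. `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical d.f. jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, abs(i / n - cdf(x)), abs((i + 1) / n - cdf(x)))
    return d


if __name__ == "__main__":
    eps, delta = 0.05, 0.05
    M = dkw_replicates(eps, delta)
    rng = random.Random(0)
    # Toy stand-in for M resampling replicates: uniform draws with known d.f. F(z) = z.
    draws = [rng.random() for _ in range(M)]
    print(M, dkw_bound(M, eps), sup_ecdf_distance(draws, lambda z: z))
```

The cap at 1 in `dkw_bound` is a convenience, since the raw expression exceeds 1 for small M and is then uninformative as a probability bound.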

Proof of Proposition 7

The proofs of (57) and (58) are similar to that of Proposition 6. Statement (59) is a consequence of Theorem 2.5.5 in Sen and Singer (1993, pp. 90–91). \(\square \)

Cite this article

Conti, P.L., Di Iorio, A., Guandalini, A. et al. On the estimation of the Lorenz curve under complex sampling designs. Stat Methods Appl 29, 1–24 (2020). https://doi.org/10.1007/s10260-019-00478-6

  • DOI: https://doi.org/10.1007/s10260-019-00478-6
