On the estimation of the Lorenz curve under complex sampling designs

Conti, Pier Luigi; Di Iorio, Alberto; Guandalini, Alessio; Marella, Daniela; Vicard, Paola; Vitale, Vincenzina

doi:10.1007/s10260-019-00478-6


Original Paper
Published: 19 June 2019

Volume 29, pages 1–24 (2020)

Statistical Methods & Applications

Abstract

This paper focuses on the estimation of the concentration curve of a finite population when data are collected according to a complex sampling design with different inclusion probabilities. A (design-based) Hájek-type estimator for the Lorenz curve is proposed, and its asymptotic properties are studied. Then, a resampling scheme able to approximate the asymptotic law of the Lorenz curve estimator is constructed. Applications are given to the construction of (i) a confidence band for the Lorenz curve, (ii) confidence intervals for the Gini concentration ratio, and (iii) a test for Lorenz dominance. The merits of the proposed resampling procedure are evaluated through a simulation study.
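
The abstract refers to a design-based Hájek-type estimator of the Lorenz curve, but its exact definition appears in the body of the paper, not in this excerpt. The Python sketch below therefore assumes the textbook form $L(p) = \mu^{-1} \int_0^p Q(u) \, du$, with $Q$ the quantile function of the Hájek estimator of the distribution function (design weights $1/\pi_i$ normalised to sum to one) plugged in. The function names `hajek_lorenz`, `lorenz_at`, `gini` and `dominance_gap` are hypothetical. `dominance_gap` only computes a sample sup-statistic on a grid; it is not the paper's dominance test, whose critical values come from the proposed resampling scheme.

```python
import numpy as np


def hajek_lorenz(y, pi):
    """Plug-in Lorenz curve built on the Hajek distribution function (hypothetical helper).

    y  : sampled values, assumed nonnegative with a positive weighted total
    pi : first-order inclusion probabilities of the sampled units, in (0, 1]
    Returns the breakpoints p_k (cumulative weight shares) and the ordinates L(p_k).
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pi, dtype=float)        # design weights 1 / pi_i
    order = np.argsort(y, kind="stable")         # sort units by the study variable
    y, w = y[order], w[order]
    cw = np.cumsum(w)
    # Hajek d.f. evaluated at the ordered sample values: cumulative share of weights
    p = np.concatenate(([0.0], cw / cw[-1]))
    # Weighted partial totals; dividing by the full total equals
    # (integral of the Hajek quantile function on [0, p_k]) / (Hajek mean)
    mass = np.concatenate(([0.0], np.cumsum(w * y)))
    return p, mass / mass[-1]


def lorenz_at(p_grid, p, lorenz):
    """Evaluate the piecewise-linear estimated Lorenz curve at points of [0, 1]."""
    return np.interp(p_grid, p, lorenz)


def gini(p, lorenz):
    """Gini ratio 1 - 2 * integral of L; the trapezoidal rule is exact for a polygon."""
    return float(1.0 - np.sum(np.diff(p) * (lorenz[1:] + lorenz[:-1])))


def dominance_gap(y_a, pi_a, y_b, pi_b, grid_size=101):
    """Sample statistic sup_p [L_B(p) - L_A(p)] over a grid (not a test: no critical value).

    Values <= 0 mean the estimated curve of A lies on or above that of B on the grid.
    """
    grid = np.linspace(0.0, 1.0, grid_size)
    l_a = lorenz_at(grid, *hajek_lorenz(y_a, pi_a))
    l_b = lorenz_at(grid, *hajek_lorenz(y_b, pi_b))
    return float(np.max(l_b - l_a))
```

Because the plug-in quantile function is a step function, the estimated curve is piecewise linear between the cumulative weight shares, so the trapezoidal rule in `gini` returns $1 - 2\int_0^1 \widehat{L}(p)\,dp$ exactly rather than approximately. With equal inclusion probabilities the sketch reduces to the ordinary empirical Lorenz curve.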

References

Anderson G (1996) Nonparametric tests of stochastic dominance in income distribution. Econometrica 64:1183–1193
Antal E, Tillé Y (2011) A direct bootstrap method for complex sampling designs from a finite population. J Am Stat Assoc 106(494):534–543
Barabesi L, Diana G, Perri PF (2016) Linearization of inequality indices in the design-based framework. Statistics 50:1161–1172
Barrett GF, Donald SG, Bhattacharya D (2014) Consistent nonparametric tests for Lorenz dominance. J Bus Econ Stat 32:1–13
Bhattacharya D (2005) Asymptotic inference from multi-stage samples. J Econom 126:145–171
Bhattacharya D (2007) Inference on inequality from household survey data. J Econom 137:674–707
Bickel PJ, Freedman D (1981) Some asymptotic theory for the bootstrap. Ann Stat 9:1196–1216
Boistard H, Lopuhaä R, Ruiz-Gazen A (2017) Functional central limit theorems for single-stage sampling designs. Ann Stat 45:1728–1758
Chauvet G (2007) Méthodes de bootstrap en population finie. Ph.D. Dissertation, Laboratoire de statistique d’enquêtes, CREST-ENSAI, Université de Rennes 2
Conti PL, Di Iorio A (2018) Analytic inference in finite populations via resampling, with applications to confidence intervals and testing for independence. Preprint arXiv:1809.08035 (submitted for publication)
Conti PL, Marella D (2015) Inference for quantiles of a finite population: asymptotic vs. resampling results. Scand J Stat 42:545–561
Csörgő M, Csörgő S, Horváth L (1986) An asymptotic theory for empirical reliability and concentration processes. Springer, Berlin
Davidson R (2009) Reliable inference for the Gini index. J Econom 150:30–40
Gastwirth JL (1972) The estimation of Lorenz curve and Gini index. Rev Econ Stat 54:306–316
Giorgi GM (1999) Income inequality measurement: the statistical approach. In: Silber J (ed) Handbook of income inequality measurement. Kluwer Academic Publishers, Boston
Giorgi GM, Gigliarano C (2017) The Gini concentration index: a review of the inference literature. J Econ Surv 31:1130–1148
Goldie CM (1977) Convergence theorems for empirical Lorenz curves and their inverses. Adv Appl Probab 9:765–791
Hájek J (1964) Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann Math Stat 35:1491–1523
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685
Langel M, Tillé Y (2013) Variance estimation of the Gini index: revisiting a result several times published. J R Stat Soc Ser A 176:521–540
Leadbetter MR, Weissner JH (1969) On continuity and other analytic properties of stochastic process sample functions. Proc Am Math Soc 22:291–294
Lifshits MA (1982) On the absolute continuity of distributions of functionals of random processes. Theory Probab Appl 27:600–607
Marella D, Vicard P (2018) PC complex: PC algorithm for complex survey data. Working Paper n. 240, Dipartimento di Economia - Università Roma Tre. ISSN: 2279-6916 (submitted for publication)
Massart P (1990) The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. Ann Probab 18:1269–1283
Pfeffermann D (1993) The role of sampling weights when modeling survey data. Int Stat Rev 61:317–337
Pfeffermann D, Sverchkov M (2004) Prediction of finite population totals based on the sample distribution. Surv Methodol 30:79–92
Sen PK, Singer J (1993) Large sample methods in statistics. Chapman & Hall, London
Tillé Y (2006) Sampling algorithms. Springer, New York
van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Zheng B (2002) Testing Lorenz curves with non-simple random samples. Econometrica 70:1235–1243

Acknowledgements

Funding was provided by Sapienza Università di Roma (C26A144TFX - New resampling methodologies for complex surveys with applications to the estimation of inequality measures; C26A15W8EK - A new approach to single and multiple imputation).

Author information

Authors and Affiliations

Dipartimento di Scienze Statistiche, Sapienza Università di Roma, P.le A. Moro, 5, 00185, Rome, Italy
Pier Luigi Conti
ISTAT, Via Cesare Balbo, 16, 00184, Rome, Italy
Alessio Guandalini
Dipartimento di Scienze della Formazione, Università Roma Tre, via del Castro Pretorio 20, 00185, Rome, Italy
Daniela Marella
Dipartimento di Economia, Università Roma Tre, Via Silvio D’Amico, 77, 00145, Rome, Italy
Paola Vicard
Banca D’Italia, Via Nazionale, 91, 00184, Rome, Italy
Alberto Di Iorio
Dipartimento di Scienze Sociali ed Economiche, Sapienza Università di Roma, P.le A. Moro, 5, 00185, Rome, Italy
Vincenzina Vitale

Corresponding author

Correspondence to Pier Luigi Conti.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Proposition 2

Suppose that $y < t$. From (25) it is not difficult to see that

$$\begin{aligned} {\mathbb {E}} [ ( {\mathcal {W}}^H (t) - {\mathcal {W}}^H (y) )^2 ] &= C_1 (y, \, y) + C_1 (t, \, t) - 2 C_1 (t, \, y) \\ &\quad + f ( C_2 (y, \, y) + C_2 (t, \, t) - 2 C_2 (t, \, y) ) \\ &= {\mathbb {E}} [ X_1 ] \left( T_{-1} (t) - T_{-1} (y) \right) + f \left( F(t) - F(y) \right) \\ &\quad - \frac{f^{3}}{d} \left\{ \left( \frac{T_{1} (t)}{{\mathbb {E}} [ X_{1} ]} - F(t) \right) - \left( \frac{T_{1} (y)}{{\mathbb {E}} [ X_{1} ]} - F(y) \right) \right\} ^2 \\ &\quad - 2 {\mathbb {E}} [ X_1 ] ( T_{-1} (t) - T_{-1} (y) ) ( F(y) - F(t)) \\ &\quad + 2 {\mathbb {E}} [ X_1 ] \left( {\mathbb {E}} [ X_1^{-1} ] +1 \right) ( F(t) - F( y) )^2 \\ &\quad + f \{ (F(t) - F(y) ) - (F(t) - F(y) )^2 \} . \end{aligned} \tag{71}$$

Assumption C1 implies that

$$\begin{aligned} \left| T_{\alpha } (t) - T_{\alpha } (y) \right| \le M_{\alpha } \left| F (t) - F(y) \right| \end{aligned}$$

so that from (71) it is not difficult to see that

$$\begin{aligned} {\mathbb {E}} [ ( {\mathcal {W}}^H (t) - {\mathcal {W}}^H (y) )^2 ] \le C \left| F(t) - F(y) \right| . \end{aligned} \tag{72}$$

Here $C$ is a suitable constant. Since both sides of (72) are symmetric in $y$ and $t$, the inequality also holds when $y > t$. Hence, in terms of the process ${\mathcal {B}}^H$ introduced above, we may write

$$\begin{aligned} {\mathbb {E}} \left[ \left( {\mathcal {B}}^H (t) - {\mathcal {B}}^H (y) \right) ^2 \right] \le C \vert t-y \vert \quad \forall \, y, \, t \in [0, \, 1] . \end{aligned} \tag{73}$$

Since $u \, \vert \log u \vert ^{\beta }$ is bounded on $(0, \, 1)$ for every $\beta > 1$, inequality (73), together with the Gaussianity of ${\mathcal {B}}^H (t) - {\mathcal {B}}^H (y)$, in turn implies that

$$\begin{aligned} {\mathbb {E}} \left[ \left( {\mathcal {B}}^H (t) - {\mathcal {B}}^H (y) \right) ^2 \right] \le \frac{C}{\left| \log \vert t-y \vert \right| ^{\beta }} \quad \forall \, \beta > 1 , \quad \forall \, y, \, t \in [0, \, 1], \; y \ne t , \end{aligned} \tag{74}$$

where the constant $C$ may now depend on $\beta$.

Since $P ( {\mathcal {B}}^H (0) =0) = P ({\mathcal {B}}^H (1) =0) =1$, Proposition 2 now follows from (74) and Leadbetter and Weissner (1969). $\square $
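
The step from (73) to (74) can be made explicit with an elementary maximisation, sketched here as a worked step that the proof leaves implicit. Assuming, as in (73), that $C$ does not depend on $y$ and $t$, the substitution $u = e^{-v}$ gives, for every $\beta > 1$,

$$ \sup_{0<u<1} u \, \vert \log u \vert^{\beta} = \sup_{v>0} v^{\beta} e^{-v} = \left( \frac{\beta}{e} \right)^{\beta} , $$

the supremum being attained at $v = \beta$, where the derivative $(\beta - v) \, v^{\beta - 1} e^{-v}$ vanishes. Hence, for $0 < \vert t-y \vert < 1$,

$$ C \vert t-y \vert \le \frac{C \, (\beta / e)^{\beta}}{\left| \log \vert t-y \vert \right|^{\beta}} , $$

so that (74) holds with the constant $C (\beta/e)^{\beta}$.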

Proof of Proposition 6

Let

$$\begin{aligned} R^{*}_{n} (z) = P_{P^{*}} \left( \left. Z^{*}_{n,m} \le z \, \right| {\varvec{Y}}_N, \, {\varvec{X}}_N, \, {\varvec{D}}_N, \, N^*_1 , \ldots , N^*_M \right) \end{aligned}$$

be the (resampling) distribution function of $Z^{*}_{n,m}$ defined in (48). By the Dvoretzky–Kiefer–Wolfowitz inequality (cf. Massart 1990), we first have

$$\begin{aligned} P \left( \left. \sup _{z} \left| {\widehat{R}}^{*}_{n,M} (z) - R^{*}_{n} (z) \right| > \epsilon \, \right| {\varvec{Y}}_N, \, {\varvec{X}}_N, \, {\varvec{D}}_N, \, N^*_1 , \ldots , N^*_M \right) \le 2 \exp \left\{ -2 M \epsilon ^2 \right\} . \end{aligned} \tag{75}$$

Since the right-hand side of (75) is summable in $M$ for every fixed $\epsilon > 0$, the first Borel–Cantelli lemma, together with the uniform convergence of $R^{*}_{n} (z)$ to $P ( \sup _p \vert {\mathcal {L}}^H (p) \vert \le z )$, immediately yields (51). Statement (52) follows from (51) and the absolute continuity of the distribution of $\sup _p \vert {\mathcal {L}}^H (p) \vert $ (cf. Lifshits 1982). $\square $
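
Inequality (75) bounds only the Monte Carlo error made by replacing the resampling distribution function $R^{*}_{n}$ with its estimate ${\widehat{R}}^{*}_{n,M}$ from $M$ replicates; it says nothing about how well $R^{*}_{n}$ approximates the asymptotic law. Read the other way round, it suggests how many replicates to draw. The Python sketch below (function names hypothetical) inverts the bound and builds the empirical distribution function from a vector of resampled statistics, which is assumed to be produced by the paper's resampling scheme (not reproduced here).

```python
import math

import numpy as np


def replications_needed(eps, delta):
    """Smallest integer M with 2 * exp(-2 * M * eps**2) <= delta, i.e. the bound in (75)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def dkw_halfwidth(m, delta):
    """Value of eps at which the bound 2 * exp(-2 * m * eps**2) equals delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m))


def monte_carlo_df(z_star):
    """Empirical d.f. of M resampled statistics, the analogue of R-hat*_{n,M} in (75)."""
    z_sorted = np.sort(np.asarray(z_star, dtype=float))

    def cdf(z):
        # proportion of resampled statistics that are <= z
        return np.searchsorted(z_sorted, z, side="right") / z_sorted.size

    return cdf
```

For instance, `replications_needed(0.01, 0.05)` returns 18445: with that many replicates, the Monte Carlo distribution function is uniformly within 0.01 of $R^{*}_{n}$ with conditional probability at least 0.95.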

Proof of Proposition 7

The proof of (57) and (58) is similar to that of Proposition 6. Statement (59) is a consequence of Theorem 2.5.5 in Sen and Singer (1993, pp. 90–91). $\square $

About this article

Cite this article

Conti, P.L., Di Iorio, A., Guandalini, A. et al. On the estimation of the Lorenz curve under complex sampling designs. Stat Methods Appl 29, 1–24 (2020). https://doi.org/10.1007/s10260-019-00478-6

Accepted: 02 June 2019
Published: 19 June 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s10260-019-00478-6
