Abstract
Lifetime data are often right-censored. Recent literature on estimating the Gini index from censored data focuses on independent censoring. In practice, however, the censoring mechanism is likely to be dependent. This paper proposes two estimators of the Gini index, one under independent censoring and one under covariate-dependent censoring. Both estimators are consistent and asymptotically normal. We evaluate their finite-sample performance through Monte Carlo simulations and apply the proposed methods to real data.
References
Aalen OO (1980) A model for nonparametric regression analysis of counting processes. In: Mathematical statistics and probability theory. Springer, New York
Aalen OO (1989) A linear regression model for the analysis of life times. Stat Med 8(8):907–925
Aalen OO (1993) Further results on the non-parametric linear regression model in survival analysis. Stat Med 12(17):1569–1588
Andersen P, Borgan O, Gill R, Keiding N (1993) Statistical models based on counting processes. Springer, New York
Beran R (1981) Nonparametric regression with randomly censored survival data. Technical report, University of California, Berkeley
Berrebi ZM, Silber J (1985) The Gini coefficient and negative income: a comment. Oxford Econ Pap 37(3):525–526
Bhattacharya D (2007) Inference on inequality from household survey data. J Econom 137(2):674–707
Bonetti M, Gigliarano C, Muliere P (2009) The Gini concentration test for survival data. Lifetime Data Anal 15(4):493–518
Ceriani L, Verme P (2012) The origins of the Gini index: extracts from Variabilità e mutabilità (1912) by Corrado Gini. J Econ Inequal 10(3):421–443
Chen CN, Tsaur TW, Rhai TS (1982) The Gini coefficient and negative income. Oxford Econ Pap 34(3):473–478
Dabrowska DM (1989) Uniform consistency of the kernel conditional Kaplan–Meier estimate. Ann Stat 17(3):1157–1167
Datta S, Satten GA (2002) Estimation of integrated transition hazards and stage occupation probabilities for non-Markov systems under dependent censoring. Biometrics 58(4):792–802
David H (1968) Miscellanea: Gini’s mean difference rediscovered. Biometrika 55(3):573–575
Davidson R (2009) Reliable inference for the Gini index. J Econom 150(1):30–40
Fleming T, Harrington D (1991) Counting processes and survival analysis. Wiley, New York
Gastwirth JL (1972) The estimation of the Lorenz curve and Gini index. Rev Econ Stat 54(3):306–316
Gigliarano C, Muliere P (2013) Estimating the Lorenz curve and Gini index with right censored data: a Pólya tree approach. Metron 71(2):105–122
Gill RD (1980) Censoring and stochastic integrals. Stat Neerl 34(2):124–124
Gini C (1912) Variabilità e mutabilità. Reprinted in Memorie di metodologica statistica (Eds Pizetti E, Salvemini T). Libreria Eredi Virgilio Veschi, Rome
Hanada K (1983) A formula of Gini’s concentration ratio and its application to life tables. J Jpn Stat Soc 13(2):95–98
Hoeffding W (1948) A class of statistics with asymptotically normal distribution. Ann Math Stat 19(3):293–325
Kendall M, Stuart A (1977) The advanced theory of statistics, vol 1: Distribution theory. Macmillan, New York
Lambert PJ, Aronson JR (1993) Inequality decomposition analysis and the Gini coefficient revisited. Econ J 103(420):1221–1227
Langel M, Tillé Y (2013) Variance estimation of the Gini index: revisiting a result several times published. J Roy Stat Soc A Sta 176(2):521–540
Leconte E, Poiraud-Casanova S, Thomas-Agnan C (2002) Smooth conditional distribution function and quantiles under random censorship. Lifetime Data Anal 8(3):229–246
Lorenz MO (1905) Methods of measuring the concentration of wealth. Publ Am Stat Assoc 9(70):209–219
Lubrano M (2012) The econometrics of inequality and poverty. Lecture 4: Lorenz curves, the Gini Coefficient and parametric distributions
Martinussen T, Scheike TH (2006) Dynamic regression models for survival data. Springer, New York
McCall BP (1996) Unemployment insurance rules, joblessness, and part-time work. Econometrica 64(3):647–682
Michetti B, Dall’Aglio G (1957) La differenza semplice media. Statistica 7(2):159–255
Ogwang T (2000) A convenient method of computing the Gini index and its standard error. Oxford B Econ Stat 62(1):123–129
Peng L (2011) Empirical likelihood methods for the Gini index. Aust Nz J Stat 53(2):131–139
Qin Y, Rao J, Wu C (2010) Empirical likelihood confidence intervals for the Gini measure of income inequality. Econ Model 27(6):1429–1435
Raffinetti E, Siletti E, Vernizzi A (2015) On the Gini coefficient normalization when attributes with negative values are considered. Stat Method Appl 24(3):507–521
Robins JM, Rotnitzky A (1992) Recovery of information and adjustment for dependent censoring using surrogate markers. In: AIDS Epidemiology (pp. 297–331). Birkhäuser, Boston
Robins JM, Rotnitzky A (2005) Inverse probability weighted estimation in survival analysis. In: Encyclopedia of Biostatistics (pp. 2619–2625). Wiley, New York
Satten GA, Datta S, Robins JM (2001) An estimator for the survival function when data are subject to dependent censoring. Stat Probab Lett 54:397–403
Scharfstein DO, Rotnitzky A, Robins JM (1999) Adjusting for nonignorable drop-out using semiparametric nonresponse models. J Am Stat Assoc 94(448):1096–1120
Sen A (1973) On economic inequality. Oxford University Press, Oxford
Sendler W (1979) On statistical inference in concentration measurement. Metrika 26(1):109–122
Sengupta M (2009) Unemployment duration and the measurement of unemployment. J Econ Inequal 7(3):273–294
Shorrocks AF (1980) The class of additively decomposable inequality measures. Econometrica 48(3):613–625
Sun L, Song X, Zhang Z (2012) Mean residual life models with time-dependent coefficients under right censoring. Biometrika 99(1):185–197
Sun Y, Lee J (2011) Testing independent censoring for longitudinal data. Stat Sinica 21(3):1315
Tse SM (2006) Lorenz curve for truncated and censored data. Ann Inst Stat Math 58(4):675–686
Xu K (2003) How has the literature on Gini’s index evolved in the past 80 years? Dalhousie University, Economics Working Paper
Yitzhaki S (1991) Calculating jackknife variance estimators for parameters of the Gini method. J Bus Econ Stat 9(2):235–239
Yitzhaki S, Schechtman E (2013) The Gini methodology: a primer on a statistical methodology. Springer, New York
Acknowledgments
The authors are grateful to the Editor, the Associate Editor, and two anonymous referees for their critical and insightful comments, which led to great improvements in the revised manuscript. This work was supported by the National Natural Science Foundation of China (No. 71501159 and 71401112) and the Fundamental Research Funds for the Central Universities (JBK160113).
Appendix
Proof of Theorem 1
Let \(\check{F}_{cn}(t)\), \(\check{\mu }\), \(\check{\eta }\), and \(\check{G}\) be the counterparts of \(F_{cn}(t)\), \(\hat{\mu }\), \(\hat{\eta }\), and \(\hat{G}\), respectively, obtained by replacing \(\hat{K}_c(T_i)\) with \(K_c(T_i)\). Under Assumptions 1 and 2, \(\hat{K}_c(t)=K_c(t)+o_p(1)\). Because both \(\hat{\mu }\) and \(F_{cn}(t)\) are continuous at \(K_c(T_i)\), we have \(\hat{\mu }=\check{\mu }+o_p(1)\) and \(F_{cn}(t)=\check{F}_{cn}(t)+o_p(1)\). Because \(\hat{\eta }\) is continuous at \((K_c(T_i), \check{F}_{cn}(T_i))\), we have \(\hat{\eta }=\check{\eta }+o_p(1)\). Because \(\hat{G}\) is continuous at \((\check{\mu }, \check{\eta })\), we have \(\hat{G}=\check{G}+o_p(1)\). Applying the law of large numbers and the law of iterated expectations to \(\check{\mu }\) and \(\check{\eta }\) yields \(\check{\mu }=\mu +o_p(1)\) and \(\check{\eta }=\eta +o_p(1)\). Thus, \(\check{G}=G+o_p(1)\), since \(\check{G}\) is continuous at \((\mu , \eta )\). It follows that \(\hat{G}=G+o_p(1)\), which completes the proof of the first part of Theorem 1.
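The plug-in idea used in the consistency argument, replacing the unknown \(K_c\) by an estimate \(\hat{K}_c\), can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses a Kaplan–Meier estimate of the censoring survival function and a normalised inverse-probability-of-censoring-weighted Gini index. The function names `censoring_survival_left` and `ipcw_gini` are hypothetical, and the weighting scheme is not necessarily the exact estimator defined in the paper.

```python
import numpy as np

def censoring_survival_left(x, delta):
    """Kaplan-Meier estimate of K_c(t-) = P(C >= t), evaluated at each x[i].

    x     : observed times min(T, C)
    delta : 1 if the lifetime was observed, 0 if it was censored
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    # Censoring plays the role of the "event" when estimating K_c.
    times = np.unique(x[delta == 0])
    at_risk = np.array([(x >= t).sum() for t in times], dtype=float)
    n_cens = np.array([((x == t) & (delta == 0)).sum() for t in times], dtype=float)
    factors = 1.0 - n_cens / at_risk
    # Left limit: multiply the factors of censoring times strictly before t.
    return np.array([np.prod(factors[times < t]) for t in x])

def ipcw_gini(x, delta):
    """Weighted Gini index with weights delta_i / K_hat(x_i-), normalised to sum to 1.

    Assumption for illustration: G = (weighted mean difference) / (2 * weighted mean).
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    k_hat = censoring_survival_left(x, delta)
    w = np.where(delta == 1, 1.0 / k_hat, 0.0)   # censored points get weight 0
    w = w / w.sum()
    mean = np.sum(w * x)
    mean_diff = np.sum(w[:, None] * w[None, :] * np.abs(x[:, None] - x[None, :]))
    return mean_diff / (2.0 * mean)
```

With no censoring every weight equals \(1/n\) and the function reduces to the ordinary sample Gini index; for example, `ipcw_gini([1, 2, 3, 4], [1, 1, 1, 1])` gives 0.25.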
We prove the second part of Theorem 1 in three steps. The first approximates \(\hat{\mu }-\mu \) by a sum of i.i.d. variables. The second approximates \(\hat{\eta }-\eta \) by a sum of i.i.d. variables. The third combines the two approximations to complete the proof.
For \(\hat{\mu }-\mu \), we have
Similar to Robins and Rotnitzky (1992), we have
Then, we have
According to the martingale representation (Gill 1980), we have
Substituting (14) into \(I_{n2}\) yields
Then, substituting (13) and (15) into (11) yields
Let \(h_{ij}=I(\tilde{T}_j\le \tilde{T}_i)\tilde{T}_i+I(\tilde{T}_i\le \tilde{T}_j) \tilde{T}_j\). Then, for \(\hat{\eta }-\eta \), we have
\(J_{n1}\) is a U-statistic. By the projection method for U-statistics, we have
Substituting (12) into (18) yields
We decompose \(J_{n2}\) into the following three components:
Given that \(\frac{K_c(T_i)-\hat{K}_c(T_i)}{K_c(T_i)}=O_p(n^{-1/2})\) and \(\frac{K_c(T_j)-\hat{K}_c(T_j)}{K_c(T_j)}=O_p(n^{-1/2})\) (see Gill 1980), we can prove \(J_{n2,1}=o_p(n^{-1/2})\). We approximate \(J_{n2,2}\) as follows:
\(J_{n2,3}=J_{n2,2}\). Therefore, we have
The central limit theorem implies that \(\hat{\mu }-\mu =O_p(n^{-1/2})\) and \(\hat{\eta }-\eta =O_p(n^{-1/2})\). A first-order Taylor expansion of \(\hat{G}\) at \((\eta ,\mu )\) then yields
Combining (16), (22) and (23), we obtain
Applying the central limit theorem (Fleming and Harrington 1991) to (24) proves the second part of Theorem 1.\(\square \)
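As a sketch of the Taylor-expansion step in the proof above, write \(G=g(\eta ,\mu )\) for a function \(g\) that is differentiable at \((\eta ,\mu )\). Since \(\hat{\mu }-\mu =O_p(n^{-1/2})\) and \(\hat{\eta }-\eta =O_p(n^{-1/2})\), a first-order expansion gives

\[
\hat{G}-G=\frac{\partial g}{\partial \eta }(\eta ,\mu )\,(\hat{\eta }-\eta )+\frac{\partial g}{\partial \mu }(\eta ,\mu )\,(\hat{\mu }-\mu )+o_p(n^{-1/2}).
\]

For illustration only, assume the common representation \(G=2\eta /\mu -1\) with \(\eta =E[T F(T)]\); this excerpt does not restate the paper's definition of \(\eta \), so that form is an assumption. Then \(\partial g/\partial \eta =2/\mu \) and \(\partial g/\partial \mu =-2\eta /\mu ^2=-(G+1)/\mu \), so that

\[
\hat{G}-G=\frac{1}{\mu }\left\{ 2(\hat{\eta }-\eta )-(G+1)(\hat{\mu }-\mu )\right\} +o_p(n^{-1/2}),
\]

which has the same shape as the quantity \(Q_i=2h_\eta (\tilde{T}_i)-(G+1)\tilde{T}_i\) used in the proof of Theorem 3.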
Proof of Theorem 2
Note that
Based on (16), the martingale central limit theorem implies that
where
Then, \(\hat{\mu }=\mu +o_p(1)\). Based on (25), we have
Similarly, we can prove that the other 10 elements of \(\hat{{\varOmega }}\) converge in probability to the corresponding elements of \({\varOmega }\). This completes the proof of Theorem 2.\(\square \)
Proof of Theorem 3
Given Assumptions 1’ and 2’, the consistency of Aalen’s estimator implies \(\hat{K}_c^{z_i}(t)=K_c^{z_i}(t)+o_p(1)\). The first part of Theorem 3 then follows by an argument similar to the proof of the first part of Theorem 1; we omit the details for brevity.
We now prove the second part of Theorem 3. Under covariate-dependent censoring, along lines similar to (11) and (13), we have
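To make the role of Aalen's estimator more concrete, the following sketch shows one way a covariate-dependent survival function can be estimated under Aalen's additive hazards model by least squares at each jump time. It is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the survival function is taken as \(\exp \{-(1,z)^\top \hat{B}(t-)\}\), and a minimum-norm least-squares solution is used when the at-risk design matrix is singular. For the censoring model one would pass `event = 1 - delta`.

```python
import numpy as np

def aalen_cumulative_coefficients(x, event, z):
    """Least-squares (Aalen) estimate of the cumulative regression functions B(t).

    x     : observed times
    event : 1 if the modelled event occurred at x[i], else 0
    z     : covariates, shape (n,) or (n, p), without intercept
    Returns the sorted jump times and B evaluated at those times.
    """
    x = np.asarray(x, dtype=float)
    event = np.asarray(event, dtype=int)
    z = np.asarray(z, dtype=float).reshape(len(x), -1)
    design = np.column_stack([np.ones(len(x)), z])
    times = np.unique(x[event == 1])
    increments = []
    for t in times:
        at_risk = (x >= t).astype(float)
        X_t = design * at_risk[:, None]          # rows of subjects not at risk are zero
        dN = ((x == t) & (event == 1)).astype(float)
        # Minimum-norm least squares; a pragmatic choice when X_t is singular.
        dB, *_ = np.linalg.lstsq(X_t, dN, rcond=None)
        increments.append(dB)
    if increments:
        B = np.cumsum(np.array(increments), axis=0)
    else:
        B = np.empty((0, design.shape[1]))
    return times, B

def conditional_survival_left(times, B, z0, t):
    """exp(-(1, z0)' B(t-)): estimated survival just before t given covariate z0."""
    idx = np.searchsorted(times, t, side="left")  # number of jump times < t
    if idx == 0:
        return 1.0
    row = np.concatenate([[1.0], np.atleast_1d(np.asarray(z0, dtype=float))])
    return float(np.exp(-row @ B[idx - 1]))
```

With a single binary covariate the fitted cumulative hazard for each group coincides with the within-group Nelson–Aalen estimate. Note that an additive model does not force the estimated cumulative hazard to be non-negative, so the resulting survival estimate may exceed one in small samples.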
Note that
Based on (28) and (29), we have
Under covariate-dependent censoring, along lines similar to (21) and (22), we have
Similar to the argument of (29), we have
Then, we have
Similar to the argument of (23), we have
Based on (31), (33) and (34), we have
Let \(Q_i=2h_\eta (\tilde{T}_i)-(G+1)\tilde{T}_i\). Similar to 7.4.7 of Andersen et al. (1993), we have
where
Substituting (36) into (35) gives the martingale representation of \(\sqrt{n} (\hat{G}_z-G)\). The asymptotic normality of \(\sqrt{n} (\hat{G}_z-G)\) then follows from Rebolledo’s central limit theorem (see Theorem II.5.1 and Proposition II.4.1 in Andersen et al. (1993)).\(\square \)
Cite this article
Lv, X., Zhang, G. & Ren, G. Gini index estimation for lifetime data. Lifetime Data Anal 23, 275–304 (2017). https://doi.org/10.1007/s10985-016-9357-0
DOI: https://doi.org/10.1007/s10985-016-9357-0