Abstract
This paper discusses regression analysis of doubly censored failure time data when there may exist a cured subgroup. By doubly censored data, we mean that the failure time of interest denotes the elapsed time between two related events and the observations on both event times can suffer censoring (Sun in The statistical analysis of interval-censored failure time data. Springer, New York, 2006). One typical example of such data is given by an acquired immune deficiency syndrome cohort study. Although many methods have been developed for the analysis of such data (De Gruttola and Lagakos in Biometrics 45:1–12, 1989; Sun et al. in Biometrics 55:909–914, 1999; 60:637–643, 2004; Pan in Biometrics 57:1245–1250, 2001), there does not seem to exist an established method for the situation with a cured subgroup. This paper discusses this latter problem and presents a sieve approximation maximum likelihood approach. In addition, the asymptotic properties of the resulting estimators are established, and an extensive simulation study indicates that the method seems to work well in practical situations. An application is also provided.
References
Choi S, Huang X, Chen Y-H (2014) A class of semiparametric transformation models for survival data with a cured proportion. Lifetime Data Anal 20:369–386
De Gruttola V, Lagakos SW (1989) Analysis of doubly-censored survival data, with application to AIDS. Biometrics 45:1–12
Fang HB, Li G, Sun J (2005) Maximum likelihood estimation in a semiparametric logistic/proportional hazards mixture model. Scand J Stat 32:59–75
Farewell VT (1986) Mixture models in survival analysis: Are they worth the risk? Can J Stat 14:257–262
Finkelstein DM (1986) A proportional hazards model for interval-censored failure time data. Biometrics 42:845–854
Gómez G, Lagakos SW (1994) Estimation of the infection time and latency distribution of AIDS with doubly censored data. Biometrics 50:204–212
Hu T, Xiang L (2016) Partially linear transformation cure models for interval-censored data. Comput Stat Data Anal 93:257–269
Huang J (1996) Efficient estimation for the proportional hazards model with interval censoring. Ann Stat 24:540–568
Huang J, Rossini AJ (1997) Sieve estimation for the proportional odds failure-time regression model with interval censoring. J Am Stat Assoc 92:960–967
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York
Kim Y, De Gruttola V, Lagakos SW (1993) Analyzing doubly censored data with covariates, with application to AIDS. Biometrics 49:13–22
Lam KF, Xue H (2005) A semiparametric regression cure model with current status data. Biometrika 92:573–586
Lu W, Ying Z (2004) On semiparametric transformation cure models. Biometrika 91:331–343
Ma S (2009) Cure model with current status data. Stat Sin 19:233–249
Ma S (2010) Mixed case interval censored data with a cured subgroup. Stat Sin 20:1165–1181
Pan W (2001) A multiple approach to regression analysis for doubly censored data with application to AIDS studies. Biometrics 57:1245–1250
Sun J (1997) Self-consistency estimation of distributions based on truncated and doubly censored data with applications to AIDS cohort studies. Lifetime Data Anal 3:305–313
Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York
Sun J, Liao Q, Pagano M (1999) Regression analysis of doubly censored failure time data with applications to AIDS studies. Biometrics 55:909–914
Sun L, Kim Y, Sun J (2004) Regression analysis of doubly censored failure time data using the additive hazards model. Biometrics 60:637–643
Turnbull BW (1976) The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Stat Soc Ser B 38:290–295
Acknowledgements
The authors wish to thank the Editor-in-Chief, Dr. Mei-Ling Lee, the Associate Editor and two reviewers for their many helpful comments and suggestions that greatly improved the paper. This work was partly supported by the National Natural Science Foundation of China Grant Nos. 11371062, 11671338, 11731011, 11671168 and the Science and Technology Developing Plan of Jilin Province Grant No. 20170101061JC.
Appendix: Proofs of the asymptotic properties of the estimated regression parameters
In this appendix, we sketch the proof of the asymptotic properties of \(\hat{\psi }_n\), focusing on the asymptotic normality. The proof of the consistency of \(\hat{\psi }_n\) is essentially the same as that of the consistency results given in Huang (1996) and Huang and Rossini (1997) and is thus omitted. For the proof, we need the following regularity conditions.
C1. The censoring intervals \(( L_i , R_i )\) and \(( U_i , V_i )\) are independent of the covariates \(Z_i\)’s.
C2. The parameter space for regression parameters \(\theta \) is a compact subset of \(\mathbb {R}^{1+2d}\), where d denotes the dimension of the covariates \(Z_i\)’s.
C3. The covariates \(Z_i\)’s are bounded and their distribution is not concentrated on any proper affine subspace of \(\mathbb {R}^d\).
C4. There exist some constants \(0<\tau _0<\tau _1\) and \(0<m_0<M_0<\infty \) such that \(P(\tau _0\le L\le \min (R,U,V)\le \tau _1)=1\), \(P(\tau _0\le \max (L,R,U)\le V\le \tau _1)=1\) and \(m_0<\Lambda _0(\tau _0)<\Lambda _0(\tau _1)<M_0\).
C5. For \(r = 1\) or 2, \(\Lambda _{0}\in {\mathcal {H}}_r\) and \(h^*\in {\mathcal {H}}_r\), where \(h^*\) is defined at (A2) below,
$$\begin{aligned} {\mathcal {H}}_r=\{h: r\hbox {th derivatives of}~ h~\hbox {are bounded and continuous over}~[\tau _{0},\tau _{1}]\} \end{aligned}$$
and \(q_n = O ( n^{\kappa } )\) with \(1 / (4 r)< \kappa <1/2\).
C6. The Fisher information matrix \(I(\psi _0)\) is positive definite and bounded, where \(I(\psi _0)\) is defined at (A4) below.
C7. Assume that \(\sup _{t\in [\tau _0,\tau _1]}|\hat{H}(t)-H_0(t)|=O_p(n^{-r\kappa })\).
Note that the conditions described above are commonly used in the literature on interval-censored data. In particular, condition C1 ensures independent censoring, and conditions C3 and C4 usually hold in typical medical studies. Condition C3 is needed for the identifiability of the model, and condition C4 is commonly required to ensure the existence of interval censoring. Condition C5 is necessary for the proof, and conditions C6 and C7 differ from conditions C1–C5 in that they involve the cumulative distribution function \(H_0\). To establish the asymptotic normality and efficiency of the estimator \(\hat{\psi }_n\), we will first derive the efficient score function and calculate the information bound for \(\psi _0\).
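To make Condition C5 concrete, the following is a minimal sketch of one possible sieve for \(\Lambda _0\) on \([\tau _0,\tau _1]\). It assumes a Bernstein polynomial basis with increasing coefficients, which is only one common choice and is not necessarily the sieve used in the paper; the function names `sieve_size` and `bernstein_cumhaz` and the parametrisation through `raw` are hypothetical.

```python
import math
import numpy as np


def sieve_size(n, r, kappa):
    """Return q_n = ceil(n**kappa) after checking 1/(4r) < kappa < 1/2 (Condition C5)."""
    if not (1.0 / (4 * r) < kappa < 0.5):
        raise ValueError("kappa must satisfy 1/(4r) < kappa < 1/2")
    return math.ceil(n ** kappa)


def bernstein_cumhaz(t, raw, tau0, tau1):
    """Evaluate a positive, nondecreasing Bernstein-polynomial sieve on [tau0, tau1].

    raw is an unconstrained vector of length q + 1 (an illustrative assumption).
    The coefficients are cumulative sums of exp(raw), hence positive and
    increasing, which makes the resulting polynomial nondecreasing in t.
    """
    coef = np.cumsum(np.exp(np.asarray(raw, dtype=float)))
    q = coef.size - 1
    # Rescale t to [0, 1] before evaluating the Bernstein basis.
    s = (np.asarray(t, dtype=float) - tau0) / (tau1 - tau0)
    k = np.arange(q + 1)
    binom = np.array([math.comb(q, j) for j in k], dtype=float)
    basis = binom * s[..., None] ** k * (1.0 - s[..., None]) ** (q - k)
    return basis @ coef


# Usage: n = 400 subjects, r = 2, kappa = 0.25 (inside (1/8, 1/2)).
q_n = sieve_size(400, 2, 0.25)
grid = np.linspace(1.0, 3.0, 50)
values = bernstein_cumhaz(grid, np.zeros(q_n + 1), 1.0, 3.0)
```

Parametrising the coefficients through cumulative sums of exponentials keeps the optimisation unconstrained while still respecting the monotonicity of a cumulative hazard function.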
Define
and
By Condition C4, both \(1-A_1\) and \(A_2-A_1\) are positive functions of (x, u, v, z). Then the score function for \(\psi ^T=(\alpha , \beta ^T, \theta ^T)\) is given by
Consider a small perturbation of \(\Lambda _0\) defined by \(\Lambda _s=\Lambda _0+sh\) with \(s\thicksim 0\) and \(h\in L_2(P)\) such that \(\Lambda _s\) satisfies Condition C4. Then the score operator for \(\Lambda _0\) is
Let F be the distribution function corresponding to \(\Lambda _0\) and P the joint probability measure of \((\delta _1 , \delta _2 , X , U , V , Z)\). Then the score operator \(\dot{l}_{\Lambda _0}\) maps \(L_2^0(F)\) to \(L_2^0(P)\), where \(L_2^0(F)\equiv \{a:\int a dF=0 \hbox { and } \int a^2 dF<\infty \}\), and \(L_2^0(P)\) is defined similarly as \(L_2^0(F)\). Let \(\dot{l}_{\Lambda _0}^T\) : \(L_2^0(P)\rightarrow L_2^0(F)\) be the adjoint operator of \(\dot{l}_{\Lambda _0}\), i.e., for any \(a \in L_2^0(F)\) and \(b \in L_2^0(P)\),
$$\begin{aligned} \langle b,\dot{l}_{\Lambda _0}[a]\rangle _P=\langle \dot{l}_{\Lambda _0}^T b,a\rangle _F, \end{aligned}$$
where \(\langle \cdot ,\cdot \rangle _P\) and \(\langle \cdot ,\cdot \rangle _F\) are the inner products in \(L_2^0(P)\) and \(L_2^0(F)\), respectively. We need to find \(h^*\) such that \(\dot{l}_\psi \mid x\, - \, \dot{l}_{\Lambda _0}[h^*]\mid x\) is orthogonal to \(\dot{l}_{\Lambda _0}[h]\mid x\) in \(L_2^0(P)\). This amounts to solving the following normal equation:
$$\begin{aligned} \dot{l}_{\Lambda _0}^T \dot{l}_{\Lambda _0}[h^*]\mid x=\dot{l}_{\Lambda _0}^T \dot{l}_\psi \mid x. \end{aligned}$$(A1)
First note that we have
Also by the conditions above, we have
where g(u, v|z) is the conditional density of (U, V) given Z. Let
We can obtain
After some straightforward calculations, one can obtain the derivative of L(t) as
where
Similarly, the derivative of \(R(t)\equiv \dot{l}_{\Lambda _0}^T \dot{l}_\psi \mid x\) has the form
where
and
So the equation (A1) reduces to
Again by Condition C4, we have \(\inf _{\tau _0\le t\le \tau _1}b(t)>0\). Let \(d(t)=-r(t)/b(t)\) and \(K(x,t,w)=\big \{B_3(x,t+x,w)1_{[t+x\le w\le \tau _1]}+ B_2(x,w,t+x)1_{[\tau _0\le w\le t+x ]}\big \}/b(t)\). Then we obtain a Fredholm integral equation of the second kind
By the classical results on Fredholm integral equations, there exists a resolvent \(\Gamma (x,t,w)\) (completely determined by K) such that
It follows that the efficient score function is given by
and the information matrix has the form
which is assumed to be positive definite.
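A Fredholm integral equation of the second kind, such as the one characterising \(h^*\), rarely has a closed-form solution but can be approximated numerically. The sketch below uses the Nyström method with Gauss-Legendre quadrature. It assumes the generic form \(h(t)-\int _a^b K(t,w)h(w)dw=d(t)\) with x held fixed, which may differ in sign or limits from the paper's equation; the function name `solve_fredholm_second_kind` and its arguments are hypothetical.

```python
import numpy as np


def solve_fredholm_second_kind(kernel, d, a, b, m=200):
    """Nystrom approximation to h(t) - int_a^b kernel(t, w) h(w) dw = d(t).

    kernel(t, w) and d(t) must accept NumPy arrays (assumed interface).
    Returns the quadrature nodes and the approximate solution at those nodes.
    """
    nodes, weights = np.polynomial.legendre.leggauss(m)
    # Map Gauss-Legendre nodes and weights from [-1, 1] to [a, b].
    t = 0.5 * (b - a) * nodes + 0.5 * (b + a)
    w = 0.5 * (b - a) * weights
    # Discretise the integral operator: (K h)(t_i) ~ sum_j K(t_i, t_j) w_j h(t_j).
    K = kernel(t[:, None], t[None, :])
    A = np.eye(m) - K * w[None, :]
    h = np.linalg.solve(A, d(t))
    return t, h


# Usage on a toy problem with known solution h(t) = t on [0, 1]:
# kernel(t, w) = t * w gives d(t) = t - t/3 = 2t/3.
t_nodes, h_nodes = solve_fredholm_second_kind(
    lambda t, w: t * w, lambda t: 2.0 * t / 3.0, 0.0, 1.0, m=20
)
```

In this discretised form, the inverse of the matrix `A` plays the role of the resolvent: solving the linear system is equivalent to applying the identity plus the discretised resolvent kernel to the right-hand side.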
Let P be the probability measure and \(P_n\) its empirical probability measure. Also let \((\hat{\psi }_n^*,\hat{\Lambda }_n^*)\) be the maximum likelihood estimator obtained by maximizing \(l_n(\psi ,\Lambda _0,H_0)\), that is, with \(H_0\) treated as known. We will establish the efficiency and asymptotic normality through the following two lemmas.
Lemma A1
Under Conditions (C1)–(C6), we have
Proof
By Huang (1996), to prove the result it suffices to verify that the conditions below hold. Denote the one-sample log-likelihood function by \(l(\psi ,\Lambda )\), and similarly for the other functions mentioned below.
(1) By the definition of the maximum likelihood estimator and the fact that \(\dot{l}_{\Lambda _0} (\psi ,\Lambda )\) belongs to a P-Donsker class, we have that \( P_n \dot{l}_{\psi } (\hat{\psi }_{n}^* )=0\) and \(P_n \dot{l}_{\Lambda _0}[h^{*}] (\hat{\psi }_n^*,\hat{\Lambda }_n^*)=o_p(n^{-1/2})\).
(2) (Rate of convergence) \(\Vert \hat{\psi }_n^* - \psi _0\Vert ^2_2 + \Vert \widehat{\Lambda }_{n}^*-\Lambda _0\Vert ^2_2 = O_p(\max \{n^{-(1-\kappa )},n^{-2r\kappa }\})\). This condition is standard in the sieve estimation literature; its proof follows the usual arguments and is thus omitted here.
(3) Stochastic equicontinuity and smoothness of the model can be verified directly from the forms of the corresponding functions.
(4) The Fisher information matrix is positive definite and bounded, as assumed in Condition C6.
The conditions are therefore all satisfied and thus the lemma holds. \(\square \)
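The range imposed on \(\kappa \) in Condition C5 can be read off the rate of convergence in item (2). The following short calculation is a sketch under two assumptions: that the first term of the rate is \(n^{-(1-\kappa )}\), the usual form for sieve estimators, and that the asymptotic normality argument requires the squared \(L_2\) error to be \(o_p(n^{-1/2})\), as is standard in arguments of the type in Huang (1996).
$$\begin{aligned} \max \{n^{-(1-\kappa )},n^{-2r\kappa }\}=n^{-\min (1-\kappa ,\,2r\kappa )}=o(n^{-1/2}) \iff 1-\kappa>\tfrac{1}{2}~\hbox {and}~2r\kappa >\tfrac{1}{2} \iff \frac{1}{4r}<\kappa <\frac{1}{2}. \end{aligned}$$
For instance, with \(r=2\) and \(\kappa =1/3\), the exponent is \(\min (2/3,4/3)=2/3\), so the squared error is \(O_p(n^{-2/3})\) and \(\Vert \widehat{\Lambda }_{n}^*-\Lambda _0\Vert _2=O_p(n^{-1/3})\), which is faster than the \(n^{-1/4}\) rate needed.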
Lemma A2
Under Conditions (C1)–(C7), we have
Proof
First note that the working efficient score function for \(\psi \) is given by \( P_n \int [\dot{l}_\psi -\dot{l}_{\Lambda }[h_n^*] ]d\hat{H}(x)\), where \(h_n^*\) is the projection of \(h^*\) onto the sieve space. Then
Thus the lemma follows since \(P_n \int [\dot{l}_\psi -\dot{l}_{\Lambda }[h_n^*] ]d{H}_0(x)\) is the efficient score function used to obtain \(\hat{\psi }_n^*\). \(\square \)
Wang, P., Tong, X. & Sun, J. A semiparametric regression cure model for doubly censored data. Lifetime Data Anal 24, 492–508 (2018). https://doi.org/10.1007/s10985-017-9406-3
DOI: https://doi.org/10.1007/s10985-017-9406-3