A semiparametric regression cure model for doubly censored data

Lifetime Data Analysis

Abstract

This paper discusses regression analysis of doubly censored failure time data when there may exist a cured subgroup. By doubly censored data, we mean that the failure time of interest denotes the elapsed time between two related events and the observations on both event times can suffer censoring (Sun in The statistical analysis of interval-censored failure time data. Springer, New York, 2006). One typical example of such data is given by an acquired immune deficiency syndrome cohort study. Although many methods have been developed for their analysis (De Gruttola and Lagakos in Biometrics 45:1–12, 1989; Sun et al. in Biometrics 55:909–914, 1999; 60:637–643, 2004; Pan in Biometrics 57:1245–1250, 2001), there does not seem to exist an established method for the situation with a cured subgroup. This paper discusses this latter problem and presents a sieve approximation maximum likelihood approach. In addition, the asymptotic properties of the resulting estimators are established and an extensive simulation study indicates that the method seems to work well for practical situations. An application is also provided.

References

  • Choi S, Huang X, Chen Y-H (2014) A class of semiparametric transformation models for survival data with a cured proportion. Lifetime Data Anal 20:369–386

  • De Gruttola V, Lagakos SW (1989) Analysis of doubly-censored survival data, with application to AIDS. Biometrics 45:1–12

  • Fang HB, Li G, Sun J (2005) Maximum likelihood estimation in a semiparametric logistic/proportional hazards mixture model. Scand J Stat 32:59–75

  • Farewell VT (1986) Mixture models in survival analysis: Are they worth the risk? Can J Stat 14:257–262

  • Finkelstein DM (1986) A proportional hazards model for interval-censored failure time data. Biometrics 42:845–854

  • Gómez G, Lagakos SW (1994) Estimation of the infection time and latency distribution of AIDS with doubly censored data. Biometrics 50:204–212

  • Hu T, Xiang L (2016) Partially linear transformation cure models for interval-censored data. Comput Stat Data Anal 93:257–269

  • Huang J (1996) Efficient estimation for the proportional hazards model with interval censoring. Ann Stat 24:540–568

  • Huang J, Rossini AJ (1997) Sieve estimation for the proportional odds failure-time regression model with interval censoring. J Am Stat Assoc 92:960–967

  • Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York

  • Kim Y, De Gruttola V, Lagakos SW (1993) Analyzing doubly censored data with covariates, with application to AIDS. Biometrics 49:13–22

  • Lam KF, Xue H (2005) A semiparametric regression cure model with current status data. Biometrika 92:573–586

  • Lu W, Ying Z (2004) On semiparametric transformation cure models. Biometrika 91:331–343

  • Ma S (2009) Cure model with current status data. Stat Sin 19:233–249

  • Ma S (2010) Mixed case interval censored data with a cured subgroup. Stat Sin 20:1165–1181

  • Pan W (2001) A multiple imputation approach to regression analysis for doubly censored data with application to AIDS studies. Biometrics 57:1245–1250

  • Sun J (1997) Self-consistency estimation of distributions based on truncated and doubly censored data with applications to AIDS cohort studies. Lifetime Data Anal 3:305–313

  • Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York

  • Sun J, Liao Q, Pagano M (1999) Regression analysis of doubly censored failure time data with applications to AIDS studies. Biometrics 55:909–914

  • Sun L, Kim Y, Sun J (2004) Regression analysis of doubly censored failure time data using the additive hazards model. Biometrics 60:637–643

  • Turnbull BW (1976) The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Stat Soc Ser B 38:290–295

Acknowledgements

The authors wish to thank the Editor-in-Chief, Dr. Mei-Ling Lee, the Associate Editor and two reviewers for their many helpful comments and suggestions that greatly improved the paper. This work was partly supported by the National Natural Science Foundation of China Grant Nos. 11371062, 11671338, 11731011, 11671168 and the Science and Technology Developing Plan of Jilin Province Grant No. 20170101061JC.

Author information

Corresponding author

Correspondence to Peijie Wang.

Appendix: Proofs of the asymptotic properties of the estimated regression parameters

In this appendix, we sketch the proof of the asymptotic properties of \(\hat{\psi }_n\), focusing on the asymptotic normality. The proof of the consistency of \(\hat{\psi }_n\) is essentially the same as that of the consistency results given in Huang (1996) and Huang and Rossini (1997) and is thus omitted. For the proof, we need the following regularity conditions.

  C1. The censoring intervals \(( L_i , R_i )\) and \(( U_i , V_i )\) are independent of the covariates \(Z_i\)’s.

  C2. The parameter space for regression parameters \(\theta \) is a compact subset of \(\mathbb {R}^{1+2d}\), where d denotes the dimension of the covariates \(Z_i\)’s.

  C3. The covariates \(Z_i\)’s are bounded and their distribution is not concentrated on any proper affine subspace of \(\mathbb {R}^d\).

  C4. There exist some constants \(0<\tau _0<\tau _1\) and \(0<m_0<M_0<\infty \) such that \(P(\tau _0\le L\le \min (R,U,V)\le \tau _1)=1\), \(P(\tau _0\le \max (L,R,U)\le V\le \tau _1)=1\) and \(m_0<\Lambda _0(\tau _0)<\Lambda _0(\tau _1)<M_0\).

  C5. For \(r = 1\) or 2, \(\Lambda _{0}\in {\mathcal {H}}_r\), \(h^*\in {\mathcal {H}}_r\), where \(h^*\) is defined at (A2) below,

    $$\begin{aligned} {\mathcal {H}}_r=\{h: r\hbox {th derivatives of}~ h~\hbox {are bounded and continuous over}~[\tau _{0},\tau _{1}]\} \end{aligned}$$

    and \(q_n = O ( n^{\kappa } )\) with \(1/(4r)< \kappa <1/2\).

  C6. The Fisher Information matrix \(I(\psi _0)\) is positive definite and bounded, where \(I(\psi _0)\) is defined at (A4) below.

  C7. Assume that \(\sup _{t\in [\tau _0,\tau _1]}|\hat{H}(t)-H_0(t)|=O_p(n^{-r\kappa })\).

Note that the conditions described above are commonly used in the interval-censored data literature. In particular, condition C1 ensures independent censoring, and conditions C3 and C4 usually hold in typical medical studies. Condition C3 is needed for the identifiability of the model, and condition C4 is commonly required to ensure the existence of interval censoring. Condition C5 is necessary for the proof, while conditions C6 and C7 differ from conditions C1–C5 in that they involve the cumulative distribution function \(H_0\). To establish the asymptotic normality and efficiency of the estimator \(\hat{\psi }_n\), we will first derive the efficient score function and calculate the information bound for \(\psi _0\).

Define

$$\begin{aligned} A_1(x,u,v,z)=\exp (-\Lambda _0(u-x)\exp (\theta ^{T}z)), \end{aligned}$$

and

$$\begin{aligned} A_2(x,u,v,z)=\exp (-\Lambda _0(v-x)\exp (\theta ^{T}z)). \end{aligned}$$

By Condition C4, both \(1-A_1\) and \(A_1-A_2\) are positive functions of \((x,u,v,z)\). Then the score function for \(\psi ^T=(\alpha , \beta ^T, \theta ^T)\) is given by

$$\begin{aligned} \dot{l}_{\psi }\mid x= & {} \left\{ \frac{\delta _{1}+\delta _{2}}{p(z)}+\frac{(1-\delta _{1}-\delta _{2}) (A_2(x,u,v,z)-1)}{p(z)A_2(x,u,v,z)+1-p(z)}\right\} \frac{\partial p(z)}{\partial \psi }\\&+\left\{ \frac{\delta _{1}A_1(x,u,v,z)\Lambda _0(u-x)}{1-A_1(x,u,v,z)} +\frac{\delta _{2}(-A_1(x,u,v,z)\Lambda _0(u-x)+A_2(x,u,v,z)\Lambda _0(v-x))}{A_1(x,u,v,z)-A_2(x,u,v,z)}\right. \\&\left. \quad +\frac{-(1-\delta _{1}-\delta _{2})p(z)A_2(x,u,v,z)\Lambda _0(v-x)}{p(z)A_2(x,u,v,z)+1-p(z)}\right\} \frac{\partial \exp (\theta ^{T}z)}{\partial \psi }. \end{aligned}$$
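The \(\theta \)-part of this score can be checked numerically. The following is a minimal Python sketch under assumptions that are not in the original: a scalar covariate, a hypothetical baseline \(\Lambda _0(t)=t\), the value of \(p(z)\) held fixed as a number `p`, and log-likelihood contributions \(\log p+\log (1-A_1)\), \(\log p+\log (A_1-A_2)\) and \(\log (pA_2+1-p)\) that are inferred from the form of the score rather than quoted from the paper.

```python
import math


def cum_hazard(t):
    # Hypothetical baseline cumulative hazard Lambda_0(t) = t (illustration only).
    return t


def loglik(theta, p, x, u, v, z, d1, d2):
    """Conditional log-likelihood contribution given X = x.

    The three cases are inferred from the score function, not quoted.
    """
    e = math.exp(theta * z)
    a1 = math.exp(-cum_hazard(u - x) * e)
    a2 = math.exp(-cum_hazard(v - x) * e)
    if d1 == 1:
        return math.log(p) + math.log(1.0 - a1)
    if d2 == 1:
        return math.log(p) + math.log(a1 - a2)
    return math.log(p * a2 + 1.0 - p)


def score_theta(theta, p, x, u, v, z, d1, d2):
    """Second brace of the score, times d exp(theta * z) / d theta."""
    e = math.exp(theta * z)
    lu, lv = cum_hazard(u - x), cum_hazard(v - x)
    a1, a2 = math.exp(-lu * e), math.exp(-lv * e)
    d3 = 1 - d1 - d2
    brace = (d1 * a1 * lu / (1.0 - a1)
             + d2 * (-a1 * lu + a2 * lv) / (a1 - a2)
             - d3 * p * a2 * lv / (p * a2 + 1.0 - p))
    return brace * e * z


def numeric_score_theta(theta, p, x, u, v, z, d1, d2, eps=1e-6):
    """Central finite difference of loglik with respect to theta."""
    up = loglik(theta + eps, p, x, u, v, z, d1, d2)
    down = loglik(theta - eps, p, x, u, v, z, d1, d2)
    return (up - down) / (2.0 * eps)
```

Calling `score_theta` and `numeric_score_theta` with the same arguments for each of the three censoring patterns \((\delta _1,\delta _2)=(1,0),(0,1),(0,0)\) should give matching values up to finite-difference error.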

Consider a small perturbation of \(\Lambda _0\) defined by \(\Lambda _s=\Lambda _0+sh\) with \(s\thicksim 0\) and \(h\in L_2(P)\) such that \(\Lambda _s\) satisfies Condition C4. Then the score operator for \(\Lambda _0\) is

$$\begin{aligned} \dot{l}_{\Lambda _0}[h]\mid x= & {} \left\{ \frac{\delta _{1}A_1(x,u,v,z)h(u-x)}{1-A_1(x,u,v,z)} +\frac{\delta _{2}(-A_1(x,u,v,z)h(u-x)+A_2(x,u,v,z)h(v-x))}{A_1(x,u,v,z)-A_2(x,u,v,z)}\right. \\&\left. +\frac{-(1-\delta _{1}-\delta _{2})p(z)A_2(x,u,v,z)h(v-x)}{p(z)A_2(x,u,v,z)+1-p(z)}\right\} \exp (\theta ^{T}z). \end{aligned}$$

Let F be the distribution function corresponding to \(\Lambda _0\) and P the joint probability measure of \((\delta _1 , \delta _2 , X , U , V , Z)\). Then the score operator \(\dot{l}_{\Lambda _0}\) maps \(L_2^0(F)\) to \(L_2^0(P)\), where \(L_2^0(F)\equiv \{a:\int a dF=0 \hbox { and } \int a^2 dF<\infty \}\), and \(L_2^0(P)\) is defined similarly as \(L_2^0(F)\). Let \(\dot{l}_{\Lambda _0}^T\) : \(L_2^0(P)\rightarrow L_2^0(F)\) be the adjoint operator of \(\dot{l}_{\Lambda _0}\), i.e., for any \(a \in L_2^0(F)\) and \(b \in L_2^0(P)\),

$$\begin{aligned} \langle b , \dot{l}_{\Lambda _0}a \rangle _P \, = \, \langle \dot{l}_{\Lambda _0}^Tb , a \rangle _F , \end{aligned}$$

where \(\langle \cdot ,\cdot \rangle _P\) and \(\langle \cdot ,\cdot \rangle _F\) are the inner products in \(L_2^0(P)\) and \(L_2^0(F)\), respectively. We need to find \(h^*\) such that \(\dot{l}_\psi \mid x\, - \, \dot{l}_{\Lambda _0}[h^*]\mid x\) is orthogonal to \(\dot{l}_{\Lambda _0}[h]\mid x\) in \(L_2^0(P)\). This amounts to solving the following normal equation:

$$\begin{aligned} \dot{l}_{\Lambda _0}^T \dot{l}_{\Lambda _0}[h^*]\mid x\, = \, \dot{l}_{\Lambda _0}^T \dot{l}_\psi \mid x. \end{aligned}$$
(A1)

First note that we have

$$\begin{aligned} \dot{l}_{\Lambda _0}^T \dot{l}_{\Lambda _0}[h] (t)\mid x \, = \, E[\dot{l}_{\Lambda _0}[h]|x, T \, = \, t] \, = \, E_ZE[\dot{l}_{\Lambda _0}[h]|x, T \, = \, t, Z \, = \ z]. \end{aligned}$$

Also by the conditions above, we have

$$\begin{aligned} E[\dot{l}_{\Lambda _0}[h]|x, T = t, Z = z]= & {} \left\{ \int _{t+x}^{\tau _1}\int _{u}^{\tau _1}\frac{A_1(x,u,v,z)h(u-x)}{1-A_1(x,u,v,z)}g(u,v|z)dvdu \right. \\&\left. \qquad +\int _{\tau _0}^{t+x}\int _{t+x}^{\tau _1}\frac{-A_1(x,u,v,z)h(u-x)+A_2(x,u,v,z)h(v-x)}{A_1(x,u,v,z)-A_2(x,u,v,z)}g(u,v|z)dvdu \right. \\&\left. \qquad -\int _{\tau _0}^{\tau _1}\int _{u}^{t+x}\frac{p(z)A_2(x,u,v,z)h(v-x)}{p(z)A_2(x,u,v,z)+1-p(z)}g(u,v|z)dvdu \right\} \exp (\theta ^Tz), \end{aligned}$$

where \(g(u,v|z)\) is the conditional density of \((U,V)\) given Z. Let

$$\begin{aligned} B_1(x,u,v)= & {} E_Z\left[ \frac{A_1(x,u,v,Z)}{1-A_1(x,u,v,Z)}g(u,v|Z)\exp (\theta ^TZ)\right] ,\\ B_2(x,u,v)= & {} E_Z\left[ \frac{A_1(x,u,v,Z)}{A_1(x,u,v,Z)-A_2(x,u,v,Z)}g(u,v|Z)\exp (\theta ^TZ)\right] ,\\ B_3(x,u,v)= & {} E_Z\left[ \frac{A_2(x,u,v,Z)}{A_1(x,u,v,Z)-A_2(x,u,v,Z)}g(u,v|Z)\exp (\theta ^TZ)\right] ,\\ B_4(x,u,v)= & {} E_Z\left[ \frac{p(Z)A_2(x,u,v,Z)}{p(Z)A_2(x,u,v,Z)+1-p(Z)}g(u,v|Z)\exp (\theta ^TZ)\right] . \end{aligned}$$

We can obtain

$$\begin{aligned} L(t)\equiv \dot{l}_{\Lambda _0}^T \dot{l}_{\Lambda _0}[h] (t)\mid x= & {} \left\{ \int _{t+x}^{\tau _1}\int _{u}^{\tau _1}B_1(x,u,v)h(u-x)dvdu \right. \\&\left. +\int _{\tau _0}^{t+x}\int _{t+x}^{\tau _1}\big \{-B_2(x,u,v)h(u-x)+B_3(x,u,v)h(v-x)\big \}dvdu \right. \\&\left. -\int _{\tau _0}^{\tau _1}\int _{u}^{t+x}B_4(x,u,v)h(v-x)dvdu \right\} . \end{aligned}$$

After some straightforward calculations, one can obtain the derivative of L(t) as

$$\begin{aligned} L^{'}(t)= & {} -b(t)h(t)+\int _{t+x}^{\tau _1}B_3(x,t+x,v)h(v-x)dv\\&+\int _{\tau _0}^{t+x}B_2(x,u,t+x)h(u-x)du, \end{aligned}$$

where

$$\begin{aligned} b(t)= & {} \int _{t+x}^{\tau _1}B_1(x,t+x,v)+B_2(x,t+x,v)dv\\&+\int _{\tau _0}^{t+x}B_3(x,u,t+x)+B_4(x,u,t+x)du. \end{aligned}$$

Similarly, the derivative of \(R(t)\equiv \dot{l}_{\Lambda _0}^T \dot{l}_\psi \mid x\) has the form

$$\begin{aligned} r(t)\equiv R^{'}(t)= & {} \int _{t+x}^{\tau _1}\big \{-C_3(x,t+x,v)-C_4(x,t+x,v)+C_5(x,t+x,v)\big \}dv\\&+\int _{\tau _0}^{t+x}\big \{-C_1(x,u,t+x)+C_2(x,u,t+x)\\&+C_4(x,u,t+x)-C_5(x,u,t+x)-C_6(x,u,t+x)\big \}du, \end{aligned}$$

where

$$\begin{aligned} C_1(x,u,v)= & {} E_Z\left[ \frac{1}{p(Z)}\frac{\partial p(Z)}{\partial \psi }g(u,v|Z)\right] ,\\ C_2(x,u,v)= & {} E_Z\left[ \frac{(A_2(x,u,v,Z)-1)}{p(Z)A_2(x,u,v,Z)+1-p(Z)}\frac{\partial p(Z)}{\partial \psi }g(u,v|Z)\right] ,\\ C_3(x,u,v)= & {} E_Z\left[ \frac{A_1(x,u,v,Z)\Lambda _0(u-x)}{1-A_1(x,u,v,Z)}\frac{\partial \exp (\theta ^{T}Z)}{\partial \psi }g(u,v|Z)\right] ,\\ C_4(x,u,v)= & {} E_Z\left[ \frac{A_1(x,u,v,Z)\Lambda _0(u-x)}{A_1(x,u,v,Z)-A_2(x,u,v,Z)} \frac{\partial \exp (\theta ^{T}Z)}{\partial \psi }g(u,v|Z)\right] ,\\ C_5(x,u,v)= & {} E_Z\left[ \frac{A_2(x,u,v,Z)\Lambda _0(v-x)}{A_1(x,u,v,Z)-A_2(x,u,v,Z)} \frac{\partial \exp (\theta ^{T}Z)}{\partial \psi }g(u,v|Z)\right] , \end{aligned}$$

and

$$\begin{aligned} C_6(x,u,v)=E_Z\left[ \frac{p(Z)A_2(x,u,v,Z)\Lambda _0(v-x)}{p(Z)A_2(x,u,v,Z)+1-p(Z)} \frac{\partial \exp (\theta ^{T}Z)}{\partial \psi }g(u,v|Z)\right] . \end{aligned}$$

So the equation (A1) reduces to

$$\begin{aligned}&-b(t)h(t)+\int _{t+x}^{\tau _1}B_3(x,t+x,v)h(v-x)dv\\&\quad +\int _{\tau _0}^{t+x}B_2(x,u,t+x)h(u-x)du=r(t). \end{aligned}$$

Again by Condition C4, we have \(\inf _{\tau _0\le t\le \tau _1}b(t)>0\). Let \(d(t)=-r(t)/b(t)\) and \(K(x,t,w)=\big \{B_3(x,t+x,w)1_{[t+x\le w\le \tau _1]}+ B_2(x,w,t+x)1_{[\tau _0\le w\le t+x ]}\big \}/b(t)\). Then we obtain a Fredholm integral equation of the second kind

$$\begin{aligned} h^{*}(t)-\int K(x,t,w)h^{*}(w)dw=d(t). \end{aligned}$$

By the classical results on Fredholm integral equations, there exists a resolvent \(\Gamma (x,t,w)\) (completely determined by K) such that

$$\begin{aligned} h^{*}(t) \, = \, d(t)+\int \Gamma (x,t,w)d(w)dw. \end{aligned}$$
(A2)
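A Fredholm equation of the second kind such as the one above can be solved numerically by replacing the integral with a quadrature sum and solving the resulting linear system (a Nyström-type discretization). The Python sketch below is illustrative only: the function names, the midpoint rule and the toy kernel are assumptions, not the authors' procedure, and the kernel \(K\) of the appendix (which depends on \(B_2\), \(B_3\) and \(b\)) is not implemented. On the grid, the inverse of the discretized operator plays the role of the resolvent \(\Gamma \).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def solve_fredholm(kernel, rhs, a, b, n=100):
    """Approximate h in h(t) - int_a^b K(t, w) h(w) dw = d(t).

    Uses the midpoint rule on n equal cells of [a, b] and solves
    (I - K * step) h = d at the cell midpoints.
    """
    step = (b - a) / n
    grid = [a + (i + 0.5) * step for i in range(n)]
    mat = [[(1.0 if i == j else 0.0) - kernel(t, w) * step
            for j, w in enumerate(grid)]
           for i, t in enumerate(grid)]
    h = solve_linear(mat, [rhs(t) for t in grid])
    return grid, h


# Toy check: K(t, w) = t * w and d(t) = t on [0, 1] has the
# closed-form solution h(t) = 1.5 * t.
grid, h = solve_fredholm(lambda t, w: t * w, lambda t: t, 0.0, 1.0)
```

A kernel with jumps, such as one built from indicator functions, would typically call for a finer grid or a rule that places nodes at the discontinuities.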

It follows that the efficient score function is given by

$$\begin{aligned} \dot{l}^{*}_{\psi }(\psi ,\Lambda ) =\int [ \dot{l}_{\psi }(x) -\dot{l}_{\Lambda _0}[h^{*}](x)] dH_0(x), \end{aligned}$$
(A3)

and the information matrix has the form

$$\begin{aligned} I ( \psi _{0} ) = E(\dot{l}^{*}_{ \psi }(\psi _0,\Lambda _0) )^{\otimes 2} , \end{aligned}$$
(A4)

which is assumed to be positive definite.
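In (A4), \(a^{\otimes 2}\) denotes the outer product \(aa^T\), so \(I(\psi _0)\) is the second-moment matrix of the efficient score. One natural plug-in estimate, not necessarily the one used by the authors, averages the outer products of efficient-score values evaluated at each observation. In the sketch below, `scores` is a hypothetical list of such vectors, and how they are computed is left open.

```python
def outer_product_information(scores):
    """Average of s s^T over observations.

    `scores` is a list of equal-length lists, one efficient-score
    vector per observation (hypothetical input for illustration).
    """
    n = len(scores)
    dim = len(scores[0])
    info = [[0.0] * dim for _ in range(dim)]
    for s in scores:
        for i in range(dim):
            for j in range(dim):
                info[i][j] += s[i] * s[j] / n
    return info
```

The result is symmetric by construction, and is positive definite whenever the score vectors span the whole parameter space.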

Let P be the probability measure and \(P_n\) its empirical probability measure. Also let \((\hat{\psi }_n^*,\hat{\Lambda }_n^*)\) be the maximum likelihood estimator of \(l_n(\psi ,\Lambda _0,H_0)\). We will establish the efficiency and asymptotic normality through the following two lemmas.

Lemma A1

Under Conditions (C1)–(C6), we have

$$\begin{aligned} \sqrt{n}(\hat{\psi }_n^*- \psi _0)\rightarrow N(0,I^{-1}(\psi _0)). \end{aligned}$$

Proof

By Huang (1996), it suffices to verify that the conditions below hold. Denote the one-sample log-likelihood function by \(l(\psi ,\Lambda )\), and similarly for the other functions mentioned below.

  (1) By the definition of the maximum likelihood estimator and the fact that \(\dot{l}_{\Lambda _0} (\psi ,\Lambda )\) belongs to a P-Donsker class, we have that \( P_n \dot{l}_{\psi } (\hat{\psi }_{n}^* )=0\) and \(P_n \dot{l}_{\Lambda _0}[h^{*}] (\hat{\psi }_n^*,\hat{\Lambda }_n^*)=o_p(n^{-1/2})\).

  (2) (Rate of convergence) \(\Vert \hat{\psi }_n^* - \psi _0\Vert ^2_2 + \Vert \widehat{\Lambda }_{n}^*-\Lambda _0\Vert ^2_2 = O_p(\max \{n^{-(1-\kappa )},n^{-2r\kappa }\})\). This condition is standard in the sieve estimation literature; its proof follows the usual arguments and is thus omitted here.

  (3) Stochastic equicontinuity and smoothness of the model can be easily verified from the forms of the corresponding functions.

  (4) The Fisher information matrix is positive definite and bounded by Condition C6.

Thus the conditions are all satisfied and the lemma holds. \(\square \)

Lemma A2

Under Conditions (C1)–(C7), we have

$$\begin{aligned} \sqrt{n}(\hat{\psi }_n^*-\hat{\psi }_n)=o_p(1). \end{aligned}$$

Proof

First note that the working efficient score function for \(\psi \) is given by \( P_n \int [\dot{l}_\psi -\dot{l}_{\Lambda }[h_n^*] ]d\hat{H}(x)\), where \(h_n^*\) is the projection of \(h^*\) onto the sieve space. Then

$$\begin{aligned} \begin{array}{rl} &{}\displaystyle P_n \int [\dot{l}_\psi -\dot{l}_{\Lambda }[h_n^*] ]d\hat{H}(x)-P_n \int [\dot{l}_\psi -\dot{l}_{\Lambda }[h_n^*] ]d{H}_0(x)\\ &{}\qquad =\displaystyle P_n \int [\dot{l}_\psi -\dot{l}_{\Lambda }[h_n^*] ]d[\hat{H}(x)- {H}_0(x)]\\ &{}\qquad = \displaystyle P_n \int [\dot{l}_\psi -\dot{l}_{\Lambda }[h ^*] ]d[\hat{H}(x)- {H}_0(x)]- P_n \int [ \dot{l}_{\Lambda }[h_n^*]\\ &{}\qquad \quad -\dot{l}_{\Lambda }[h ^*] ]d[\hat{H}(x)- {H}_0(x)]\\ &{}\qquad =o_p(n^{-1/2})+o_p(n^{-2r\kappa })\\ &{}\qquad =o_p(n^{-1/2}). \end{array} \end{aligned}$$

Thus the lemma follows since \(P_n \int [\dot{l}_\psi -\dot{l}_{\Lambda }[h_n^*] ]d{H}_0(x)\) is the efficient score function used to obtain \(\hat{\psi }_n^*\). \(\square \)
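The final equality in the display of this proof relies on the range of \(\kappa \) in Condition C5. As a short worked step (assuming, as the display suggests, that the second remainder term is of order \(n^{-2r\kappa }\)):

$$\begin{aligned} \kappa>\frac{1}{4r}\;\Longrightarrow \;2r\kappa >\frac{1}{2}\;\Longrightarrow \;n^{-2r\kappa }=o(n^{-1/2}). \end{aligned}$$

For instance, \(r=2\) with \(\kappa =1/4\) gives \(n^{-2r\kappa }=n^{-1}\), and \(r=1\) with \(\kappa =1/3\) gives \(n^{-2/3}\); both are of smaller order than \(n^{-1/2}\).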

Cite this article

Wang, P., Tong, X. & Sun, J. A semiparametric regression cure model for doubly censored data. Lifetime Data Anal 24, 492–508 (2018). https://doi.org/10.1007/s10985-017-9406-3

  • DOI: https://doi.org/10.1007/s10985-017-9406-3
