Abstract
An approximate likelihood approach is developed for regression analysis of censored competing-risks data. This approach models directly the cumulative incidence function, instead of the cause-specific hazard function, in terms of explanatory covariates under a proportional subdistribution hazards assumption. It uses a self-consistent iterative procedure to maximize an approximate semiparametric likelihood function, leading to an asymptotically normal and efficient estimator of the vector of regression parameters. Simulation studies demonstrate its advantages over previous methods.
References
Andersen PK, Borgan Ø, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer series in statistics. Springer, New York. doi:10.1007/978-1-4612-4348-9
Bellman RE (1964) Perturbation techniques in mathematics, engineering and physics. Holt, Rinehart & Winston, New York
Beyersmann J, Scheike TH (2013) Classical regression models for competing risks. In: Klein JP, van Houwelingen HC, Ibrahim JG, Scheike TH (eds) Handbook of survival analysis, handbooks of modern statistical methods. Chapman and Hall/CRC, Boca Raton, pp 157–177
Binder N, Gerds TA, Andersen PK (2014) Pseudo-observations for competing risks with covariate dependent censoring. Lifetime Data Anal 20(2):303–315. doi:10.1007/s10985-013-9247-7
Coddington EA, Levinson N (1955) Theory of ordinary differential equations. McGraw-Hill Book Co., New York
Fine JP, Gray RJ (1999) A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc 94(446):496–509. doi:10.2307/2670170
Geskus RB (2011) Cause-specific cumulative incidence estimation and the Fine and Gray model under both left truncation and right censoring. Biometrics 67(1):39–49. doi:10.1111/j.1541-0420.2010.01420.x
Gorfine M, Zucker DM, Hsu L (2009) Case-control survival analysis with a general semiparametric shared frailty model: a pseudo full likelihood approach. Ann Stat 37(3):1489
Graw F, Gerds TA, Schumacher M (2009) On pseudo-values for regression analysis in competing risks models. Lifetime Data Anal 15(2):241–255. doi:10.1007/s10985-008-9107-z
Gray RJ (1988) A class of \(K\)-sample tests for comparing the cumulative incidence of a competing risk. Ann Stat 16(3):1141–1154. doi:10.1214/aos/1176350951
Kay R (1977) Proportional hazard regression models and the analysis of censored survival data. J Roy Statist Soc Ser C 26(3):227–237. jstor.org/stable/2346962
Khalil H (2002) Nonlinear systems, 2nd edn. Prentice Hall, Upper Saddle River
Klein J, Shu Y (2002) Multi-state models for bone marrow transplantation studies. Stat Methods Med Res 11(2):117–139. doi:10.1191/0962280202sm277ra
Klein JP, Andersen PK (2005) Regression modeling of competing risks data based on pseudovalues of the cumulative incidence function. Biometrics 61(1):223–229. doi:10.1111/j.0006-341X.2005.031209.x
Klein JP, Gerster M, Andersen PK, Tarima S, Perme MP (2008) SAS and R functions to compute pseudo-values for censored data regression. Comp Methods Progr Biomed 89(3):289–300. doi:10.1016/j.cmpb.2007.11.017
Lai TL, Ying Z (1988) Stochastic integrals of empirical-type processes with applications to censored regression. J Multivar Anal 27(2):334–358
Lai TL, Ying Z (1991) Large sample theory of a modified Buckley–James estimator for regression analysis with censored data. Ann Stat 19(3):1370–1402
Lai TL, Ying Z (1992) Asymptotically efficient estimation in censored and truncated regression models. Stat Sin 2(1):17–46
Lai TL, Ying Z (1994) A missing information principle and \(M\)-estimators in regression analysis with censored and truncated data. Ann Stat 22(3):1222–1255
Lang S (1968) Analysis I. Addison-Wesley Publishing Co., Boston
Logan BR, Wang T (2013) Pseudo-value regression models. In: Klein JP, van Houwelingen HC, Ibrahim JG, Scheike TH (eds) Handbook of survival analysis, handbooks of modern statistical methods. Chapman and Hall/CRC, Boca Raton, pp 199–219
Martinussen T, Scheike TH, Zucker DM (2011) The Aalen additive gamma frailty hazards model. Biometrika 98(4):831–843
Scheike TH, Zhang MJ, Gerds TA (2008) Predicting cumulative incidence probability by direct binomial regression. Biometrika 95(1):205–220. doi:10.1093/biomet/asm096
van der Vaart AW (1998) Asymptotic statistics. Cambridge series in statistical and probabilistic mathematics, vol 3. Cambridge University Press, Cambridge. doi:10.1017/CBO9780511802256
Zeng D, Lin DY (2007) Maximum likelihood estimation in semiparametric regression models with censored data (with discussion and a reply by the authors). J R Stat Soc Ser B 69(4):507–564. doi:10.1111/j.1369-7412.2007.00606.x
Zeng D, Lin DY (2010) A general asymptotic theory for maximum likelihood estimation in semiparametric regression models with censored data. Stat Sin 20(2):871–910
Zucker DM (2005) A pseudo-partial likelihood method for semiparametric survival regression with covariate errors. J Am Stat Assoc 100(472):1264–1277
Zucker DM, Gorfine M, Hsu L (2008) Pseudo-full likelihood estimation for prospective survival analysis with a general semiparametric shared frailty model: Asymptotic theory. J Stat Plan Inference 138(7):1998–2016
Acknowledgments
We thank the reviewers for their careful reading of the paper and valuable comments that led to its substantial improvement. The research of the second author was supported by National Science Foundation grant DMS 1407828 and National Cancer Institute Grant 1 P30 CA124435.
Appendix
Proof of Lemma 1
At the beginning of Sect. 3.2, we applied the theory of integral equations to show the existence of \((\widetilde{A}_j,\widetilde{F}_j)\), \(j=1,2\). We now use arguments involving exponential inequalities of empirical processes, similar to those in Lai and Ying (1991, Appendix A; 1994, Sect. 4.2), to relate quantities like \({\hat{A}}_j(t;B)\) and \({\hat{F}}_j(s-;Z_i,B)\) to their population versions and to show that there exists \(\rho >0\) such that
where Dg denotes the Jacobian matrix of the partial derivatives of g with respect to the components of B and \(||C|| = \max |c_{ij}|\) is the max-norm of a matrix \(C=(c_{ij})\). A key tool in these arguments is provided by the aforementioned empirical process theory that enables us to approximate the averages of random functions by their expectations. Another important tool is the smooth weight function \(p_n\) that satisfies (C1) and (C2), which is introduced to circumvent difficulties when the denominator \(1-{\hat{F}}_+(s-;Z_i,B)\) becomes too small. The smoothness of \({\hat{S}}_j\) and \({\hat{A}}_j\) makes the arguments considerably simpler than those in Lai and Ying (1991, 1994) since we can differentiate them with respect to the components of B and use assumptions (12) and (C2) to bound the derivatives. Before analyzing the partial derivatives in \(D{\hat{A}}_j - D\widetilde{A}_j\), we first prove the result for \({\hat{A}}_j - \widetilde{A}_j\) in (31) and the corresponding result for \({\hat{F}}_j - \widetilde{F}_j\). Recalling that \(\pi _n(u)=p_n(u)/u\), our basic idea is to identify \({\hat{A}}_j(t;B)\) in (8) and \({\hat{F}}_j(t-;Z, B)\) as sample versions of (16) that replace \(E\{p_n(\cdots )dN_j(s)\}\) in (16) by
and \(E\{I_{\{T\ge s\}} \cdots \exp (b_j^T Z(s))\} \) in (16) by \(n^{-1}{\hat{S}}_j(s;B)\) defined in (9), in which \({\hat{F}}_j(t-;Z_i, B)\) corresponds to the second equation in (16) with \(\widetilde{A}_j\) replaced by \({\hat{A}}_j\). Perturbation analysis (Bellman 1964; Khalil 2002) of the system of integral equations (16) together with exponential inequalities of empirical processes (Lai and Ying 1988, 1991) can then be used to show uniform convergence of these sample versions of (16) to their population counterparts \(\widetilde{A}_j\) and \(\widetilde{F}_j\) at rate \(O_p(n^{-1/2})\). Making use of this and the exponential inequalities for empirical processes again, we can replace sample means in the integrands and integrators in the stochastic integral representation (35) of \(n^{-1/2}U_j(B)\), thereby proving the uniform convergence of \(n^{-1} U_j(b_1, b_2) - \widetilde{U}_j(b_1, b_2)\) in (19). Since \(U_j(B)\) and \(\widetilde{U}_j(B)\) are smooth functions of the components of B, \(D{\hat{A}}_j - D\widetilde{A}_j\) and \(D{\hat{F}}_j - D\widetilde{F}_j\) can be analyzed similarly, thereby proving (31), which can then be used to prove the corresponding result in \((\partial /\partial b_n)U_j(b_1, b_2)/n\) in (19). \(\square \)
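The role of the weight function in the proof of Lemma 1 can be illustrated numerically. The following is a minimal sketch, not the weight used in the paper: conditions (C1) and (C2) are not reproduced in this appendix, so the cubic smoothstep form of `smooth_weight` and the `cutoff` parameter are assumptions chosen only to show why \(\pi _n(u)=p_n(u)/u\) stays bounded when the denominator \(u\) is close to zero.

```python
def smooth_weight(u, cutoff):
    """Hypothetical stand-in for p_n (an assumption, not the paper's choice).

    Equals 0 for u <= cutoff, 1 for u >= 2 * cutoff, and follows a
    continuously differentiable cubic ramp in between.
    """
    x = (u - cutoff) / cutoff
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x * x * (3.0 - 2.0 * x)


def weighted_reciprocal(u, cutoff):
    """pi_n(u) = p_n(u) / u, taken to be 0 wherever the weight vanishes."""
    w = smooth_weight(u, cutoff)
    return 0.0 if w == 0.0 else w / u


if __name__ == "__main__":
    cutoff = 0.01  # illustrative value only
    for u in (0.0, 0.005, 0.015, 0.02, 0.5):
        print(u, weighted_reciprocal(u, cutoff))
```

With this choice the unweighted reciprocal \(1/u\) is unbounded near zero, whereas the weighted version never exceeds `1 / cutoff`; a cutoff shrinking with \(n\) would make the weight asymptotically negligible in the sense described in the proof of Theorem 2.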
Proof of Lemma 2
Let \(\hat{p}_n(t-;Z_i)=p_n(1-{\hat{F}}_+(t-;Z_i,B_0))\) and define \(p_n(t-;Z_i)\) similarly with \({\hat{F}}_+\) replaced by \(F_+\). Note that
To simplify the arguments and highlight the main ideas, we focus on the case where \(Z_i\) does not vary with time, for which \(1-F_j(t;Z_i)=\exp (-e^{\beta _j^\top Z_i}A_j(t))\), thus expressing \(F_j(\cdot ;Z_i)\) as a smooth increasing function of \(A_j(\cdot )\). In this case, the right-hand side of (32) has two terms, the first of which is
The second summand in (33) can be written as \(n^{-1/2}\sum _{i=1}^n(\epsilon _{n,ij}(t)/S_j(t))dM_{ij}(t)\), in which \(\epsilon _{n,ij}(t)\) is a predictable process such that \(\max _{t<\tau , 1 \le i \le n}|\epsilon _{n,ij}(t)|/p_n(t-;Z_i)\mathop {\longrightarrow }\limits ^{P}0\). Similarly, the second term on the right-hand side of (32) can be expressed as
in which \(\bar{j}\ne j\), \(\delta _{n,j}(t)\) is a predictable process such that \(\int ^{\tau }_0(|\delta _{n,j}(t)|/S_j(t))dA_j(t)\mathop {\longrightarrow }\limits ^{P}0\), and \(\psi _{jj}(t)dA_j(t)/S_j(t)\) and \(\psi _{\bar{j}j}(t)dA_j(t)/S_j(t)\) are the (j, j)th and \((\bar{j},j)\)th elements of \(d\varPsi _n(t)\), respectively. This follows from applying the first-order Taylor expansion to the summands and then the law of large numbers to the sum in the second term on the right-hand side of (32).
Expressing \(1-F_j(\cdot ;Z_i)\) as a function of \(A_j(\cdot )\) as noted at the beginning of the last paragraph, we can take the derivative of the quotient in (32) with respect to \(A_j\), and also with respect to \(A_{\bar{j}}\) because the denominator involves both \(F_j\) and \(F_{\bar{j}}\), explaining the formula in (23) in which the expectation is a consequence of the law of large numbers. Replacing the \(o_p(1)\) term in (24) by
the solution (25) of (24) follows from Theorem II.6.3 in Andersen et al. (1993).
For time-varying \(Z_i\), we can modify the preceding argument. Although the basic idea is the same, the technical details are more complicated as the Taylor expansion has to be carried out directly instead of by differentiation with respect to \(A_j\); see (34) below for example. Since \(dN_{i1}(t)dN_{i2}(t)=0\), we have \(\langle M_{i1}, M_{i2} \rangle =0\), i.e., \(M_{i1}\) and \(M_{i2}\) are orthogonal martingales. Therefore we can apply the martingale central limit theorem (Andersen et al. 1993) and (25) to conclude that (26) converges weakly to a Gaussian martingale in \(D^k[0, \tau ]\). \(\square \)
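The time-fixed covariate relation \(1-F_j(t;Z_i)=\exp (-e^{\beta _j^\top Z_i}A_j(t))\) used in the proof of Lemma 2 can be evaluated directly. The sketch below is illustrative only: the function name `cumulative_incidence` and the numerical values of the baseline \(A_j(t)\) and of \(\beta _j\) are hypothetical and are not taken from the paper.

```python
import math


def cumulative_incidence(a_j, beta_j, z):
    """F_j(t; z) = 1 - exp(-exp(beta_j' z) * A_j(t)) for time-fixed z.

    a_j    : value of the baseline function A_j at the time point t
    beta_j : regression coefficients for cause j
    z      : covariate vector of the same length as beta_j
    """
    linear_predictor = sum(b * x for b, x in zip(beta_j, z))
    return 1.0 - math.exp(-math.exp(linear_predictor) * a_j)


# Hypothetical inputs: A_j(t) = 0.3 and one binary covariate with beta_j = 0.5.
baseline = cumulative_incidence(0.3, [0.5], [0.0])
exposed = cumulative_incidence(0.3, [0.5], [1.0])

if __name__ == "__main__":
    print(baseline, exposed)
    # Under the model, log(1 - F_j) scales by exp(beta_j' z) relative to z = 0.
    print(math.log(1.0 - exposed) / math.log(1.0 - baseline), math.exp(0.5))
```

The last line reflects the proportional subdistribution hazards assumption: the ratio of \(\log (1-F_j)\) between the two covariate values does not depend on \(t\), and \(F_j\) is increasing in \(A_j\), which is what allows differentiation with respect to \(A_j\) in the proof.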
Proof of Lemma 3
Since \({\hat{F}}_j\) is defined by replacing \(A_j\) in (5) by \({\hat{A}}_j\), it follows that
We next combine this with (24) and apply Taylor’s theorem and the law of large numbers for the i.i.d. \((Z_i, C_i)\) to decompose \(n^{-1/2}U_j(B_0)\) into six martingale summands plus an \(o_p(1)\) remainder.
From (13), (24) and (5), it follows that \(n^{-1/2}U_j(B_0)\) can be written as
Making use of (24), it can be shown similarly that the third summand in (35) can be expressed as
Replacing \((j,\bar{j})\) by \((\bar{j},j)\) and \(\eta _{ij}\) by \(\phi _{ij}\) in (36), the fourth summand in (35) can be expressed similarly. The terms inside the fifth and the sixth summands in (35) can also be replaced by the corresponding expected values, again using the law of large numbers. Hence Lemma 3 follows from (35) and the weak convergence of (26) to a Gaussian martingale. \(\square \)
Proof of Theorem 2
Although the nuisance parameters \(F_1\) and \(F_2\) in the semiparametric likelihood (7) are infinite-dimensional, we can use parametric submodels such that the information bound for estimating B in the parametric submodel is attained by \({\hat{B}}\), proving its asymptotic efficiency, which is formulated in Theorem 2 in terms of the Hájek convolution theorem; see Andersen et al. (1993, Sections VIII.1.3 and VIII.2.4) and van der Vaart (1998, Chap. 25). Moreover, standard likelihood theory for the least favorable parametric submodel shows that \(V=-\varSigma \). In fact, if we impose the stronger assumptions that (a) the support \({\mathcal {Z}}\) of \(Z_i\) is bounded, (b) ess \(\sup C_i=\tau <\infty \), and (c) \(\inf _{z\in {\mathcal {Z}}}\{1-F_1(\tau ,z)-F_2(\tau ,z)\}>0\), then we do not need to introduce the weight function \(p_n(\cdot )\) in (10), which then corresponds to the semiparametric log-likelihood function considered by Zeng and Lin (2007, 2010), whose asymptotic efficiency theory for semiparametric maximum likelihood estimators can be applied to this case. Removing these assumptions requires modification of the likelihood function to address the difficulties caused by small values of \(1-{\hat{F}}_+(T_i-;Z_i,B)\). This is the idea, introduced by Lai and Ying (1991, 1992, 1994), behind the smooth weight function \(p_n(1-{\hat{F}}_+(t-;Z_i,B))\) that is asymptotically negligible and can yield the same information bound as that under the additional restrictive assumptions (a), (b), and (c).
Cite this article
Jin, Y., Lai, T.L. A new approach to regression analysis of censored competing-risks data. Lifetime Data Anal 23, 605–625 (2017). https://doi.org/10.1007/s10985-016-9378-8
DOI: https://doi.org/10.1007/s10985-016-9378-8