Abstract
Variable selection in the presence of grouped variables is troublesome for competing risks data: while some recent methods deal with group selection only, simultaneous selection of both groups and within-group variables remains largely unexplored. In this context, we propose an adaptive group bridge method, enabling simultaneous selection both within and between groups, for competing risks data. The adaptive group bridge is applicable to independent and clustered data. It also allows the number of variables to diverge as the sample size increases. We show that our new method possesses excellent asymptotic properties, including variable selection consistency at group and within-group levels. We also show superior performance in simulated and real data sets over several competing approaches, including group bridge, adaptive group lasso, and AIC / BIC-based methods.
References
Cai J, Fan J, Li R, Zhou H (2005) Variable selection for multivariate failure time data. Biometrika 92:303–316
Commenges D, Andersen PK (1995) Score test of homogeneity for survival data. Lifetime Data Anal 1:145–156
Fan J, Li R (2002) Variable selection for Cox’s proportional hazards model and frailty model. Ann Stat 30:74–99
Fine JP, Gray RJ (1999) A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc 94:496–509
Fu Z (2015) crrp: Penalized variable selection in competing risks regression. http://CRAN.R-project.org/package=crrp, R package version 1.0
Fu Z, Parikh CR, Zhou B (2016a) Penalized variable selection in competing risks regression. Lifetime Data Anal. doi:10.1007/s10985-016-9362-3
Fu Z, Ma S, Lin H, Parikh CR, Zhou B (2016b) Penalized variable selection for multi-center competing risks data. Stat Biosci. doi:10.1007/s12561-016-9181-9
Gray RJ (1988) A class of K-sample tests for comparing the cumulative incidence of a competing risk. Ann Stat 16:1141–1154
Ha ID, Lee M, Oh S, Jeong JH, Sylvester R, Lee Y (2014) Variable selection in subdistribution hazard frailty models with competing risks data. Stat Med 33:4590–4604
Huang J, Ma S, Xie H, Zhang CH (2009) A group bridge approach for variable selection. Biometrika 96:339–355
Huang J, Li L, Liu Y, Zhao X (2014) Group selection in the Cox model with a diverging number of covariates. Stat Sin 24:1787–1810
Kim HT, Zhang MJ, Woolfrey AE, Martin AS, Chen J, Saber W, Perales MA, Armand P, Eapen M (2016) Donor and recipient sex in allogeneic stem cell transplantation: what really matters. Haematologica 101:1260–1266
Kroger N, Solano C, Wolschke C et al (2016) Antilymphocyte globulin for prevention of chronic graft-versus-host disease. New Engl J Med 374:43–53
Kuk D, Varadhan R (2013) Model selection in competing risks regression. Stat Med 32:3077–3088
Logan B, Zhang MJ, Klein JP (2011) Marginal models for clustered time to event data with competing risks using pseudovalues. Biometrics 67:1–7
Prentice RL, Kalbfleisch JD, Peterson AV, Flournoy N, Farewell VT, Breslow NE (1978) The analysis of failure times in the presence of competing risks. Biometrics 34:541–554
Rubio MT, Labopin M, Blaise D et al (2015) The impact of graft-versus-host disease prophylaxis in reduced-intensity conditioning allogeneic stem cell transplant in acute myeloid leukemia: a study from the acute leukemia working party of the European group for blood and marrow transplantation. Haematologica 100:683–689
Seetharaman I (2013) Consistent bi-level variable selection via composite group bridge penalized regression. Master’s thesis, Kansas State University, KS, USA
Shaw PJ, Kan F, Ahn KW, Spellman SR, Aljurf M, Ayas M et al (2010) Outcomes of pediatric bone marrow transplantation for leukemia and myelodysplasia using matched sibling, mismatched related, or matched unrelated donors. Blood 116:4007–4015
Varadhan R, Kuk D (2015) crrstep: Stepwise covariate selection for the Fine and Gray competing risks regression model. http://CRAN.R-project.org/package=crrstep, R package version 2015-2.1
Wang HJ, Zhou J, Li Y (2013) Variable selection for censored quantile regression. Stat Sin 23:145–167
Wu TT, Wang S (2013) Doubly regularized Cox regression for high-dimensional survival data with group structures. Stat Interface 6:175–186
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B 68:49–67
Zhou B, Fine J, Latouche A, Labopin M (2012) Competing risks regression for clustered data. Biostatistics 13:371–383
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgements
The US National Cancer Institute (U24CA076518) partially supported this work. The authors would like to thank the Associate Editor and two reviewers for their helpful comments that significantly improved the paper.
Electronic supplementary material
Below is the link to the electronic supplementary material.
10985_2017_9400_MOESM1_ESM.pdf
Supplementary Materials The proofs of Theorem 2 and Corollary 1 and a list of variables for the bone marrow transplant data are available online. (PDF 60KB).
Appendix
To describe the conditions to justify the asymptotics, we define
where \(\mathbf{a}^{\otimes 0}=\mathbf{1},\ \mathbf{a}^{\otimes 1}=\mathbf{a},\) and \(\mathbf{a}^{\otimes 2}=\mathbf{a}\mathbf{a}^T\). Let \(N_{ij}^c(t)=I(C_{ij}\le t)\) and let \(\varLambda ^c(t)\) be the cumulative hazard function of the censoring time, obtained by treating censored observations as events. Define \(M_{ij}^c(t)=N_{ij}^c(t)-\int _0^tI(X_{ij}\ge u)d\varLambda ^c(u)\). Consider the following conditions:
- (C1) \(\int _0^\tau \lambda _{10}(u)du<\infty \) and \(|Z_{ijq}(0)|+\int _0^\tau |dZ_{ijq}(u)|\le M\) for all i, j, q, where M is a positive constant;
- (C2) There exists a neighborhood \(\mathcal {B}\) of \({\varvec{\beta }}_0\) such that \(\sup _{t\in [0,\tau ],{\varvec{\beta }}\in \mathcal {B}}\Vert \mathbf{S}^{(r)}({\varvec{\beta }},t)-\mathbf{s}^{(r)}({\varvec{\beta }},t)\Vert _2\) converges in probability to 0 for \(r=0,1,2\). There also exists a matrix \(\varGamma =\varGamma ({\varvec{\beta }}_0)\) such that \(\Vert m^{-1}\sum _{i=1}^mVar(\mathbf{D}_i)-\varGamma \Vert \rightarrow 0\), where
$$\begin{aligned} \mathbf{D}_i&=\sum _{j=1}^{n_i}\Big [\int _0^\tau \{\mathbf{Z}_{ij}(u)-\mathbf{e}({\varvec{\beta }}_0,u)\}w_{ij}(u)dM_{ij}({\varvec{\beta }}_0,u)\\&\quad +\int _0^\tau \sum _{i=1}^m\mathbf{q}(u)/\pi (u)dM_{ij}^c(u)\Big ]. \end{aligned}$$
In addition, there exist constants \(C_1\) and \(C_2\) such that \(0<C_1<\kappa _\text {min}(\varGamma )\le \kappa _\text {max}(\varGamma )<C_2<\infty \) for all m, where \(\kappa _\text {min}(\mathbf{A})\) and \(\kappa _\text {max}(\mathbf{A})\) are the minimal and maximal eigenvalues of a matrix \(\mathbf{A}\), respectively;
- (C3) For \(r=0,1,2\), the functions \(\mathbf{s}^{(r)}({\varvec{\beta }},t)\) are continuous in \({\varvec{\beta }}\in \mathcal {B}\) uniformly in \(t\in [0,\tau ]\) and are bounded on \(\mathcal {B}\times [0,\tau ]\). \({ s}^{(0)}({\varvec{\beta }},t)\) is bounded away from 0 on \(\mathcal {B}\times [0,\tau ]\). The true parameter \(\beta _{j0}\) is bounded away from 0 for \(j\in B_1\). Let \(\varOmega =\int _0^\tau \mathbf{v}({\varvec{\beta }}_0,u){s}^{(0)}({\varvec{\beta }}_0,u)\lambda _{10}(u)du\). There exist constants \(C_3\) and \(C_4\) such that \(0<C_3<\kappa _\text {min}(\varOmega )\le \kappa _\text {max}(\varOmega )<C_4<\infty \);
- (C4) There exists a constant \(C_5\) such that \(\sup _{1\le i\le m}E(D_{ij}^2D_{ij^{'}}^2)\le C_5<\infty \) for all \(1\le j,\ j^{'}\le d_m\). In addition, \(C_m^*=\max _j\sum _{k=1}^KI(j\in A_k)\) is bounded;
- (C5) \(d_m^4/m\rightarrow 0\);
- (C6) \(\sum _{k=1}^{K_1}c_k \left\{ \Big (\sum _{j\in A_k\cap B_1}|\beta _{j0}|^{1-\nu }\Big )^{\gamma -1} \sum _{j\in A_k\cap B_1}1/|{\beta }_{j0}|^\nu \right\} \le M_m\), where \(M_m=O_p(1)\);
- (C7) \(\lambda _m/\sqrt{m}\rightarrow 0\), \(\sqrt{m/d_m}\tilde{\beta }_j=O_p(1)\), and \(\min \big (\lambda _m m^{(\nu -1)/2}d_m^{-(1+\nu )/2},\ \lambda _m m^{\gamma (\nu -1)/2}d_m^{-1+\gamma (1-\nu )/2}\big )\rightarrow \infty \).
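As an illustration only, the two deterministic quantities appearing in (C4) and (C6) can be evaluated for a toy group structure. The sketch below is not part of the paper: the function names, the toy coefficients, and the unit group weights \(c_k\) are hypothetical, \(B_1\) is taken to be the index set of non-zero true coefficients, and the sum in (C6) is taken over the groups that contain at least one such coefficient.

```python
from typing import Sequence


def max_group_overlap(groups: Sequence[Sequence[int]], d: int) -> int:
    """C_m^* in (C4): the largest number of groups that contain one variable."""
    counts = [0] * d
    for members in groups:
        for j in set(members):
            counts[j] += 1
    return max(counts)


def c6_lhs(beta0: Sequence[float], groups: Sequence[Sequence[int]],
           weights: Sequence[float], nu: float, gamma: float) -> float:
    """Left-hand side of (C6) for a given true coefficient vector beta0.

    Assumption: B_1 is the set of indices with beta0[j] != 0, and only
    groups intersecting B_1 contribute to the sum.
    """
    total = 0.0
    for members, c_k in zip(groups, weights):
        # |beta_j0| for j in A_k intersected with B_1
        nonzero = [abs(beta0[j]) for j in set(members) if beta0[j] != 0.0]
        if not nonzero:
            continue  # group has no non-zero coefficient
        inner = sum(b ** (1.0 - nu) for b in nonzero)
        total += c_k * inner ** (gamma - 1.0) * sum(b ** (-nu) for b in nonzero)
    return total


# Hypothetical toy setting: five variables, three groups, variable 1 shared.
toy_groups = [[0, 1], [1, 2], [3, 4]]
toy_beta0 = [1.0, 0.0, 2.0, 0.0, 0.0]
overlap = max_group_overlap(toy_groups, d=5)
bound = c6_lhs(toy_beta0, toy_groups, weights=[1.0, 1.0, 1.0], nu=1.0, gamma=0.5)
```

For this toy input the overlap is 2 and the sum in (C6) equals 1.5. Both quantities stay finite here because the non-zero coefficients are bounded away from 0 and no variable belongs to many groups.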
Conditions (C1)–(C3) are standard conditions for the marginal subdistribution hazards model (Zhou et al. 2012). Conditions (C1)–(C5) are similar to those of Cai et al. (2005) and Huang et al. (2014); they guarantee the local asymptotic quadratic property of \(l({\varvec{\beta }})\) and the existence of a local minimizer of \({\mathcal {L}}_m({\varvec{\beta }})\). Once Conditions (C3) and (C4) are met, Condition (C6) is satisfied if the true non-zero \(\beta _{j0}\)’s are bounded above. Condition (C7) restricts the rate of \(\lambda _m\) so that the oracle property holds.
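To make the rate requirements in (C5) and (C7) concrete, one can assume polynomial rates \(d_m=m^{a}\) and \(\lambda _m=m^{b}\); this parametrization is an assumption made only for illustration and is not taken from the paper. Each deterministic requirement then reduces to a sign condition on an exponent of \(m\). The following minimal sketch (the function names are hypothetical) computes these exponents. It does not address the stochastic requirement on the initial estimator \(\tilde{\beta }_j\) in (C7), and the values of \(\nu \) and \(\gamma \) are supplied by the user.

```python
from typing import Dict


def rate_exponents(a: float, b: float, nu: float, gamma: float) -> Dict[str, float]:
    """Exponents of m when d_m = m**a and lambda_m = m**b (assumed rates)."""
    return {
        # (C5): d_m^4 / m = m^(4a - 1) must tend to 0
        "c5": 4.0 * a - 1.0,
        # (C7): lambda_m / sqrt(m) = m^(b - 1/2) must tend to 0
        "lambda": b - 0.5,
        # (C7): lambda_m * m^((nu-1)/2) * d_m^(-(1+nu)/2) must diverge
        "min_first": b + (nu - 1.0) / 2.0 - a * (1.0 + nu) / 2.0,
        # (C7): lambda_m * m^(gamma(nu-1)/2) * d_m^(-1+gamma(1-nu)/2) must diverge
        "min_second": b + gamma * (nu - 1.0) / 2.0
        + a * (-1.0 + gamma * (1.0 - nu) / 2.0),
    }


def rates_ok(a: float, b: float, nu: float, gamma: float) -> bool:
    """True when every deterministic rate requirement in (C5) and (C7) holds."""
    e = rate_exponents(a, b, nu, gamma)
    return (e["c5"] < 0 and e["lambda"] < 0
            and e["min_first"] > 0 and e["min_second"] > 0)


# Hypothetical choices: d_m = m^0.1, lambda_m = m^0.25, nu = 1, gamma = 0.5.
ok_example = rates_ok(a=0.1, b=0.25, nu=1.0, gamma=0.5)
# Fixed dimension (a = 0) with nu = 0.5: lambda_m = m^0.2 grows too slowly.
bad_example = rates_ok(a=0.0, b=0.2, nu=0.5, gamma=0.5)
```

Under these assumed rates the first choice satisfies all four sign conditions, while the second fails because the first term of the minimum in (C7) has a negative exponent. With \(\nu =1\) the requirements collapse to \(a<1/4\) and \(a<b<1/2\), which shows how the admissible range for \(\lambda _m\) narrows as the number of covariates grows faster.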
Ahn, K.W., Banerjee, A., Sahr, N. et al. Group and within-group variable selection for competing risks data. Lifetime Data Anal 24, 407–424 (2018). https://doi.org/10.1007/s10985-017-9400-9
DOI: https://doi.org/10.1007/s10985-017-9400-9