Group and within-group variable selection for competing risks data

Lifetime Data Analysis

Abstract

Variable selection in the presence of grouped variables is troublesome for competing risks data: while some recent methods deal with group selection only, simultaneous selection of both groups and within-group variables remains largely unexplored. In this context, we propose an adaptive group bridge method, enabling simultaneous selection both within and between groups, for competing risks data. The adaptive group bridge is applicable to independent and clustered data. It also allows the number of variables to diverge as the sample size increases. We show that our new method possesses excellent asymptotic properties, including variable selection consistency at group and within-group levels. We also show superior performance in simulated and real data sets over several competing approaches, including group bridge, adaptive group lasso, and AIC / BIC-based methods.
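
To make selection within and between groups concrete, the following is a minimal Python sketch of an adaptive group bridge penalty. The exact penalty is not stated in this excerpt; the form below is an assumption that combines the group bridge penalty of Huang et al. (2009) with adaptive weights in the style of Zou (2006). The function name, the argument names, and the default group-size scaling \(c_k=|A_k|^{1-\gamma }\) are illustrative choices, not the authors' implementation.

```python
import numpy as np

def adaptive_group_bridge_penalty(beta, beta_init, groups, lam, gamma=0.5, nu=1.0, c=None):
    """Assumed form: lam * sum_k c_k * (sum_{j in A_k} |beta_j| / |beta_init_j|**nu) ** gamma.

    beta      : current coefficient vector
    beta_init : initial (unpenalized) estimates used for adaptive weights;
                assumed to be non-zero (hypothetical requirement of this sketch)
    groups    : list of index lists A_k (groups may overlap)
    """
    beta = np.asarray(beta, dtype=float)
    beta_init = np.asarray(beta_init, dtype=float)
    if c is None:
        # Assumed group-size scaling c_k = |A_k|^(1 - gamma)
        c = [len(g) ** (1.0 - gamma) for g in groups]
    total = 0.0
    for c_k, idx in zip(c, groups):
        idx = np.asarray(idx)
        # Weighted L1 norm inside the group (within-group selection)
        inner = np.sum(np.abs(beta[idx]) / np.abs(beta_init[idx]) ** nu)
        # Concave outer power with 0 < gamma < 1 (group-level selection)
        total += c_k * inner ** gamma
    return lam * total

# Example call with two groups, {0, 1} and {2}
value = adaptive_group_bridge_penalty(
    beta=[1.0, 0.0, 2.0], beta_init=[1.0, 1.0, 2.0],
    groups=[[0, 1], [2]], lam=1.0, gamma=0.5, nu=1.0,
)
```

With \(0<\gamma <1\), the outer power can shrink an entire group to zero, while the weighted \(L_1\) term inside each group can remove individual coefficients; this is the bi-level behaviour the abstract refers to.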

References

  • Cai J, Fan J, Li R, Zhou H (2005) Variable selection for multivariate failure time data. Biometrika 92:303–316

  • Commenges D, Andersen PK (1995) Score test of homogeneity for survival data. Lifetime Data Anal 1:145–156

  • Fan J, Li R (2002) Variable selection for Cox’s proportional hazards model and frailty model. Ann Stat 30:74–99

  • Fine JP, Gray RJ (1999) A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc 94:496–509

  • Fu Z (2015) crrp: Penalized variable selection in competing risks regression. http://CRAN.R-project.org/package=crrp, R package version 1.0

  • Fu Z, Parikh CR, Zhou B (2016a) Penalized variable selection in competing risks regression. Lifetime Data Anal. doi:10.1007/s10985-016-9362-3

  • Fu Z, Ma S, Lin H, Parikh CR, Zhou B (2016b) Penalized variable selection for multi-center competing risks data. Stat Biosci. doi:10.1007/s12561-016-9181-9

  • Gray RJ (1988) A class of K-sample tests for comparing the cumulative incidence of a competing risk. Ann Stat 16:1141–1154

  • Ha ID, Lee M, Oh S, Jeong JH, Sylvester R, Lee Y (2014) Variable selection in subdistribution hazard frailty models with competing risks data. Stat Med 33:4590–4604

  • Huang J, Ma S, Xie H, Zhang CH (2009) A group bridge approach for variable selection. Biometrika 96:339–355

  • Huang J, Li L, Liu Y, Zhao X (2014) Group selection in the Cox model with a diverging number of covariates. Stat Sin 24:1787–1810

  • Kim HT, Zhang MJ, Woolfrey AE, Martin AS, Chen J, Saber W, Perales MA, Armand P, Eapen M (2016) Donor and recipient sex in allogeneic stem cell transplantation: what really matters. Haematologica 101:1260–1266

  • Kroger N, Solano C, Wolschke C et al (2016) Antilymphocyte globulin for prevention of chronic graft-versus-host disease. New Engl J Med 374:43–53

  • Kuk D, Varadhan R (2013) Model selection in competing risks regression. Stat Med 32:3077–3088

  • Logan B, Zhang MJ, Klein JP (2011) Marginal models for clustered time to event data with competing risks using pseudovalues. Biometrics 67:1–7

  • Prentice RL, Kalbfleisch JD, Peterson AV, Flournoy N, Farewell VT, Breslow NE (1978) The analysis of failure times in the presence of competing risks. Biometrics 34:541–554

  • Rubio MT, Labopin M, Blaise D et al (2015) The impact of graft-versus-host disease prophylaxis in reduced-intensity conditioning allogeneic stem cell transplant in acute myeloid leukemia: a study from the acute leukemia working party of the European group for blood and marrow transplantation. Haematologica 100:683–689

  • Seetharaman I (2013) Consistent bi-level variable selection via composite group bridge penalized regression. Master’s thesis, Kansas State University, KS, USA

  • Shaw PJ, Kan F, Ahn KW, Spellman SR, Aljurf M, Ayas M et al (2010) Outcomes of pediatric bone marrow transplantation for leukemia and myelodysplasia using matched sibling, mismatched related, or matched unrelated donors. Blood 116:4007–4015

  • Varadhan R, Kuk D (2015) crrstep: Stepwise covariate selection for the Fine and Gray competing risks regression model. http://CRAN.R-project.org/package=crrstep, R package version 2015-2.1

  • Wang HJ, Zhou J, Li Y (2013) Variable selection for censored quantile regression. Stat Sin 23:145–167

  • Wu TT, Wang S (2013) Doubly regularized Cox regression for high-dimensional survival data with group structures. Stat Interface 6:175–186

  • Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B 68:49–67

  • Zhou B, Fine J, Latouche A, Labopin M (2012) Competing risks regression for clustered data. Biostatistics 13:371–383

  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429

Acknowledgements

The US National Cancer Institute (U24CA076518) partially supported this work. The authors would like to thank the Associate Editor and two reviewers for their helpful comments that significantly improved the paper.

Author information

Corresponding author

Correspondence to Kwang Woo Ahn.

Electronic supplementary material

Below is the link to the electronic supplementary material.

10985_2017_9400_MOESM1_ESM.pdf

Supplementary Materials: The proofs of Theorem 2 and Corollary 1 and a list of variables for the bone marrow transplant data are available online (PDF, 60 KB).

Appendix

To describe the conditions to justify the asymptotics, we define

$$\begin{aligned} w_{ij}(t)&=\frac{I(C_{ij}\ge T_{ij}\wedge t)G(t)}{G(X_{ij}\wedge t)},\\ \mathbf{q}(u)&=-\lim _{m\rightarrow \infty }\frac{1}{m}\sum _{i=1}^m\sum _{j=1}^{n_i}\int _0^\tau \{\mathbf{Z}_{ij}(t)-\mathbf{e}({\varvec{\beta }}_0,t)\}w_{ij}(t)I(X_{ij}<u\le t)\,dM_{ij}({\varvec{\beta }}_0,t),\\ \pi (u)&=\lim _{m\rightarrow \infty }\frac{1}{m}\sum _{i=1}^m\sum _{j=1}^{n_i} I(X_{ij}\ge u), \end{aligned}$$

where \(\mathbf{a}^{\otimes 0}=\mathbf{1}\), \(\mathbf{a}^{\otimes 1}=\mathbf{a}\), and \(\mathbf{a}^{\otimes 2}=\mathbf{a}\mathbf{a}^T\). Let \(N_{ij}^c(t)=I(C_{ij}\le t)\) and let \(\varLambda ^c(t)\) be the cumulative hazard function obtained by treating censored observations as events. Define \(M_{ij}^c(t)=N_{ij}^c(t)-\int _0^tI(X_{ij}\ge u)d\varLambda ^c(u)\). Consider the following conditions:

  1. (C1)

    \(\int _0^\tau \lambda _{10}(u)du<\infty \) and \(|Z_{ijq}(0)|+\int _0^\tau |dZ_{ijq}(u)|\le M\) for all \(i\), \(j\), and \(q\), where \(M\) is a positive constant;

  2. (C2)

    There exists a neighborhood \(\mathcal {B}\) of \({\varvec{\beta }}_0\) such that \(\sup _{t\in [0,\tau ],{\varvec{\beta }}\in \mathcal {B}}\Vert \mathbf{S}^{(r)}({\varvec{\beta }},t)-\mathbf{s}^{(r)}({\varvec{\beta }},t)\Vert _2\) converges in probability to 0 for \(r=0,1,2\). There also exists a matrix \(\varGamma =\varGamma ({\varvec{\beta }}_0)\) such that \(\Vert m^{-1}\sum _{i=1}^mVar(\mathbf{D}_i)-\varGamma \Vert \rightarrow 0\), where

    $$\begin{aligned} \mathbf{D}_i&=\sum _{j=1}^{n_i}\Big [\int _0^\tau \{\mathbf{Z}_{ij}(u)-\mathbf{e}({\varvec{\beta }}_0,u)\}w_{ij}(u)\,dM_{ij}({\varvec{\beta }}_0,u)\\&\quad +\int _0^\tau \frac{\mathbf{q}(u)}{\pi (u)}\,dM_{ij}^c(u)\Big ]. \end{aligned}$$

    In addition, there exist constants \(C_1\) and \(C_2\) such that \(0<C_1<\kappa _\text {min}(\varGamma )\le \kappa _\text {max}(\varGamma )<C_2<\infty \) for all m, where \(\kappa _\text {min}(\mathbf{A})\) and \(\kappa _\text {max}(\mathbf{A})\) are the minimal and maximal eigenvalues of a matrix \(\mathbf{A}\), respectively;

  3. (C3)

    For \(r=0,1,2\), the functions \(\mathbf{s}^{(r)}({\varvec{\beta }},t)\) are continuous in \({\varvec{\beta }}\in \mathcal {B}\) uniformly in \(t\in [0,\tau ]\) and are bounded on \(\mathcal {B}\times [0,\tau ]\). \({ s}^{(0)}({\varvec{\beta }},t)\) is bounded away from 0 on \(\mathcal {B}\times [0,\tau ]\). The true parameter \(\beta _{j0}\) is bounded away from 0 for \(j\in B_1\). Let \(\varOmega =\int _0^\tau \mathbf{v}({\varvec{\beta }}_0,u){s}^{(0)}({\varvec{\beta }}_0,u)\lambda _{10}(u)du\). There exist constants \(C_3\) and \(C_4\) such that \(0<C_3<\kappa _\text {min}(\varOmega )\le \kappa _\text {max}(\varOmega )<C_4<\infty \);

  4. (C4)

    There exists a constant \(C_5\) such that \(\sup _{1\le i\le m}E(D_{ij}^2D_{ij^{'}}^2)\le C_5<\infty \) for all \(1\le j,\ j^{'}\le d_m\). \(C_m^*=\max _j\sum _{k=1}^KI(j\in A_k)\) is bounded;

  5. (C5)

    \(d_m^4/m\rightarrow 0\);

  6. (C6)

    \(\sum _{k=1}^{K_1}c_k \left\{ \Big (\sum _{j\in A_k\cap B_1}|\beta _{j0}|^{1-\nu }\Big )^{\gamma -1} \sum _{j\in A_k\cap B_1}1/|{\beta }_{j0}|^\nu \right\} \le M_m\), where \(M_m=O_p(1)\);

  7. (C7)

    \(\lambda _m/\sqrt{m}\rightarrow 0\), \(\sqrt{m/d_m}\tilde{\beta }_j=O_p(1)\), and \(\min \big (\lambda _m m^{(\nu -1)/2}d_m^{-(1+\nu )/2},\ \lambda _m m^{\gamma (\nu -1)/2}d_m^{-1+\gamma (1-\nu )/2}\big )\rightarrow \infty \).
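
As a hedged illustration of how the rate requirements in (C5) and (C7) interact, suppose \(d_m=m^{a}\) and \(\lambda _m=m^{b}\) with \(a\ge 0\), and take \(\nu =1\). The polynomial rates and the choice \(\nu =1\) are assumptions made only for this example; they are not taken from the paper, and the example ignores (C6) and the condition on \(\tilde{\beta }_j\). Substituting into the conditions gives

$$\begin{aligned} d_m^4/m&=m^{4a-1}\rightarrow 0 \iff a<1/4,\\ \lambda _m/\sqrt{m}&=m^{b-1/2}\rightarrow 0 \iff b<1/2,\\ \lambda _m m^{(\nu -1)/2}d_m^{-(1+\nu )/2}&=m^{b-a}\rightarrow \infty \iff b>a,\\ \lambda _m m^{\gamma (\nu -1)/2}d_m^{-1+\gamma (1-\nu )/2}&=m^{b-a}\rightarrow \infty \iff b>a. \end{aligned}$$

Under these assumptions the rate requirements reduce to \(0\le a<1/4\) and \(a<b<1/2\): the tuning parameter must grow faster than the number of covariates but slower than \(\sqrt{m}\).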

Conditions (C1)–(C3) are standard conditions for the marginal subdistribution hazards model (Zhou et al. 2012). Conditions (C1)–(C5) are similar to the conditions of Cai et al. (2005) and Huang et al. (2014); they guarantee the local asymptotic quadratic property of \(l({\varvec{\beta }})\) and the existence of a local minimizer of \({\mathcal {L}}_m({\varvec{\beta }})\). Once Conditions (C3) and (C4) are met, Condition (C6) is satisfied if the true non-zero \(\beta _{j0}\)’s are bounded above. Condition (C7) controls the rate of \(\lambda _m\) so that the estimator has the oracle property.
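
The weight \(w_{ij}(t)\) defined at the start of this appendix can be illustrated with a short Python sketch. It assumes that \(G\) is estimated by a Kaplan-Meier estimator of the censoring distribution pooled over clusters, that status 0 means censored and any non-zero status means an observed event, and it glosses over tie and left-limit conventions. The function names `km_censoring_survival` and `ipcw_weight` are hypothetical and chosen for this example only.

```python
import numpy as np

def km_censoring_survival(x, status):
    """Kaplan-Meier estimate of the censoring survival function G.

    x      : observed times X = min(T, C)
    status : 0 = censored, non-zero = observed event (any cause)
    Returns a right-continuous step function G(t).
    """
    x = np.asarray(x, dtype=float)
    status = np.asarray(status)
    times = np.unique(x[status == 0])  # distinct censoring times, sorted
    surv = []
    s = 1.0
    for u in times:
        at_risk = np.sum(x >= u)
        censored = np.sum((x == u) & (status == 0))
        s *= 1.0 - censored / at_risk
        surv.append(s)
    surv = np.array(surv)

    def G(t):
        k = np.searchsorted(times, t, side="right")  # censoring times <= t
        return 1.0 if k == 0 else float(surv[k - 1])

    return G

def ipcw_weight(x_i, status_i, t, G):
    """w(t) = I(C >= min(T, t)) * G(t) / G(min(X, t)) for one subject."""
    if x_i >= t:
        return 1.0            # still under observation at t: indicator is 1, ratio is 1
    if status_i == 0:
        return 0.0            # censored before t: indicator is 0
    return G(t) / G(x_i)      # event observed before t: weight decays with G

# Toy data: times 1..4, statuses (event of interest, censored, competing event, censored)
G = km_censoring_survival([1, 2, 3, 4], [1, 0, 2, 0])
weights_at_3_5 = [ipcw_weight(x, d, 3.5, G) for x, d in zip([1, 2, 3, 4], [1, 0, 2, 0])]
```

The sketch computes the weight only; in the estimating equations the weight is combined with the at-risk process, so a subject who has experienced the event of interest no longer contributes to later risk sets.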

Cite this article

Ahn, K.W., Banerjee, A., Sahr, N. et al. Group and within-group variable selection for competing risks data. Lifetime Data Anal 24, 407–424 (2018). https://doi.org/10.1007/s10985-017-9400-9

  • DOI: https://doi.org/10.1007/s10985-017-9400-9
