Group and within-group variable selection for competing risks data

Ahn, Kwang Woo; Banerjee, Anjishnu; Sahr, Natasha; Kim, Soyoung

doi:10.1007/s10985-017-9400-9

Published: 04 August 2017

Lifetime Data Analysis, Volume 24, pages 407–424 (2018)

Kwang Woo Ahn¹ (ORCID: orcid.org/0000-0003-4567-8037), Anjishnu Banerjee¹, Natasha Sahr¹ & Soyoung Kim¹

Abstract

Variable selection in the presence of grouped variables is troublesome for competing risks data: while some recent methods deal with group selection only, simultaneous selection of both groups and within-group variables remains largely unexplored. In this context, we propose an adaptive group bridge method, enabling simultaneous selection both within and between groups, for competing risks data. The adaptive group bridge is applicable to independent and clustered data. It also allows the number of variables to diverge as the sample size increases. We show that our new method possesses excellent asymptotic properties, including variable selection consistency at group and within-group levels. We also show superior performance in simulated and real data sets over several competing approaches, including group bridge, adaptive group lasso, and AIC / BIC-based methods.

References

Cai J, Fan J, Li R, Zhou H (2005) Variable selection for multivariate failure time data. Biometrika 92:303–316
Commenges D, Andersen PK (1995) Score test of homogeneity for survival data. Lifetime Data Anal 1:145–156
Fan J, Li R (2002) Variable selection for Cox’s proportional hazards model and frailty model. Ann Stat 30:74–99
Fine JP, Gray RJ (1999) A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc 94:496–509
Fu Z (2015) crrp: Penalized variable selection in competing risks regression. http://CRAN.R-project.org/package=crrp, R package version 1.0
Fu Z, Parikh CR, Zhou B (2016a) Penalized variable selection in competing risks regression. Lifetime Data Anal. doi:10.1007/s10985-016-9362-3
Fu Z, Ma S, Lin H, Parikh CR, Zhou B (2016b) Penalized variable selection for multi-center competing risks data. Stat Biosci. doi:10.1007/s12561-016-9181-9
Gray RJ (1988) A class of K-sample tests for comparing the cumulative incidence of a competing risk. Ann Stat 16:1141–1154
Ha ID, Lee M, Oh S, Jeong JH, Sylvester R, Lee Y (2014) Variable selection in subdistribution hazard frailty models with competing risks data. Stat Med 33:4590–4604
Huang J, Ma S, Xie H, Zhang CH (2009) A group bridge approach for variable selection. Biometrika 96:339–355
Huang J, Li L, Liu Y, Zhao X (2014) Group selection in the Cox model with a diverging number of covariates. Stat Sin 24:1787–1810
Kim HT, Zhang MJ, Woolfrey AE, Martin AS, Chen J, Saber W, Perales MA, Armand P, Eapen M (2016) Donor and recipient sex in allogeneic stem cell transplantation: what really matters. Haematologica 101:1260–1266
Kroger N, Solano C, Wolschke C et al (2016) Antilymphocyte globulin for prevention of chronic graft-versus-host disease. New Engl J Med 374:43–53
Kuk D, Varadhan R (2013) Model selection in competing risks regression. Stat Med 32:3077–3088
Logan B, Zhang MJ, Klein JP (2011) Marginal models for clustered time to event data with competing risks using pseudovalues. Biometrics 67:1–7
Prentice RL, Kalbfleisch JD, Peterson AV, Flournoy N, Farewell VT, Breslow NE (1978) The analysis of failure times in the presence of competing risks. Biometrics 34:541–554
Rubio MT, Labopin M, Blaise D et al (2015) The impact of graft-versus-host disease prophylaxis in reduced-intensity conditioning allogeneic stem cell transplant in acute myeloid leukemia: a study from the acute leukemia working party of the European group for blood and marrow transplantation. Haematologica 100:683–689
Seetharaman I (2013) Consistent bi-level variable selection via composite group bridge penalized regression. Master’s thesis, Kansas State University, KS, USA
Shaw PJ, Kan F, Ahn KW, Spellman SR, Aljurf M, Ayas M et al (2010) Outcomes of pediatric bone marrow transplantation for leukemia and myelodysplasia using matched sibling, mismatched related, or matched unrelated donors. Blood 116:4007–4015
Varadhan R, Kuk D (2015) crrstep: Stepwise covariate selection for the Fine and Gray competing risks regression model. http://CRAN.R-project.org/package=crrstep, R package version 2015-2.1
Wang HJ, Zhou J, Li Y (2013) Variable selection for censored quantile regression. Stat Sin 23:145–167
Wu TT, Wang S (2013) Doubly regularized Cox regression for high-dimensional survival data with group structures. Stat Interface 6:175–186
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B 68:49–67
Zhou B, Fine J, Latouche A, Labopin M (2012) Competing risks regression for clustered data. Biostatistics 13:371–383
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429

Acknowledgements

The US National Cancer Institute (U24CA076518) partially supported this work. The authors would like to thank the Associate Editor and two reviewers for their helpful comments that significantly improved the paper.

Author information

Authors and Affiliations

Division of Biostatistics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
Kwang Woo Ahn, Anjishnu Banerjee, Natasha Sahr & Soyoung Kim

Corresponding author

Correspondence to Kwang Woo Ahn.

Electronic supplementary material

Below is the link to the electronic supplementary material.

10985_2017_9400_MOESM1_ESM.pdf

Supplementary Materials: The proofs of Theorem 2 and Corollary 1 and a list of variables for the bone marrow transplant data are available online (PDF, 60 KB).

Appendix

To describe the conditions to justify the asymptotics, we define

$$\begin{aligned} w_{ij}(t)&=\frac{I(C_{ij}\ge T_{ij}\wedge t)G(t)}{G(X_{ij}\wedge t)},\\ \mathbf{q}(u)&=-\lim _{m\rightarrow \infty }\frac{1}{m}\sum _{i=1}^m\sum _{j=1}^{n_i}\int _0^\tau \{\mathbf{Z}_{ij}(t)-\mathbf{e}({\varvec{\beta }}_0,t)\}w_{ij}(t)I(X_{ij}<u\le t)dM_{ij}({\varvec{\beta }}_0,t),\\ \pi (u)&=\lim _{m\rightarrow \infty }\frac{1}{m}\sum _{i=1}^m\sum _{j=1}^{n_i} I(X_{ij}\ge u). \end{aligned}$$

For a vector $\mathbf{a}$, write $\mathbf{a}^{\otimes 0}=\mathbf{1}$, $\mathbf{a}^{\otimes 1}=\mathbf{a}$, and $\mathbf{a}^{\otimes 2}=\mathbf{a}\mathbf{a}^T$. Let $N_{ij}^c(t)=I(C_{ij}\le t)$ and let $\varLambda ^c(t)$ be the cumulative hazard function of the censoring time, obtained by treating censored observations as events. Define $M_{ij}^c(t)=N_{ij}^c(t)-\int _0^tI(X_{ij}\ge u)d\varLambda ^c(u)$. Consider the following conditions:

(C1)
$\int _0^\tau \lambda _{10}(u)du<\infty $ and $|Z_{ijq}(0)|+\int _0^\tau |dZ_{ijq}(u)|\le M$ for all i, j, q, where M is a positive constant;
(C2)
There exists a neighborhood $\mathcal {B}$ of ${\varvec{\beta }}_0$ such that $\sup _{t\in [0,\tau ],{\varvec{\beta }}\in \mathcal {B}}\Vert \mathbf{S}^{(r)}({\varvec{\beta }},t)-\mathbf{s}^{(r)}({\varvec{\beta }},t)\Vert _2$ converges in probability to 0 for $r=0,1,2$. There also exists a matrix $\varGamma =\varGamma ({\varvec{\beta }}_0)$ such that $\Vert m^{-1}\sum _{i=1}^mVar(\mathbf{D}_i)-\varGamma \Vert \rightarrow 0$, where
$$\begin{aligned} \mathbf{D}_i&=\sum _{j=1}^{n_i}\Big [\int _0^\tau \{\mathbf{Z}_{ij}(u)-\mathbf{e}({\varvec{\beta }}_0,u)\}w_{ij}(u)dM_{ij}({\varvec{\beta }}_0,u)\\&\quad +\int _0^\tau \frac{\mathbf{q}(u)}{\pi (u)}dM_{ij}^c(u)\Big ]. \end{aligned}$$
In addition, there exist constants $C_1$ and $C_2$ such that $0<C_1<\kappa _\text {min}(\varGamma )\le \kappa _\text {max}(\varGamma )<C_2<\infty $ for all m, where $\kappa _\text {min}(\mathbf{A})$ and $\kappa _\text {max}(\mathbf{A})$ are the minimal and maximal eigenvalues of a matrix $\mathbf{A}$, respectively;
(C3)
For $r=0,1,2$, the functions $\mathbf{s}^{(r)}({\varvec{\beta }},t)$ are continuous in ${\varvec{\beta }}\in \mathcal {B}$ uniformly in $t\in [0,\tau ]$ and are bounded on $\mathcal {B}\times [0,\tau ]$. ${s}^{(0)}({\varvec{\beta }},t)$ is bounded away from 0 on $\mathcal {B}\times [0,\tau ]$. The true parameter $\beta _{j0}$ is bounded away from 0 for $j\in B_1$. Let $\varOmega =\int _0^\tau \mathbf{v}({\varvec{\beta }}_0,u){s}^{(0)}({\varvec{\beta }}_0,u)\lambda _{10}(u)du$. There exist constants $C_3$ and $C_4$ such that $0<C_3<\kappa _\text {min}(\varOmega )\le \kappa _\text {max}(\varOmega )<C_4<\infty $;
(C4)
There exists a constant $C_5$ such that $\sup _{1\le i\le m}E(D_{ij}^2D_{ij^{'}}^2)\le C_5<\infty $ for all $1\le j,\ j^{'}\le d_m$. $C_m^*=\max _j\sum _{k=1}^KI(j\in A_k)$ is bounded;
(C5)
$d_m^4/m\rightarrow 0$;
(C6)
$\sum _{k=1}^{K_1}c_k \left\{ \Big (\sum _{j\in A_k\cap B_1}|\beta _{j0}|^{1-\nu }\Big )^{\gamma -1} \sum _{j\in A_k\cap B_1}1/|{\beta }_{j0}|^\nu \right\} \le M_m$, where $M_m=O_p(1)$;
(C7)
$\lambda _m/\sqrt{m}\rightarrow 0$, $\sqrt{m/d_m}\tilde{\beta }_j=O_p(1)$, and $\min \big (\lambda _m m^{(\nu -1)/2}d_m^{-(1+\nu )/2},\ \lambda _m m^{\gamma (\nu -1)/2}d_m^{-1+\gamma (1-\nu )/2}\big )\rightarrow \infty $.
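
Two quantities in the conditions above can be evaluated directly for a given group structure: the overlap count $C_m^*$ in (C4) and the left-hand side of (C6). The Python sketch below does so under stated assumptions: groups $A_k$ are given as lists of 0-based covariate indices without repeats, $B_1$ is taken to be the set of indices with non-zero $\beta _{j0}$, the sum in (C6) is taken over the groups containing at least one such index, and the weights $c_k$ are supplied by the user. The function names are hypothetical and do not come from the paper or from any package.

```python
import numpy as np


def max_group_overlap(groups, d):
    """C_m^* in (C4): the largest number of groups any single covariate belongs to.

    groups: list of lists of 0-based covariate indices (assumed input format).
    d: total number of covariates.
    """
    counts = np.zeros(d, dtype=int)
    for A_k in groups:
        for j in A_k:
            counts[j] += 1
    return int(counts.max())


def c6_lhs(beta0, groups, c, nu, gamma):
    """Left-hand side of (C6) for a given true coefficient vector beta0.

    Assumption: B_1 is the set of indices with non-zero beta0, and only
    groups intersecting B_1 contribute to the sum.
    """
    beta0 = np.asarray(beta0, dtype=float)
    total = 0.0
    for A_k, c_k in zip(groups, c):
        nonzero = [j for j in A_k if beta0[j] != 0.0]  # A_k intersected with B_1
        if not nonzero:
            continue
        a = np.abs(beta0[nonzero])
        inner = np.sum(a ** (1.0 - nu))  # sum of |beta_j0|^(1 - nu)
        total += c_k * inner ** (gamma - 1.0) * np.sum(a ** (-nu))
    return total


if __name__ == "__main__":
    # Toy example: two non-overlapping groups, one non-zero coefficient each.
    groups = [[0, 1], [2, 3]]
    beta0 = [1.0, 0.0, 2.0, 0.0]
    print(max_group_overlap(groups, d=4))                         # 1
    print(c6_lhs(beta0, groups, c=[1.0, 1.0], nu=1.0, gamma=0.5))  # 1.5
```

In the toy example each term of the sum stays finite because the non-zero coefficients are bounded away from zero, which is the role (C3) plays for (C6).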

Conditions (C1)–(C3) are standard conditions for the marginal subdistribution hazards model (Zhou et al. 2012). Conditions (C1)–(C5) are similar to the conditions of Cai et al. (2005) and Huang et al. (2014); they guarantee the local asymptotic quadratic property of $l({\varvec{\beta }})$ and the existence of a local minimizer of ${\mathcal {L}}_m({\varvec{\beta }})$. Once Conditions (C3) and (C4) are met, Condition (C6) is satisfied if the true non-zero $\beta _{j0}$’s are bounded above. Condition (C7) controls the rate of $\lambda _m$ so that the estimator has the oracle property.
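
As a rough illustration of how (C5) and (C7) interact, suppose that the tuning parameter and the dimension grow polynomially, $\lambda _m=m^{a}$ and $d_m=m^{b}$ with $a>0$ and $b\ge 0$, and take $\nu =1$. These rates and the value of $\nu $ are assumptions made only for this example; they are not settings taken from the paper. Then

$$\begin{aligned} d_m^4/m&=m^{4b-1}\rightarrow 0 \iff b<1/4,\\ \lambda _m/\sqrt{m}&=m^{a-1/2}\rightarrow 0 \iff a<1/2,\\ \lambda _m m^{(\nu -1)/2}d_m^{-(1+\nu )/2}&=m^{a-b},\\ \lambda _m m^{\gamma (\nu -1)/2}d_m^{-1+\gamma (1-\nu )/2}&=m^{a-b}, \end{aligned}$$

so the minimum in (C7) diverges exactly when $a>b$. Under these assumed rates, the deterministic parts of (C5) and (C7) hold if and only if $b<1/4$ and $b<a<1/2$, whatever the value of $\gamma $; for instance, $b=0.1$ and $a=0.3$ qualify. The requirement $\sqrt{m/d_m}\tilde{\beta }_j=O_p(1)$ concerns the initial estimator and is not addressed by this calculation.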

About this article

Cite this article

Ahn, K.W., Banerjee, A., Sahr, N. et al. Group and within-group variable selection for competing risks data. Lifetime Data Anal 24, 407–424 (2018). https://doi.org/10.1007/s10985-017-9400-9

Received: 14 June 2016
Accepted: 23 July 2017
Published: 04 August 2017
Issue Date: July 2018
DOI: https://doi.org/10.1007/s10985-017-9400-9
