Empirical Study on High-Dimensional Variable Selection and Prediction Under Competing Risks

New Frontiers of Biostatistics and Bioinformatics

Part of the book series: ICSA Book Series in Statistics ((ICSABSS))

Abstract

Competing risks analysis considers event times arising from multiple causes, or from more than one event type. Commonly used regression models for such data include (1) the cause-specific hazards model, which focuses on modeling one type of event while acknowledging the other event types; and (2) the subdistribution hazards model, which links the covariate effects directly to the cumulative incidence function. Their use, and in particular their statistical properties, in the presence of high-dimensional predictors are largely unexplored. Using extensive simulation experiments, we study the prediction accuracy and variable selection performance of existing statistical learning methods under both models, including different approaches to choosing the penalty parameter in each method.
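
For orientation, the two models named in the abstract can be written in their standard textbook forms, following the usual Cox-type and Fine and Gray formulations. The notation below (T, ε, Z, β, γ) is assumed for illustration and is not taken from the chapter itself; the final display is a generic lasso-type penalized estimator, shown only to indicate what a penalty parameter is, not as the specific method the authors evaluate.

```latex
% Assumed notation: T = time to first event, \varepsilon \in \{1,\dots,K\} = event type,
% Z = covariate vector, event type 1 = event of interest.

% (1) Cause-specific hazard for event type k, with a proportional hazards model
\[
\lambda_k(t \mid Z)
  = \lim_{\Delta t \to 0}
    \frac{P(t \le T < t + \Delta t,\ \varepsilon = k \mid T \ge t,\ Z)}{\Delta t}
  = \lambda_{k0}(t)\, \exp(\beta_k^{\top} Z)
\]

% Cumulative incidence function (CIF) of event type 1
\[
F_1(t \mid Z) = P(T \le t,\ \varepsilon = 1 \mid Z)
\]

% (2) Subdistribution hazard of event type 1, with a proportional hazards model
\[
\tilde{\lambda}_1(t \mid Z)
  = -\frac{d}{dt} \log\{1 - F_1(t \mid Z)\}
  = \tilde{\lambda}_{10}(t)\, \exp(\beta^{\top} Z)
\]

% Consequence: covariates act directly on the CIF
\[
F_1(t \mid Z) = 1 - \exp\{-\tilde{\Lambda}_{10}(t)\, e^{\beta^{\top} Z}\},
\qquad
\tilde{\Lambda}_{10}(t) = \int_0^t \tilde{\lambda}_{10}(s)\, ds
\]

% Generic lasso-type penalized estimator (illustrative assumption):
% \ell(\beta) is the log partial likelihood of either model, \gamma \ge 0 the penalty parameter
\[
\hat{\beta}(\gamma) = \arg\max_{\beta} \Big\{ \ell(\beta) - \gamma \sum_{j=1}^{p} |\beta_j| \Big\}
\]
```

Under model (1), the cumulative incidence of one event type depends on the cause-specific hazards of all event types, whereas under model (2) the regression coefficients map directly onto the cumulative incidence function, which is the distinction the abstract draws.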

© 2018 Springer Nature Switzerland AG

Cite this chapter

Hou, J., Xu, R. (2018). Empirical Study on High-Dimensional Variable Selection and Prediction Under Competing Risks. In: Zhao, Y., Chen, DG. (eds) New Frontiers of Biostatistics and Bioinformatics. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-99389-8_21
