Empirical Study on High-Dimensional Variable Selection and Prediction Under Competing Risks

  • Jiayi Hou
  • Ronghui Xu
Part of the ICSA Book Series in Statistics book series (ICSABSS)


Competing risks analysis considers event times due to multiple causes, or more than one event type. Commonly used regression models for such data include (1) the cause-specific hazards model, which focuses on modeling one type of event while acknowledging the other event types; and (2) the subdistribution hazards model, which links the covariate effects directly to the cumulative incidence function. Their use, and in particular their statistical properties, in the presence of high-dimensional predictors are largely unexplored. We study the prediction accuracy and variable selection performance of existing statistical learning methods under both models through extensive simulation experiments, including different approaches to choosing the penalty parameter in each method.
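
For orientation, the two model classes named above can be written in standard textbook form. This is a sketch only: the notation (event time T, cause indicator ε, covariate vector Z, coefficient vectors β_k and γ) is assumed here for illustration and is not taken from the chapter itself.

```latex
% Assumed notation: T = event time, \varepsilon \in \{1,\dots,K\} = event type,
% Z = covariate vector. Symbols are illustrative, not the chapter's own.

% (1) Cause-specific hazard for event type k, with a proportional hazards form:
\lambda_k(t \mid Z)
  = \lim_{\Delta t \to 0}
    \frac{P(t \le T < t + \Delta t,\ \varepsilon = k \mid T \ge t,\ Z)}{\Delta t},
\qquad
\lambda_k(t \mid Z) = \lambda_{k0}(t)\, \exp(\beta_k^{\top} Z).

% Cumulative incidence function (CIF) for event type 1:
F_1(t \mid Z) = P(T \le t,\ \varepsilon = 1 \mid Z).

% (2) Subdistribution hazard for event type 1, with a proportional hazards form:
\tilde{\lambda}_1(t \mid Z)
  = -\frac{d}{dt} \log\{1 - F_1(t \mid Z)\},
\qquad
\tilde{\lambda}_1(t \mid Z) = \tilde{\lambda}_{10}(t)\, \exp(\gamma^{\top} Z).

% The second model ties covariates directly to the CIF:
F_1(t \mid Z)
  = 1 - \exp\!\Big\{ -\exp(\gamma^{\top} Z) \int_0^t \tilde{\lambda}_{10}(s)\, ds \Big\}.
```

Under the first form, the cumulative incidence of one event type depends on the cause-specific hazards of all event types, whereas under the second form a single coefficient vector determines the covariate effect on the cumulative incidence function.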



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Jiayi Hou (1)
  • Ronghui Xu (2, 3)

  1. Altman Clinical and Translational Research Institute, University of California, San Diego, La Jolla, USA
  2. Department of Family Medicine and Public Health, University of California, San Diego, La Jolla, USA
  3. Department of Mathematics, University of California, San Diego, La Jolla, USA
