Abstract
Competing risks analysis concerns event times arising from multiple causes, that is, from more than one event type. Commonly used regression models for such data include (1) the cause-specific hazards model, which models one type of event while acknowledging the other event types; and (2) the subdistribution hazards model, which links covariate effects directly to the cumulative incidence function. Their use, and in particular their statistical properties, in the presence of high-dimensional predictors are largely unexplored. We study the prediction accuracy and variable selection performance of existing statistical learning methods under both models through extensive simulation experiments, including different approaches to choosing the penalty parameters in each method.
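For readers unfamiliar with the two models named above, their standard textbook forms can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the chapter: T is the event time, \varepsilon the event type, Z the covariate vector, and \beta_k, \gamma the regression coefficients. The last line shows a lasso-type penalty only as one example of a penalized criterion with a tuning parameter \lambda_n; the chapter may study other penalties and other learning methods.

```latex
% Cause-specific hazard for event type k, with a proportional hazards form
\lambda_k(t \mid Z)
  = \lim_{\Delta t \to 0}
    \frac{P(t \le T < t + \Delta t,\ \varepsilon = k \mid T \ge t,\ Z)}{\Delta t}
  = \lambda_{k0}(t)\, \exp(\beta_k^{\top} Z)

% Cumulative incidence function (CIF) for event type 1
F_1(t \mid Z) = P(T \le t,\ \varepsilon = 1 \mid Z)

% Subdistribution hazard for event type 1 (Fine and Gray, 1999)
\tilde{\lambda}_1(t \mid Z)
  = -\frac{d}{dt} \log\{ 1 - F_1(t \mid Z) \}
  = \tilde{\lambda}_{10}(t)\, \exp(\gamma^{\top} Z)

% Consequence: the covariates act directly on the CIF
F_1(t \mid Z)
  = 1 - \exp\{ -\tilde{\Lambda}_{10}(t)\, \exp(\gamma^{\top} Z) \},
\qquad
\tilde{\Lambda}_{10}(t) = \int_0^t \tilde{\lambda}_{10}(s)\, ds

% Illustrative lasso-type penalized estimator; \ell is the log partial likelihood
\hat{\beta}
  = \arg\min_{\beta} \Big\{ -\ell(\beta) + \lambda_n \sum_{j=1}^{p} |\beta_j| \Big\}
```

Under the cause-specific model the coefficients describe the instantaneous rate of one event type among subjects still at risk, whereas under the subdistribution model they translate monotonically into the cumulative incidence, which is why the two models can rank covariates differently.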
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Hou, J., Xu, R. (2018). Empirical Study on High-Dimensional Variable Selection and Prediction Under Competing Risks. In: Zhao, Y., Chen, DG. (eds) New Frontiers of Biostatistics and Bioinformatics. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-99389-8_21
DOI: https://doi.org/10.1007/978-3-319-99389-8_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99388-1
Online ISBN: 978-3-319-99389-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)