Empirical Study on High-Dimensional Variable Selection and Prediction Under Competing Risks

Hou, Jiayi; Xu, Ronghui

doi:10.1007/978-3-319-99389-8_21

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

Competing risks analysis considers event times due to multiple causes, or of more than one event type. Commonly used regression models for such data include (1) the cause-specific hazards model, which focuses on modeling one type of event while acknowledging other event types simultaneously; and (2) the subdistribution hazards model, which links the covariate effects directly to the cumulative incidence function. Their use, and in particular their statistical properties, in the presence of high-dimensional predictors are largely unexplored. We study the accuracy of prediction and variable selection of existing statistical learning methods under both models using extensive simulation experiments, including different approaches to choosing penalty parameters in each method.
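
The two hazards named in the abstract can be written out as follows. This is a sketch in standard textbook notation, not necessarily the chapter's own: the event time $T$, the event type $\epsilon$, the covariate vector $Z$, and the coefficient vectors $\beta_k$ and $\gamma$ are symbols assumed here for illustration, with proportional-hazards forms assumed for both models.

```latex
\begin{align*}
% (1) Cause-specific hazard for event type k, with a proportional-hazards form
\lambda_k(t \mid Z)
  &= \lim_{\Delta t \to 0}
     \frac{P(t \le T < t + \Delta t,\ \epsilon = k \mid T \ge t,\ Z)}{\Delta t}
   = \lambda_{k0}(t)\, \exp(\beta_k^\top Z), \\
% Cumulative incidence function (CIF): depends on the hazards of ALL causes
F_k(t \mid Z)
  &= P(T \le t,\ \epsilon = k \mid Z)
   = \int_0^t \exp\Big\{-\sum_j \Lambda_j(u \mid Z)\Big\}\,
     \lambda_k(u \mid Z)\, du, \\
% (2) Subdistribution hazard for the event of interest (type 1)
\tilde{\lambda}_1(t \mid Z)
  &= -\frac{d}{dt} \log\{1 - F_1(t \mid Z)\}
   = \tilde{\lambda}_{10}(t)\, \exp(\gamma^\top Z), \\
% which links covariates directly to the CIF
F_1(t \mid Z)
  &= 1 - \exp\Big\{-\tilde{\Lambda}_{10}(t)\, \exp(\gamma^\top Z)\Big\}.
\end{align*}
```

Here $\Lambda_j$ and $\tilde{\Lambda}_{10}$ denote the corresponding cumulative hazards. Under model (1) the cumulative incidence of one cause involves the hazards of every cause, so a covariate effect on $\lambda_k$ does not carry over directly to $F_k$. Model (2) provides that direct link.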

Author information

Authors and Affiliations

Altman Clinical and Translational Research Institute, University of California, San Diego, La Jolla, CA, USA
Jiayi Hou
Department of Family Medicine and Public Health, University of California, San Diego, La Jolla, CA, USA
Ronghui Xu
Department of Mathematics, University of California, San Diego, La Jolla, CA, USA
Ronghui Xu

Editor information

Editors and Affiliations

Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, USA
Yichuan Zhao
Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Ding-Geng Chen

About this chapter

Cite this chapter

Hou, J., Xu, R. (2018). Empirical Study on High-Dimensional Variable Selection and Prediction Under Competing Risks. In: Zhao, Y., Chen, DG. (eds) New Frontiers of Biostatistics and Bioinformatics. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-99389-8_21

DOI: https://doi.org/10.1007/978-3-319-99389-8_21
Published: 06 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99388-1
Online ISBN: 978-3-319-99389-8
eBook Packages: Mathematics and Statistics (R0)
