A classification tree approach for the modeling of competing risks in discrete time

  • Regular Article
Advances in Data Analysis and Classification

Abstract

Cause-specific hazard models are a popular tool for the analysis of competing risks data. The classical modeling approach in discrete time consists of fitting parametric multinomial logit models. A drawback of this method is that it focuses on main effects only, and higher-order interactions are hard to handle. Moreover, the resulting models contain a large number of parameters, which may cause numerical problems when estimating coefficients. To overcome these problems, a tree-based model is proposed that extends the survival tree methodology previously developed for time-to-event models with a single type of event. The performance of the method is compared with that of several competitors in simulations. The usefulness of the proposed approach is demonstrated by an analysis of age-related macular degeneration among elderly people who were monitored by annual study visits.
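A common way to fit discrete cause-specific hazards, whether with multinomial logit models or with classification trees, is to expand each individual's record into one row per period at risk and attach a multinomial response (0 = no event in that period, r = event of type r). The sketch below assumes that coding and additionally assumes that a censored individual survives all of its observed periods; this excerpt does not spell out the article's exact construction, and the function names `augment` and `node_hazards` are hypothetical. Within a node of a tree grown on such rows (with the period t as one of the covariates), the relative frequencies of the response categories can serve as estimates of the cause-specific hazards.

```python
from collections import Counter


def augment(times, causes, covariates):
    """Expand (time, cause) records into person-period rows (t, x, y).

    times:      observed discrete times t_i in {1, 2, ...}
    causes:     0 if censored, otherwise the event type r in {1, ..., J}
    covariates: one covariate tuple x_i per individual
    Assumption: a censored individual survives all t_i periods (all y = 0).
    """
    rows = []
    for t_i, cause, x in zip(times, causes, covariates):
        for t in range(1, t_i + 1):
            # y = 0: no event in period t; y = r: event of type r in period t
            y = cause if t == t_i else 0
            rows.append((t, x, y))
    return rows


def node_hazards(rows, n_causes):
    """Relative frequencies of the categories 0..J among the rows of one node.

    Entry r >= 1 estimates the cause-specific hazard of type r in that node;
    entry 0 estimates the conditional probability of no event in the period.
    """
    counts = Counter(y for _, _, y in rows)
    total = len(rows)
    return [counts[r] / total for r in range(n_causes + 1)]


# Usage: three individuals, J = 2 event types (hypothetical toy data)
rows = augment(times=[2, 1, 3], causes=[1, 0, 2], covariates=[(0.5,), (1.2,), (0.3,)])
print(len(rows))              # 6 person-period rows
print(node_hazards(rows, 2))  # all rows treated as a single (root) node
```

In a fitted tree, `node_hazards` would be applied separately to the rows falling into each terminal node.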

Acknowledgements

Support by the German Research Foundation (DFG), Grants SCHM 2966/1-2 and SCHM 2966/2-1, is gratefully acknowledged. The MODIAMD study is funded by the German Ministry of Education and Research (BMBF), Funding Number 13N10349.

Author information

Corresponding author

Correspondence to Moritz Berger.

Appendix: Further simulation results

See Figs. 9 and 10.

Fig. 9

Results of the simulation study. The boxplots visualize the predictive log-likelihood values obtained from the various tree-based approaches for the six scenarios with \(n=500\). Dark gray boxplots refer to the results with splitting by Gini impurity (GI), light gray boxplots refer to the results with splitting by Hellinger distance (HD). High values of the predictive log-likelihood correspond to good model fits, and low values to poor fits.
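The caption contrasts splitting by Gini impurity (GI) and by Hellinger distance (HD). As a hedged sketch, the functions below compute both quantities for a candidate binary split of a node, given the counts of the response categories (0 = no event, 1..J = event types, an assumed coding) in the two children. The Gini part is the standard weighted impurity decrease; the Hellinger part is the distance between the response distributions of the two children. The exact HD criterion used in the article is not given in this excerpt and may be defined differently (for instance via class-conditional distributions across branches), so this only illustrates the two measures; all function names are hypothetical.

```python
import math


def gini(counts):
    """Gini impurity 1 - sum_k p_k^2 of a node with category counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_decrease(left, right):
    """Weighted Gini impurity decrease when a parent node is split into two children.

    left, right: counts of the response categories (0, 1, ..., J) in each child;
    both children are assumed to be non-empty.
    """
    parent = [a + b for a, b in zip(left, right)]
    n_l, n_r = sum(left), sum(right)
    n = n_l + n_r
    return gini(parent) - (n_l / n) * gini(left) - (n_r / n) * gini(right)


def hellinger(left, right):
    """Hellinger distance between the response distributions of two child nodes.

    Uses the normalization 1/sqrt(2), so the value lies in [0, 1].
    """
    n_l, n_r = sum(left), sum(right)
    s = sum((math.sqrt(a / n_l) - math.sqrt(b / n_r)) ** 2 for a, b in zip(left, right))
    return math.sqrt(s / 2.0)


# Usage: counts for (no event, type 1, type 2) in two candidate children
left, right = [40, 8, 2], [30, 2, 18]
print(gini_decrease(left, right), hellinger(left, right))
```

In either case the split with the largest value among all candidates would be chosen; the two measures can rank candidate splits differently when some response categories are rare.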

Fig. 10

Results of the simulation study. The boxplots visualize the predictive log-likelihood values obtained from various modeling approaches for the six scenarios with \(n=500\). The first two boxplots (GI and HD) refer to the tree-based models with splitting by Gini impurity and by Hellinger distance, respectively, both tuned by the predictive log-likelihood (ll). The sixth boxplot in each of the six panels contains the true log-likelihood values of the 100 test data sets (True), based on the true hazards defined in (18). Dashed lines indicate the median values of the best-performing tree-based model.
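The predictive log-likelihood used in these comparisons can be sketched as follows, assuming the test data are in person-period form with rows (t, x, y), where y = 0 means no event in period t and y = r means an event of type r, and assuming a censored individual contributes only y = 0 rows. Under this assumption each row contributes log λ_y(t | x) if an event of type y occurred and log(1 - Σ_r λ_r(t | x)) otherwise. The function `predictive_loglik` and the `predict_hazards` callback are hypothetical, not the article's code.

```python
import math


def predictive_loglik(test_rows, predict_hazards):
    """Predictive log-likelihood of person-period test data.

    test_rows: iterable of (t, x, y) with y = 0 (no event) or y = r (event of type r).
    predict_hazards: hypothetical callback (t, x) -> [lambda_1, ..., lambda_J].
    Each row adds log lambda_y(t | x) for an event and log(1 - sum_r lambda_r(t | x))
    otherwise; all predicted probabilities are assumed to be strictly positive.
    """
    ll = 0.0
    for t, x, y in test_rows:
        haz = predict_hazards(t, x)
        p = haz[y - 1] if y > 0 else 1.0 - sum(haz)
        ll += math.log(p)
    return ll


# Usage with constant (hypothetical) hazards for J = 2 event types
rows = [(1, None, 0), (2, None, 1), (1, None, 2)]
print(predictive_loglik(rows, lambda t, x: [0.1, 0.2]))
```

Higher values indicate a better fit, consistent with the reading of the boxplots. A predicted probability of exactly zero for an observed category would make `math.log` raise a `ValueError`, so a sketch like this relies on strictly positive hazard estimates.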

About this article

Cite this article

Berger, M., Welchowski, T., Schmitz-Valckenberg, S. et al. A classification tree approach for the modeling of competing risks in discrete time. Adv Data Anal Classif 13, 965–990 (2019). https://doi.org/10.1007/s11634-018-0345-y

  • DOI: https://doi.org/10.1007/s11634-018-0345-y
