Advances in Data Analysis and Classification, Volume 13, Issue 4, pp 965–990

A classification tree approach for the modeling of competing risks in discrete time

  • Moritz Berger
  • Thomas Welchowski
  • Steffen Schmitz-Valckenberg
  • Matthias Schmid
Regular Article


Cause-specific hazard models are a popular tool for the analysis of competing risks data. The classical modeling approach in discrete time consists of fitting parametric multinomial logit models. A drawback of this method is that it focuses on main effects only and that higher-order interactions are hard to handle. Moreover, the resulting models contain a large number of parameters, which may cause numerical problems when estimating coefficients. To overcome these problems, a tree-based model is proposed that extends the survival tree methodology developed previously for time-to-event models with a single type of event. The performance of the method, compared with several competitors, is investigated in simulations. The usefulness of the proposed approach is demonstrated by an analysis of age-related macular degeneration among elderly people who were monitored by annual study visits.
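To make the notion of cause-specific hazards in discrete time concrete, the following is a minimal sketch of a person-period expansion. It is one common way to prepare such data for classification-type methods; the abstract does not state that the article proceeds in exactly this way. The record layout, the function names `expand_person_period` and `empirical_hazards`, the toy data, and the convention that a censored subject contributes a "no event" row in its last observed period are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout (not from the article):
# (observed discrete time, status, covariates); status 0 = censored,
# status j >= 1 = event of type j.
Record = Tuple[int, int, Dict[str, float]]


def expand_person_period(records: List[Record]) -> List[dict]:
    """Expand each subject into one row per discrete time point at risk."""
    rows = []
    for subject_id, (time, status, covariates) in enumerate(records):
        if time < 1 or status < 0:
            raise ValueError("time must be >= 1 and status must be >= 0")
        for t in range(1, time + 1):
            # Response is 0 ("no event") before the last observed period;
            # in the last period it is the event type, or 0 if censored
            # (assumed convention: censoring at the end of the interval).
            response = status if t == time else 0
            rows.append({"id": subject_id, "t": t, "y": response, **covariates})
    return rows


def empirical_hazards(rows: List[dict], n_causes: int) -> Dict[int, List[float]]:
    """Relative frequency of each cause among the rows at risk at each t."""
    at_risk: Dict[int, int] = {}
    events: Dict[int, List[int]] = {}
    for row in rows:
        t = row["t"]
        at_risk[t] = at_risk.get(t, 0) + 1
        counts = events.setdefault(t, [0] * n_causes)
        if row["y"] > 0:
            counts[row["y"] - 1] += 1
    return {t: [c / at_risk[t] for c in events[t]] for t in sorted(at_risk)}


# Toy data, invented for this example.
records = [
    (3, 1, {"age": 71.0}),  # event of type 1 at t = 3
    (2, 0, {"age": 65.0}),  # censored at t = 2
    (1, 2, {"age": 80.0}),  # event of type 2 at t = 1
]
long_rows = expand_person_period(records)
hazards = empirical_hazards(long_rows, n_causes=2)
```

In this layout the response `y` is a categorical variable with levels 0 (no event) and 1, ..., J (event types), so a multinomial logit model or a classification tree can in principle be fitted to the expanded rows with `t` and the covariates as predictors.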


Discrete time-to-event data; Competing risks; Recursive partitioning; Cause-specific hazards; Regression modeling

Mathematics Subject Classification

62N01 62N02 62P10 62-07 



Support by the German Research Foundation (DFG), Grants SCHM 2966/1-2 and SCHM 2966/2-1, is gratefully acknowledged. The MODIAMD study is funded by the German Ministry of Education and Research (BMBF), Funding Number 13N10349.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
  2. University Eye Hospital Bonn, Bonn, Germany
