On the choice and influence of the number of boosting steps for high-dimensional linear Cox-models
In biomedical research, boosting-based regression approaches have gained much attention in the last decade. Their intrinsic variable selection mechanism and their ability to shrink the estimates of the regression coefficients toward zero make these techniques well suited to fitting prediction models to high-dimensional data, e.g. gene expression data. Their prediction performance, however, depends strongly on specific tuning parameters, in particular on the number of boosting iterations to perform. This crucial parameter is usually selected via cross-validation. The result of the cross-validation procedure, in turn, may depend strongly on a purely random component, namely the chosen fold partition. We empirically study how much this randomness affects the results of the boosting techniques, in terms of the selected predictors and the prediction ability of the resulting models. We use four publicly available data sets related to four different diseases. In these studies, the goal is to predict survival endpoints when a large number of continuous candidate predictors are available. We focus on two well-known boosting approaches implemented in the R packages CoxBoost and mboost, assuming the validity of the proportional hazards assumption and the linearity of the effects of the predictors. We show that this variability in the selected predictors and in the prediction ability of the model can be reduced by averaging over several repetitions of cross-validation when selecting the tuning parameters.
Keywords: Boosting · Cross-validation · Parameter tuning · High-dimensional data · Survival analysis
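To make the stabilization strategy studied here concrete, the following R sketch repeats 10-fold cross-validation with several fresh random fold partitions and averages the selected stopping iteration, using the mboost interface cited in the references. This is not the authors' original code: the data set is simulated and the object names (n_rep, mstop_hat, mstop_avg) are illustrative, though glmboost, CoxPH, cv, cvrisk, and mstop are actual mboost functions.

```r
## Minimal sketch (not the authors' code): stabilizing the choice of the
## number of boosting iterations by averaging over repeated cross-validation.
library(mboost)
library(survival)

set.seed(1)
n <- 100; p <- 50
x <- matrix(rnorm(n * p), nrow = n)
colnames(x) <- paste0("x", seq_len(p))
time <- rexp(n, rate = exp(0.5 * x[, 1] - 0.5 * x[, 2]))
status <- rbinom(n, size = 1, prob = 0.7)   # rough random censoring
dat <- data.frame(time = time, status = status, x)

## Component-wise boosting for a linear Cox model
fit <- glmboost(Surv(time, status) ~ ., data = dat, family = CoxPH(),
                control = boost_control(mstop = 200))

## Repeat 10-fold cross-validation, each time with a new random fold
## partition, and record the selected stopping iteration
n_rep <- 10
mstop_hat <- replicate(n_rep, {
  folds <- cv(model.weights(fit), type = "kfold", B = 10)
  mstop(cvrisk(fit, folds = folds))
})

## Average over repetitions instead of relying on a single partition
mstop_avg <- round(mean(mstop_hat))
fit[mstop_avg]      # set the model to the averaged stopping iteration
names(coef(fit))    # predictors selected at this stopping iteration
```

For CoxBoost, the analogous step would use its cv.CoxBoost function, whose selected optimal boosting step can be averaged over repetitions in the same way.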
We thank Rory Wilson and Jenny Lee for language improvements. HS and RDB were supported by Grants BO3139/4-1, BO3139/4-2 and BO3139/2-3 to ALB from the German Research Foundation (DFG).
- Binder H (2013) CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks, R package version 1.4. http://CRAN.R-project.org/package=CoxBoost
- Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Proceedings of the 13th international conference on machine learning. Morgan Kaufmann, pp 148–156
- Fuchs M, Hornung R, De Bin R, Boulesteix AL (2013) A U-statistic estimator for the variance of resampling-based error estimators. Technical Report 148, University of Munich
- Hothorn T, Bühlmann P, Kneib T, Schmid M, Hofner B (2015) mboost: model-based boosting, R package version 2.4-2. http://CRAN.R-project.org/package=mboost
- Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of international joint conference on artificial intelligence, pp 1137–1145
- Ridgeway G (1999) Generalization of boosting algorithms and applications of Bayesian inference for massive datasets. PhD thesis, University of Washington