Abstract
Classification rules are often developed using tree methodology. Using data from a diagnostic study in which Doppler flow signals were measured to distinguish malignant from benign breast tumors, I discuss the search for the cutpoint of a continuous variable that yields a minimal p-value, and the need to correct this p-value for multiple testing. Ignoring the correction strongly favors continuous variables in tree development and may lead to useless trees. I further investigate the influence of tree complexity by estimating overoptimism as the difference between the apparent error rate, computed on the original data, and the error rate estimated by 5-fold cross-validation. Finally, I consider how using predefined cutpoints affects tree development and the resulting error rates.
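The multiple-testing problem behind the minimal p-value search can be illustrated with a small simulation. The sketch below is not the chapter's analysis: it uses simulated data in which a continuous variable is unrelated to a binary outcome, scans all cutpoints between the 10% and 90% sample quantiles (an assumed selection interval), and applies a Bonferroni adjustment as a simple, conservative stand-in for a proper correction of maximally selected statistics. All function names are hypothetical.

```python
import math
import random


def chi2_p_value(a, b, c, d):
    """P-value of the 1-df chi-square test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Chi-square survival function with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def min_p_cutpoint(x, y, lower_q=0.1, upper_q=0.9):
    """Scan cutpoints in the inner quantile range (assumed 10%-90%).

    Returns (cutpoint, minimal p-value, number of cutpoints tested).
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    lo, hi = int(n * lower_q), int(n * upper_q)
    best_cut, best_p, n_tests = None, 1.0, 0
    pos_left = 0
    for i in range(1, n):
        pos_left += pairs[i - 1][1]  # positives among the i smallest x values
        if i < lo or i > hi or pairs[i - 1][0] == pairs[i][0]:
            continue  # outside the selection interval, or tied x values
        a, b = pos_left, i - pos_left
        c, d = total_pos - pos_left, (n - i) - (total_pos - pos_left)
        p = chi2_p_value(a, b, c, d)
        n_tests += 1
        if best_cut is None or p < best_p:
            best_cut, best_p = (pairs[i - 1][0] + pairs[i][0]) / 2.0, p
    return best_cut, best_p, n_tests


def false_positive_rates(n_sim=300, n=100, alpha=0.05, seed=0):
    """Share of null data sets declared 'significant', raw vs. Bonferroni."""
    rng = random.Random(seed)
    raw_hits = corrected_hits = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.randint(0, 1) for _ in range(n)]  # outcome independent of x
        _, p_min, n_tests = min_p_cutpoint(x, y)
        raw_hits += p_min < alpha
        # Bonferroni: an assumption made here for simplicity, and conservative.
        corrected_hits += min(1.0, p_min * n_tests) < alpha
    return raw_hits / n_sim, corrected_hits / n_sim


if __name__ == "__main__":
    raw, corrected = false_positive_rates()
    print(f"uncorrected minimal p-value < 0.05: {raw:.2f}")
    print(f"Bonferroni-corrected p-value < 0.05: {corrected:.2f}")
```

Under these assumptions the uncorrected minimal p-value falls below 0.05 far more often than the nominal 5%, which is why a tree that splits on uncorrected p-values tends to prefer continuous variables over binary ones that offer only a single possible split.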
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sauerbrei, W. (2001). Investigations on Stability and Overoptimism of Classification Trees by Using Cross-Validation. In: Crespo, J., Maojo, V., Martin, F. (eds) Medical Data Analysis. ISMDA 2001. Lecture Notes in Computer Science, vol 2199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45497-7_38
DOI: https://doi.org/10.1007/3-540-45497-7_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42734-6
Online ISBN: 978-3-540-45497-7
eBook Packages: Springer Book Archive