Investigations on Stability and Overoptimism of Classification Trees by Using Cross-Validation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2199)

Abstract

Development of classification rules is often based on tree methodology. Using data from a diagnostic study in which Doppler flow signals were measured to distinguish malignant from benign breast tumors, I will discuss the problem of searching for the cutpoint of a continuous variable that yields the minimal p-value, and the necessity of correcting this p-value for multiple testing. Ignoring the correction will strongly favor continuous variables in tree development and may lead to useless trees. I will further investigate the influence of tree complexity by estimating the overoptimism as the difference between the apparent error rate, based on the original data, and the error rate estimated by 5-fold cross-validation. Finally, I consider the effect of using predefined cutpoints on the development of trees and on the resulting error rates.
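The multiple-testing problem behind a "minimal p-value" cutpoint can be illustrated with a short sketch. This is not the author's implementation. It assumes a binary outcome and a single continuous predictor, scores every candidate cutpoint with a Pearson chi-square test on the resulting 2x2 table, and keeps the smallest p-value. Instead of the analytic corrections for maximally selected statistics discussed in the references, it uses a permutation test (a swapped-in technique): the outcome labels are shuffled and the whole cutpoint search is repeated, so the corrected p-value accounts for the search itself. The function names and the 10% trimming default `eps` are hypothetical choices for illustration.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers for illustration; the permutation test stands in for the
# analytic corrections of maximally selected statistics described in the literature.


def chi2_pvalue_2x2(x: Sequence[float], y: Sequence[int], cut: float) -> float:
    """Naive p-value of Pearson's chi-square test (1 df, no continuity correction)
    for the 2x2 table 'x <= cut' versus the binary outcome y."""
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi <= cut:
            if yi:
                a += 1
            else:
                b += 1
        elif yi:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # an empty row or column: the split carries no information
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def candidate_cutpoints(x: Sequence[float], eps: float = 0.1) -> List[float]:
    """Distinct observed values, trimming roughly the lowest and highest eps fraction."""
    xs = sorted(x)
    k = int(eps * len(xs))
    lo, hi = xs[k], xs[len(xs) - 1 - k]
    return sorted({v for v in xs if lo <= v < hi})


def min_pvalue(x, y, cuts) -> Tuple[float, Optional[float]]:
    """Search all candidate cutpoints; return (minimal naive p-value, its cutpoint)."""
    best_p, best_cut = 1.0, None
    for cut in cuts:
        p = chi2_pvalue_2x2(x, y, cut)
        if p < best_p:
            best_p, best_cut = p, cut
    return best_p, best_cut


def permutation_corrected_p(x, y, eps=0.1, n_perm=999, seed=0) -> float:
    """Compare the observed minimal p-value with its distribution under no
    association, obtained by permuting y and repeating the entire cutpoint search."""
    rng = random.Random(seed)
    cuts = candidate_cutpoints(x, eps)
    observed, _ = min_pvalue(x, y, cuts)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if min_pvalue(x, y_perm, cuts)[0] <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical null data: the outcome is unrelated to the continuous marker.
    x = [rng.gauss(0, 1) for _ in range(60)]
    y = [rng.randint(0, 1) for _ in range(60)]
    naive, cut = min_pvalue(x, y, candidate_cutpoints(x))
    print(f"naive minimal p = {naive:.4f} at cut {cut}")
    print(f"permutation-corrected p = {permutation_corrected_p(x, y, n_perm=199):.3f}")
```

On data simulated without any association, the naive minimal p-value is typically much smaller than the permutation-corrected one, which is why an uncorrected search favors continuous variables when a tree chooses among splitting variables. Passing a single predefined cutpoint, e.g. `min_pvalue(x, y, [c0])`, removes the search, so the naive p-value is then valid without correction.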
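The overoptimism estimate can be sketched in the same hedged spirit. The code below is not CART and not the software used in the paper: it grows a small Gini-based binary tree in pure Python (the helpers `grow`, `predict`, the depth limit `depth` and the minimum node size `min_leaf` are hypothetical choices for illustration), computes the apparent error rate on the full data, re-estimates the error by 5-fold cross-validation, and reports overoptimism as the cross-validated error minus the apparent error. The whole tree, including its cutpoint search, is regrown in every fold; otherwise the selection step would escape validation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Row = List[float]


@dataclass
class Node:
    label: int                      # majority class (0/1) of the training cases here
    feature: Optional[int] = None   # None marks a leaf
    cut: float = 0.0                # go left if row[feature] <= cut
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(ys: Sequence[int]) -> float:
    """Gini impurity of a binary label list."""
    if not ys:
        return 0.0
    p = sum(ys) / len(ys)
    return 2.0 * p * (1.0 - p)


def grow(X: List[Row], y: List[int], depth: int, min_leaf: int = 5) -> Node:
    """Hypothetical greedy tree: search all features and all observed cutpoints."""
    node = Node(label=int(2 * sum(y) >= len(y)))  # ties go to class 1
    if depth == 0 or len(set(y)) < 2:
        return node
    best: Optional[Tuple[float, int, float]] = None  # (weighted impurity, feature, cut)
    for j in range(len(X[0])):
        for c in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= c]
            right = [yi for row, yi in zip(X, y) if row[j] > c]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or imp < best[0]:
                best = (imp, j, c)
    if best is None or best[0] >= gini(y):
        return node  # no admissible split reduces impurity
    _, j, c = best
    node.feature, node.cut = j, c
    li = [i for i, row in enumerate(X) if row[j] <= c]
    ri = [i for i, row in enumerate(X) if row[j] > c]
    node.left = grow([X[i] for i in li], [y[i] for i in li], depth - 1, min_leaf)
    node.right = grow([X[i] for i in ri], [y[i] for i in ri], depth - 1, min_leaf)
    return node


def predict(node: Node, row: Row) -> int:
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.cut else node.right
    return node.label


def error_rate(tree: Node, X: List[Row], y: List[int]) -> float:
    return sum(predict(tree, row) != yi for row, yi in zip(X, y)) / len(y)


def cv_error(X, y, depth, k=5, seed=0) -> float:
    """k-fold cross-validated error; the tree is regrown from scratch in every fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    wrong = 0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tree = grow([X[i] for i in train], [y[i] for i in train], depth)
        wrong += sum(predict(tree, X[i]) != y[i] for i in test)
    return wrong / len(y)


def overoptimism(X, y, depth, k=5, seed=0) -> Tuple[float, float, float]:
    """Return (apparent error, cross-validated error, overoptimism = cv - apparent)."""
    apparent = error_rate(grow(X, y, depth), X, y)
    cv = cv_error(X, y, depth, k, seed)
    return apparent, cv, cv - apparent


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: feature 0 is weakly informative, features 1-3 are pure noise.
    X = [[rng.random() for _ in range(4)] for _ in range(80)]
    y = [int(row[0] + rng.gauss(0, 0.3) > 0.5) for row in X]
    for depth in range(1, 5):
        app, cv, opt = overoptimism(X, y, depth)
        print(f"depth={depth} apparent={app:.3f} cv={cv:.3f} overoptimism={opt:+.3f}")
```

Because a deeper tree only refines the leaves of a shallower one, its apparent error can never increase with depth, whereas the cross-validated error need not follow; the growing gap is the overoptimism the abstract refers to.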

References

  1. Altman DG, Lausen B, Sauerbrei W, Schumacher M: Dangers of using “optimal” cutpoints in the evaluation of prognostic factors. Journal of the National Cancer Institute, 1994; 86: 829–835.

  2. Breiman L: Bagging predictors. Machine Learning, 1996; 24: 123–140.

  3. Breiman L, Friedman JH, Olshen RA, Stone CJ: Classification and Regression Trees. Wadsworth: Monterey, 1984.

  4. Lausen B, Sauerbrei W, Schumacher M: Classification and regression trees used for the exploration of prognostic factors measured on different scales. In: Dirschedl P, Ostermann R (Eds.): Computational Statistics. Physica-Verlag, Heidelberg, 1994; 483–496.

  5. Lausen B, Schumacher M: Maximally selected rank statistics. Biometrics, 1992; 48: 73–85.

  6. LeBlanc M: Tree-based methods for prognostic stratification. In: Crowley J (Ed.): Handbook of Statistics in Clinical Oncology. Marcel Dekker, New York, 2001; 457–472.

  7. Miller R, Siegmund D: Maximally selected chi square statistics. Biometrics, 1982; 38: 1011–1016.

  8. Sauerbrei W: On the development and validation of classification schemes in survival data. In: Klar R, Opitz O (Eds.): Classification and Knowledge Organization. Springer-Verlag, 1997; 509–518.

  9. Sauerbrei W: The use of resampling methods to simplify regression models in medical statistics. Applied Statistics, 1999; 48: 313–329.

  10. Sauerbrei W, Hübner K, Schmoor C, Schumacher M for the German Breast Cancer Study Group: Validation of existing and development of new prognostic classification schemes in node negative breast cancer. Breast Cancer Research and Treatment, 1997; 42: 149–163. Corrigendum: Breast Cancer Research and Treatment, 1998; 48: 191–192.

  11. Sauerbrei W, Madjar H, Prömpeler HJ: Differentiation of benign and malignant breast tumors by logistic regression and a classification tree using Doppler flow signals. Methods of Information in Medicine, 1998; 37: 226–234.

  12. Schumacher M, Roßner R, Vach W: Neural networks and logistic regression: Part I. Computational Statistics & Data Analysis, 1996; 21: 661–682.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sauerbrei, W. (2001). Investigations on Stability and Overoptimism of Classification Trees by Using Cross-Validation. In: Crespo, J., Maojo, V., Martin, F. (eds) Medical Data Analysis. ISMDA 2001. Lecture Notes in Computer Science, vol 2199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45497-7_38

  • DOI: https://doi.org/10.1007/3-540-45497-7_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42734-6

  • Online ISBN: 978-3-540-45497-7

  • eBook Packages: Springer Book Archive
