
Abstract

Tree-based models are popular and widely used because they are simple, flexible, and powerful tools for classification. Unfortunately, they are not stable classifiers. A significant improvement in model stability and prediction accuracy can be obtained by aggregating multiple classification trees. The reduction in classification error results from decreasing the bias and/or variance of the committee of trees (also called an ensemble or a forest). In this paper we discuss and compare different methods for model aggregation. We also address the problem of finding the minimal number of trees sufficient for the forest.
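
As an illustration only (this is not the paper's own code), the following Python sketch aggregates randomized classification trees by majority vote and tracks test error as trees are added, which is one simple way to judge how many trees are sufficient. It assumes scikit-learn is available; the synthetic dataset, forest size, and all other parameters are hypothetical choices for the example.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative data only; the paper's experiments use benchmark datasets instead.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A forest of randomized trees (bootstrap samples plus random feature subsets).
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Majority-vote test error of the first m trees, for m = 1..200.
votes = np.zeros((X_test.shape[0], len(forest.classes_)))
errors = []
for tree in forest.estimators_:
    # Each tree in a fitted forest predicts class indices into forest.classes_.
    votes[np.arange(X_test.shape[0]), tree.predict(X_test).astype(int)] += 1
    y_hat = forest.classes_[votes.argmax(axis=1)]
    errors.append(np.mean(y_hat != y_test))

for m in (1, 10, 50, 100, 200):
    print(f"{m:3d} trees: test error = {errors[m - 1]:.3f}")

In such experiments the error curve typically flattens well before the full forest is grown, which is the empirical motivation for asking how many trees are enough.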




Copyright information

© 2005 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Gatnar, E. (2005). Randomization in Aggregated Classification Trees. In: Baier, D., Wernecke, KD. (eds) Innovations in Classification, Data Science, and Information Systems. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-26981-9_25
