Abstract
Tree-based models are popular and widely used because they are simple, flexible, and powerful tools for classification. Unfortunately, they are not stable classifiers. A significant improvement in model stability and prediction accuracy can be obtained by aggregating multiple classification trees. The reduction in classification error results from decreasing the bias and/or variance of the committee of trees (also called an ensemble or a forest). In this paper we discuss and compare different methods for model aggregation. We also address the problem of finding the minimal number of trees sufficient for the forest.
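To make the aggregation concrete, the sketch below (illustrative only, not the paper's own code; the use of scikit-learn, the synthetic dataset, and all parameter values are assumptions) grows a bagged forest of classification trees incrementally and tracks its out-of-bag error, a simple empirical way to judge when adding further trees stops paying off.

```python
# A minimal sketch, not the paper's method: classification trees are
# aggregated by bagging (a random forest), and the out-of-bag (OOB)
# error is recorded as trees are added, to gauge how many trees
# suffice. The synthetic dataset and parameter values are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)

# warm_start=True keeps the previously fitted trees and only adds new
# ones each time n_estimators is raised.
forest = RandomForestClassifier(warm_start=True, oob_score=True,
                                random_state=0)
for n_trees in range(10, 201, 10):
    forest.set_params(n_estimators=n_trees)
    forest.fit(X, y)
    print(f"{n_trees:4d} trees: OOB error = {1 - forest.oob_score_:.3f}")
```

In such experiments the OOB error typically levels off well before the maximum forest size, which is the empirical behaviour behind asking for a minimal sufficient number of trees.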
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Gatnar, E. (2005). Randomization in Aggregated Classification Trees. In: Baier, D., Wernecke, KD. (eds) Innovations in Classification, Data Science, and Information Systems. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-26981-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23221-6
Online ISBN: 978-3-540-26981-6