Abstract
The aim of this paper is to propose a simple procedure that determines, a priori, the minimum number of classifiers to combine in order to reach a prediction accuracy similar to that obtained with larger ensembles. The procedure is based on the McNemar non-parametric test of significance. Knowing a priori the minimum ensemble size that yields the best prediction accuracy saves time and memory, which matters especially for huge databases and real-time applications. We applied this procedure to four multiple classifier systems built on the C4.5 decision tree (Breiman's Bagging, Ho's Random Subspaces, their combination, which we label 'Bagfs', and Breiman's Random Forests) and five large benchmark databases. The proposed procedure can easily be extended to base learners other than decision trees. The experimental results show that the number of trees can be limited significantly, and that the minimum number of trees required to reach the best prediction accuracy varies from one combination method to another.
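The abstract only names the McNemar test; as a rough illustration of how such a stopping criterion could work, the sketch below compares the majority vote of the first t trees against the vote of the full ensemble on a held-out test set and returns the first t for which the difference is no longer significant. The helper names (majority_vote, minimum_ensemble_size) and the sequential comparison against the full ensemble are illustrative assumptions, not the paper's actual protocol.

```python
# Hedged sketch, not the authors' code: McNemar-based stopping rule for an
# ensemble of T trees, given each tree's predictions on the same test set.
import numpy as np
from scipy.stats import chi2


def majority_vote(preds):
    """Plurality vote over a list of per-tree prediction arrays.
    Assumes integer class labels (required by np.bincount)."""
    preds = np.asarray(preds)
    return np.array([np.bincount(col).argmax() for col in preds.T])


def mcnemar_test(pred_a, pred_b, y_true, alpha=0.05):
    """McNemar test with continuity correction (cf. Dietterich, 1998).
    Returns True if the two prediction sets differ significantly."""
    a_right = pred_a == y_true
    b_right = pred_b == y_true
    n01 = np.sum(~a_right & b_right)  # A wrong, B right
    n10 = np.sum(a_right & ~b_right)  # A right, B wrong
    if n01 + n10 == 0:
        return False                  # identical error patterns
    stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    return stat > chi2.ppf(1 - alpha, df=1)


def minimum_ensemble_size(tree_preds, y_true, alpha=0.05):
    """Smallest t whose majority vote is not significantly different from
    the vote of all T trees (an assumed protocol for illustration)."""
    full_vote = majority_vote(tree_preds)
    for t in range(1, len(tree_preds) + 1):
        sub_vote = majority_vote(tree_preds[:t])
        if not mcnemar_test(sub_vote, full_vote, y_true, alpha):
            return t
    return len(tree_preds)
```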
References
K.M. Ali and M.J. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24:173–202, 1996.
Stephen D. Bay. Nearest neighbor classification from multiple feature subsets. In Proceedings of the International Conference on Machine Learning, Madison, Wisc., 1998. Morgan Kaufmann Publishers.
C. Blake, E. Keogh, and C.J. Merz. UCI Repository of Machine Learning Databases. [http://www.ics.uci.edu/mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science, 1998.
Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
Leo Breiman. Arcing classifiers. Annals of Statistics, 26:801–849, 1998.
Leo Breiman. Random forests–random features. Technical Report 567, Statistics Department, University of California, Berkeley, CA 94720, September 1999.
T.G. Dietterich. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10:1895–1923, 1998.
T.G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization. Machine Learning, 40:139–157, 2000.
Giorgio Giacinto and Fabio Roli. An approach to the automatic design of multiple classifier systems. Pattern Recognition Letters, 22:25–33, 2001.
T.K. Ho. The random subspace method for constructing decision forests. IEEE Trans. Pattern Analysis and Machine Intelligence, 20:832–844, 1998.
C. Ji and S. Ma. Combinations of weak classifiers. IEEE Trans. Neural Networks, 7(1):32–42, 1997.
Ron Kohavi and Clayton Kunz. Option decision trees with majority votes. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 161–169, San Francisco, CA, 1997. Morgan Kaufmann.
Patrice Latinne, Olivier Debeir, and Christine Decaestecker. Different ways of weakening decision trees and their impact on classification accuracy. In Proc. of the 1st International Workshop on Multiple Classifier Systems, pages 200–210, Cagliari, Italy, 2000. Springer (Lecture Notes in Computer Science; Vol. 1857).
Patrice Latinne, Olivier Debeir, and Christine Decaestecker. Mixing bagging and multiple feature subsets to improve classification accuracy of decision tree combination. In Proc. of the Tenth Belgian-Dutch Conference on Machine Learning Benelearn’00, pages 15–22, Tilburg University, 2000. Ed. Ad Feelders.
J.R. Quinlan. C4.5: Programs For Machine Learning. Morgan Kaufmann Publishers, San Mateo, California, 1993.
J.R. Quinlan. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 725–730, 1996.
Bernard Rosner. Fundamentals of Biostatistics. Duxbury Press (ITP), Belmont, CA, USA, 4th edition, 1995.
Steven Salzberg. On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1:317–327, 1997.
R.E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, 1990.
S. Siegel and N.J. Castellan. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, second edition, 1988.
K. Tumer and J. Ghosh. Classifier combining: analytical results and implications. In Proceedings of the National Conference on Artificial Intelligence, Portland, OR, 1996.
Zijian Zheng. Generating classifier committees by stochastically selecting both attributes and training examples. In Proceedings of the 5th Pacific Rim International Conference on Artificial Intelligence (PRICAI'98), pages 12–23. Berlin: Springer-Verlag, 1998.
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Latinne, P., Debeir, O., Decaestecker, C. (2001). Limiting the Number of Trees in Random Forests. In: Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2001. Lecture Notes in Computer Science, vol 2096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48219-9_18
Print ISBN: 978-3-540-42284-6
Online ISBN: 978-3-540-48219-2