A Double Pruning Algorithm for Classification Ensembles

Soto, Víctor; Martínez-Muñoz, Gonzalo; Hernández-Lobato, Daniel; Suárez, Alberto

doi:10.1007/978-3-642-12127-2_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5997)

Included in the following conference series:

International Workshop on Multiple Classifier Systems

Abstract

This article introduces a double pruning algorithm that can be used to reduce the storage requirements, speed up the classification process and improve the performance of parallel ensembles. A key element in the design of the algorithm is the estimation of the class label that the ensemble assigns to a given test instance by polling only a fraction of its classifiers. Instead of applying this form of dynamic (instance-based) pruning to the original ensemble, we propose to apply it to a subset of classifiers selected using standard ensemble pruning techniques. The pruned subensemble is built by first modifying the order in which classifiers are aggregated in the ensemble and then selecting the first classifiers in the ordered sequence. Experiments on benchmark problems illustrate the improvements that can be obtained with this technique. Specifically, using a bagging ensemble of 101 CART trees as a starting point, only the 21 trees of the pruned ordered ensemble need to be stored in memory. Depending on the classification task, only 5 to 12 of these 21 classifiers are queried, on average, to compute the predictions. The generalization performance achieved by this double pruning algorithm is similar to that of pruned ordered bagging and significantly better than that of standard bagging.
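
The two pruning stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. The function names `order_by_reduce_error` and `predict_with_early_stop` are hypothetical. The ordering heuristic (greedily adding the classifier that most reduces the majority-vote error on a selection set) is only one possible ordering rule, since the abstract does not say which one is used. The stopping rule is a simplified deterministic one: polling stops when the leading class can no longer be overtaken, whereas the abstract refers to an estimate of the ensemble's class label.

```python
from collections import Counter


def majority(votes):
    """Return the most voted label; ties go to the smallest label."""
    counts = Counter(votes)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


def order_by_reduce_error(val_preds, val_labels):
    """Greedy ordering (assumed heuristic): at each step append the
    classifier that gives the lowest majority-vote error of the growing
    subensemble on the selection data.

    val_preds[t][i] is the label classifier t assigns to instance i.
    """
    remaining = list(range(len(val_preds)))
    order = []
    while remaining:
        def error_with(t):
            members = order + [t]
            return sum(
                majority([val_preds[m][i] for m in members]) != y
                for i, y in enumerate(val_labels)
            )
        best = min(remaining, key=error_with)  # ties: lowest index wins
        order.append(best)
        remaining.remove(best)
    return order


def predict_with_early_stop(classifiers, x):
    """Query classifiers in the given order and stop as soon as the
    leading class cannot be overtaken by the votes still to be cast
    (simplified deterministic rule). Returns (label, n_queried)."""
    counts = Counter()
    total = len(classifiers)
    for queried, clf in enumerate(classifiers, start=1):
        counts[clf(x)] += 1
        ranked = counts.most_common(2)
        lead = ranked[0][1]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if lead - runner_up > total - queried:
            return ranked[0][0], queried
    # No early stop (final tie): fall back to the full majority vote.
    return majority(counts.elements()), total
```

Under these assumptions, keeping only the first entries of the list returned by `order_by_reduce_error` (for example 21 out of 101) would give the stored subensemble, and passing those classifiers, in that order, to `predict_with_early_stop` would apply the instance-based stage on top of it.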

Author information

Authors and Affiliations

Universidad Autónoma de Madrid, EPS, Calle Francisco Tomás y Valiente, 11, Madrid, 28049, Spain
Víctor Soto, Gonzalo Martínez-Muñoz, Daniel Hernández-Lobato & Alberto Suárez

Editor information

Editors and Affiliations

Center for Informatics Science, Nile University, 12677, Giza, Egypt
Neamat El Gayar
Centre for Vision, Speech and Signal Processing, University of Surrey, GU2 7XH, Guildford, Surrey, UK
Josef Kittler
Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123, Cagliari, Italy
Fabio Roli

About this paper

Cite this paper

Soto, V., Martínez-Muñoz, G., Hernández-Lobato, D., Suárez, A. (2010). A Double Pruning Algorithm for Classification Ensembles. In: El Gayar, N., Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2010. Lecture Notes in Computer Science, vol 5997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12127-2_11

DOI: https://doi.org/10.1007/978-3-642-12127-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12126-5
Online ISBN: 978-3-642-12127-2
eBook Packages: Computer Science, Computer Science (R0)
