Bagging Decision Trees on Data Sets with Classification Noise

Conference paper
Foundations of Information and Knowledge Systems (FoIKS 2010)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5956)

Abstract

In many real applications of supervised classification, the data sets used to learn the models contain classification noise: some instances carry an incorrect class label, mainly because of deficiencies in the data capture process. Bagging ensembles of decision trees are considered among the best-performing supervised classification models in these situations. In this paper, we propose Bagging ensembles of credal decision trees, which are based on imprecise probabilities, via the Imprecise Dirichlet Model, and on information-based uncertainty measures, via the maximum-entropy function. We remark that our method can be applied to data sets with continuous variables and missing data. In an experimental study, we show that Bagging credal decision trees outperform more complex Bagging approaches on data sets with classification noise. Furthermore, using a bias-variance decomposition of the error, we justify this performance by showing that our approach achieves a stronger and more robust reduction of the variance component.
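
The sketch below is a minimal illustration of the ingredients named in the abstract, not the authors' implementation: class counts at a tree node induce an Imprecise Dirichlet Model (IDM) credal set, the maximum entropy over that set is obtained by levelling up the least frequent classes, candidate splits are scored by an imprecise information gain, and the resulting trees are bagged by bootstrap resampling with majority vote. The function names (`max_entropy_idm`, `imprecise_info_gain`, `bagging_predict`) and the choice of hyperparameter s = 1 are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def max_entropy_idm(counts, s=1.0):
    """Upper (maximum) entropy of the IDM credal set for class counts.

    Under the Imprecise Dirichlet Model with hyperparameter s, the
    probability of class i lies in [n_i / (N + s), (n_i + s) / (N + s)].
    The maximum-entropy distribution spreads the extra mass s over the
    least frequent classes, levelling them up as far as the mass allows.
    """
    n = [float(c) for c in counts]
    total = sum(n) + s
    mass = s
    while mass > 1e-12:
        lo = min(n)
        ties = [i for i, c in enumerate(n) if c <= lo + 1e-12]
        higher = [c for c in n if c > lo + 1e-12]
        gap = (min(higher) - lo) if higher else float("inf")
        add = min(mass, gap * len(ties))  # mass needed to reach the next level
        for i in ties:
            n[i] += add / len(ties)
        mass -= add
    p = [c / total for c in n]
    return -sum(q * math.log2(q) for q in p if q > 0)

def imprecise_info_gain(parent_counts, children_counts, s=1.0):
    """Score a candidate split: upper entropy of the parent node minus the
    weighted upper entropies of the children it would produce."""
    n = float(sum(parent_counts))
    gain = max_entropy_idm(parent_counts, s)
    for counts in children_counts:
        gain -= (sum(counts) / n) * max_entropy_idm(counts, s)
    return gain

def bagging_predict(train, x, build_tree, classify, m=100, seed=0):
    """Majority vote of m trees, each grown on a bootstrap resample."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(m):
        boot = [train[rng.randrange(len(train))] for _ in range(len(train))]
        votes[classify(build_tree(boot), x)] += 1
    return votes.most_common(1)[0][0]
```

For example, `max_entropy_idm([3, 1, 0])` assigns the whole extra unit of IDM mass to the empty class and returns the entropy of (0.6, 0.2, 0.2), about 1.37 bits. Because the upper entropy penalises splits supported by few instances, a node whose best `imprecise_info_gain` is not positive is left unsplit in the credal-tree literature, which is one intuition for the noise robustness the paper reports.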

This work has been jointly supported by the Spanish Ministry of Education and Science under project TIN2007-67418-C03-03, by the European Regional Development Fund (FEDER), and by the Spanish research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018).
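
The bias-variance analysis in the abstract presupposes a decomposition of zero-one loss into noise, bias, and variance terms. A standard choice for this purpose, due to Kohavi and Wolpert (1996), is sketched below in LaTeX; that this exact variant is the one applied in the experiments is our assumption.

```latex
% Kohavi-Wolpert decomposition of expected zero-one loss at a test
% point x: E[error(x)] = sigma^2_x + bias^2_x + var_x, where sigma^2_x
% is the irreducible noise, Y_F the true class, and Y_H the learner's
% prediction (random over training sets). Bagging mainly shrinks var_x.
\[
  \mathrm{bias}^2_x = \tfrac{1}{2} \sum_{y \in \mathcal{Y}}
    \bigl( P(Y_F = y \mid x) - P(Y_H = y \mid x) \bigr)^2 ,
  \qquad
  \mathrm{var}_x = \tfrac{1}{2}
    \Bigl( 1 - \sum_{y \in \mathcal{Y}} P(Y_H = y \mid x)^2 \Bigr) .
\]
```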

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abellán, J., Masegosa, A.R. (2010). Bagging Decision Trees on Data Sets with Classification Noise. In: Link, S., Prade, H. (eds) Foundations of Information and Knowledge Systems. FoIKS 2010. Lecture Notes in Computer Science, vol 5956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11829-6_17

  • DOI: https://doi.org/10.1007/978-3-642-11829-6_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11828-9

  • Online ISBN: 978-3-642-11829-6

  • eBook Packages: Computer Science (R0)
