A First Study on Decomposition Strategies with Data with Class Noise Using Decision Trees

Sáez, José A.; Galar, Mikel; Luengo, Julián; Herrera, Francisco

doi:10.1007/978-3-642-28931-6_3

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7209)

Abstract

Noise is a common problem with negative consequences for classification. When a problem has more than two classes, that is, a multi-class problem, an interesting approach to dealing with noise is to decompose the problem into several binary subproblems, which reduces the complexity and distributes the effects of noise across these subproblems. This contribution analyzes the use of decomposition strategies, and more specifically the One-vs-One scheme, to deal with multi-class datasets with class noise. To this end, the performance of the decision trees built by C4.5, with and without decomposition, is studied. The results obtained show that the use of the One-vs-One strategy significantly improves the performance of C4.5 when dealing with noisy data.
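
As a rough illustration of the two ideas the abstract combines, the sketch below injects class noise into a label vector and then trains one binary classifier per pair of classes, combining their outputs by majority vote. It is not the authors' experimental setup: the `Stump` base learner is a one-level tree standing in for C4.5, and the names `add_class_noise`, `Stump`, `OneVsOne`, the toy data and the 20% noise rate are all assumptions made for this example.

```python
import random
from collections import Counter
from itertools import combinations


def add_class_noise(labels, rate, classes, rng):
    """Relabel a fraction `rate` of the examples with a different random class."""
    noisy = list(labels)
    n_noisy = int(round(rate * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_noisy):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy


class Stump:
    """One-level decision tree on numeric features (a stand-in for C4.5)."""

    def fit(self, X, y):
        default = Counter(y).most_common(1)[0][0]
        self.rule = (0, float("inf"), default, default)
        best_errors = len(y) + 1
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [lab for row, lab in zip(X, y) if row[f] <= t]
                right = [lab for row, lab in zip(X, y) if row[f] > t]
                if not left or not right:
                    continue
                l_lab = Counter(left).most_common(1)[0][0]
                r_lab = Counter(right).most_common(1)[0][0]
                errors = (sum(lab != l_lab for lab in left)
                          + sum(lab != r_lab for lab in right))
                if errors < best_errors:
                    best_errors = errors
                    self.rule = (f, t, l_lab, r_lab)
        return self

    def predict_one(self, row):
        f, t, l_lab, r_lab = self.rule
        return l_lab if row[f] <= t else r_lab


class OneVsOne:
    """Train one binary model per pair of classes; predict by majority vote."""

    def __init__(self, make_base):
        self.make_base = make_base

    def fit(self, X, y):
        self.models = []
        for a, b in combinations(sorted(set(y)), 2):
            # Each binary subproblem only sees the examples of its two classes.
            pair = [(row, lab) for row, lab in zip(X, y) if lab in (a, b)]
            Xp = [row for row, _ in pair]
            yp = [lab for _, lab in pair]
            self.models.append(self.make_base().fit(Xp, yp))
        return self

    def predict_one(self, row):
        votes = Counter(m.predict_one(row) for m in self.models)
        return votes.most_common(1)[0][0]


# Toy three-class data with a single numeric attribute (hypothetical values).
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0], [20.0], [21.0], [22.0]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

clean_model = OneVsOne(Stump).fit(X, y)
clean_preds = [clean_model.predict_one(r) for r in ([1.5], [11.0], [21.0])]
print(clean_preds)  # -> [0, 1, 2]

# Corrupt 20% of the training labels and retrain on the noisy labels.
y_noisy = add_class_noise(y, 0.2, [0, 1, 2], random.Random(0))
noisy_model = OneVsOne(Stump).fit(X, y_noisy)
```

Because every pairwise model is trained only on the examples of its two classes, a mislabeled example reaches only the subproblems that involve its assigned label. That is one way to read the abstract's remark that decomposition divides the effects of noise among the subproblems.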

Author information

Authors and Affiliations

Department of Computer Science and Artificial Intelligence, University of Granada, CITIC-UGR, Granada, Spain, 18071
José A. Sáez & Francisco Herrera
Department of Automática y Computación, Universidad Pública de Navarra, Pamplona, Spain, 31006
Mikel Galar
Department of Civil Engineering, LSI, University of Burgos, Burgos, Spain, 09006
Julián Luengo

Editor information

Editors and Affiliations

Universidad de Salamanca, Plaza de la Merced S/N, 37008, Salamanca, Spain
Emilio Corchado
VŠB-TU Ostrava, 17. listopadu 15, 70833, Ostrava, Czech Republic
Václav Snášel
Machine Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and Research Excellence, P.O. Box 2259, 98071, Auburn, Washington, USA
Ajith Abraham
Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Michał Woźniak
University of the Basque Country, Pº Manuel Lardizabal 1, 20018, San Sebastian, Spain
Manuel Graña
Yonsei University, 134 Shinchon-dong, 120-749, Sudaemoon-ku, Seoul, Korea
Sung-Bae Cho

About this paper

Cite this paper

Sáez, J.A., Galar, M., Luengo, J., Herrera, F. (2012). A First Study on Decomposition Strategies with Data with Class Noise Using Decision Trees. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, SB. (eds) Hybrid Artificial Intelligent Systems. HAIS 2012. Lecture Notes in Computer Science, vol 7209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28931-6_3

DOI: https://doi.org/10.1007/978-3-642-28931-6_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28930-9
Online ISBN: 978-3-642-28931-6
eBook Packages: Computer Science, Computer Science (R0)
