Applying One-Sided Selection to Unbalanced Datasets

Batista, Gustavo E. A. P. A.; Carvalho, Andre C. P. L. F.; Monard, Maria Carolina

doi:10.1007/10720076_29

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1793)

Abstract

Several factors can influence the performance of a classifier induced by a Machine Learning system. One of them is the difference between the number of examples belonging to each class. When this difference is large, the learning system may have difficulty learning the concept associated with the minority class. In this work, we discuss methods for reducing the number of examples belonging to the majority class in order to improve classification performance on the minority class. We also propose using the VDM metric to improve the performance of the classification techniques. An experimental application to a real-world dataset confirms the efficiency of the proposed methods.
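
For orientation, the kind of majority-class reduction the abstract refers to can be sketched in a few lines of Python. The paper's own procedure is not visible in this preview, so everything below is an assumption rather than the authors' method. The sketch uses a simplified, unweighted form of the value difference metric: the distance between two symbolic values is the sum, over classes, of the absolute difference in conditional class probabilities raised to a power q. It then removes majority-class examples that form Tomek links, that is, pairs of mutual nearest neighbours with opposite labels. Removing such examples is one step of one-sided selection; the condensed nearest neighbour step is omitted here. The function names (`vdm_tables`, `vdm_distance`, `majority_tomek_filter`), the choice q=2, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def vdm_tables(rows, labels):
    """For each attribute, map each symbolic value to [P(class | value), ...]."""
    classes = sorted(set(labels))
    tables = []
    for a in range(len(rows[0])):
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[a]][label] += 1
        tables.append({
            value: [by_class[c] / sum(by_class.values()) for c in classes]
            for value, by_class in counts.items()
        })
    return tables


def vdm_distance(x, y, tables, q=2):
    """Simplified, unweighted VDM: sum over attributes and classes of |P - P'|^q."""
    total = 0.0
    for a, table in enumerate(tables):
        total += sum(abs(p - r) ** q for p, r in zip(table[x[a]], table[y[a]]))
    return total


def nearest_neighbour(i, rows, tables):
    """Index of the example closest to rows[i]; ties go to the lowest index."""
    best, best_d = None, float("inf")
    for j in range(len(rows)):
        if j == i:
            continue
        d = vdm_distance(rows[i], rows[j], tables)
        if d < best_d:
            best, best_d = j, d
    return best


def majority_tomek_filter(rows, labels, majority):
    """Return indices kept after dropping majority examples in Tomek links."""
    tables = vdm_tables(rows, labels)
    nn = [nearest_neighbour(i, rows, tables) for i in range(len(rows))]
    drop = {
        i for i, j in enumerate(nn)
        if labels[i] == majority and labels[j] != majority and nn[j] == i
    }
    return [i for i in range(len(rows)) if i not in drop]


# Toy data (hypothetical): example 2 has the same attribute values as the
# minority example 3, so the two are mutual nearest neighbours.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y"), ("c", "y")]
labels = ["neg", "neg", "neg", "pos", "pos"]

print(majority_tomek_filter(rows, labels, majority="neg"))  # prints [0, 1, 3, 4]
```

Only the majority-class member of each link is removed, so no minority example is discarded. Tie-breaking by lowest index is a simplification; with symbolic attributes, exact ties in distance are common and a fuller implementation would need to handle them explicitly.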

Author information

Authors and Affiliations

Instituto de Ciências Matemáticas de São Carlos, Universidade de São Paulo. Silicon Graphics, Brasil
Gustavo E. A. P. A. Batista
Instituto de Ciências Matemáticas de São Carlos, Universidade de São Paulo
Andre C. P. L. F. Carvalho
Instituto de Ciências Matemáticas de São Carlos, Universidade de São Paulo/ILTC, Av. Carlos Botelho, 1465, Caixa Postal 668, CEP 13560-970, São Carlos, SP, Brasil
Maria Carolina Monard

Editor information

Editors and Affiliations

Department of Computer Science Instituto Tecnológico Autónomo de México (ITAM), Río Hondo 1, 01080, México DF, México
Osvaldo Cairó
Monterrey Institute of Technology (ITESM), Campus Morelos, Av. Reforma 182-A, Lomas de Cuernavaca, Temixco, 62589, Morelos, Mexico
L. Enrique Sucar
ITESM Campus Monterrey, Research and Graduate Studies Office, Monterrey, N.L., México
Francisco J. Cantu

About this paper

Cite this paper

Batista, G.E.A.P.A., Carvalho, A.C.P.L.F., Monard, M.C. (2000). Applying One-Sided Selection to Unbalanced Datasets. In: Cairó, O., Sucar, L.E., Cantu, F.J. (eds) MICAI 2000: Advances in Artificial Intelligence. MICAI 2000. Lecture Notes in Computer Science, vol 1793. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10720076_29

DOI: https://doi.org/10.1007/10720076_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67354-5
Online ISBN: 978-3-540-45562-2
eBook Packages: Springer Book Archive
