
Applying One-Sided Selection to Unbalanced Datasets

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1793)

Abstract

Several factors can influence the performance of a classifier induced by a Machine Learning system. One of them is the difference between the number of examples belonging to each class. When this difference is large, the learning system may have difficulty learning the concept associated with the minority class. In this work, we discuss several methods that reduce the number of examples belonging to the majority class in order to improve classification performance on the minority class. We also propose using the Value Difference Metric (VDM) to improve the performance of the classification techniques. An experimental application to a real-world dataset confirms the effectiveness of the proposed methods.
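The majority-class reduction described in the abstract can be illustrated with a small sketch. The Python code below is a minimal, hypothetical illustration, not the authors' implementation. It assumes every attribute is nominal and follows the one-sided selection procedure as it is usually described for Kubat and Matwin's method: keep every minority example plus one random majority example, add the examples that a 1-NN rule over this subset misclassifies, then drop the majority examples that take part in Tomek links. Distances use a simplified, unweighted form of VDM, under which two values of an attribute count as close when they induce similar class-conditional probabilities. The helper names (`fit_vdm`, `nearest`, `tomek_links`, `one_sided_selection`), the exponent `q` and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict

# Minimal sketch (assumption: nominal attributes only); not the authors' code.


def fit_vdm(X, y, q=1):
    """Return a simplified, unweighted VDM distance between rows.

    For each attribute a: delta(u, v) = sum over classes c of
    |P(c | a=u) - P(c | a=v)| ** q; the row distance sums delta over attributes.
    """
    classes = sorted(set(y))
    counts = defaultdict(Counter)  # (attribute index, value) -> class counts
    for row, label in zip(X, y):
        for a, value in enumerate(row):
            counts[(a, value)][label] += 1
    # Conditional class probabilities P(c | a = value), estimated from counts.
    probs = {
        key: {c: cnt[c] / sum(cnt.values()) for c in classes}
        for key, cnt in counts.items()
    }

    def distance(u, w):
        total = 0.0
        for a, (ua, wa) in enumerate(zip(u, w)):
            pu, pw = probs[(a, ua)], probs[(a, wa)]
            total += sum(abs(pu[c] - pw[c]) ** q for c in classes)
        return total

    return distance


def nearest(i, pool, X, dist):
    """Hypothetical helper: index in pool closest to X[i] (i excluded).

    Ties go to the earlier index in pool.
    """
    return min((j for j in pool if j != i), key=lambda j: dist(X[i], X[j]))


def tomek_links(pool, X, y, dist):
    """Pairs of different classes that are each other's nearest neighbour."""
    nn = {i: nearest(i, pool, X, dist) for i in pool}
    return {(i, j) for i, j in nn.items() if y[i] != y[j] and nn[j] == i and i < j}


def one_sided_selection(X, y, minority, dist, seed=0):
    """Indices of a reduced training set that keeps every minority example."""
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label != minority]
    # Step 1: C = all minority examples plus one random majority example.
    C = [i for i, label in enumerate(y) if label == minority]
    C.append(rng.choice(majority))
    # Step 2: classify the other examples with 1-NN over the initial C and
    # add every misclassified example to C.
    in_C = set(C)
    misclassified = [i for i in range(len(X))
                     if i not in in_C and y[nearest(i, C, X, dist)] != y[i]]
    C.extend(misclassified)
    # Step 3: drop majority examples that take part in Tomek links within C.
    links = tomek_links(C, X, y, dist)
    drop = {k for pair in links for k in pair if y[k] != minority}
    return sorted(i for i in C if i not in drop)


# Hypothetical toy data: two nominal attributes, "pos" is the minority class.
X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"),
     ("b", "y"), ("c", "y"), ("c", "x"), ("a", "y")]
y = ["pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"]

dist = fit_vdm(X, y)
kept = one_sided_selection(X, y, minority="pos", dist=dist)
print(kept)
```

In the toy data, row 7 is a majority (`"neg"`) example with exactly the same attribute values as minority row 1. Their VDM distance is therefore zero and the two form a Tomek link. As a result, row 7 is removed whichever majority example the seed picks in step 1, while both minority rows are always kept. Breaking ties by position in `nearest` is a simplification of the strict Tomek-link definition.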




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Batista, G.E.A.P.A., Carvalho, A.C.P.L.F., Monard, M.C. (2000). Applying One-Sided Selection to Unbalanced Datasets. In: Cairó, O., Sucar, L.E., Cantu, F.J. (eds) MICAI 2000: Advances in Artificial Intelligence. MICAI 2000. Lecture Notes in Computer Science, vol 1793. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10720076_29


  • DOI: https://doi.org/10.1007/10720076_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67354-5

  • Online ISBN: 978-3-540-45562-2

  • eBook Packages: Springer Book Archive
