Class Imbalances versus Class Overlapping: An Analysis of a Learning System Behavior

  • Conference paper
MICAI 2004: Advances in Artificial Intelligence (MICAI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2972)

Abstract

Several works point to class imbalance as an obstacle to applying machine learning algorithms to real-world domains. However, learning algorithms perform well on several imbalanced domains, so it does not seem fair to attribute the loss of performance directly to class imbalance. In this work, we develop a systematic study that asks whether class imbalances are truly to blame for the loss of performance of learning systems, or whether they are not a problem by themselves. Our experiments suggest that the problem is not caused by class imbalance alone, but is also related to the degree of overlap among the classes.
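The interplay between imbalance and overlap can be illustrated with a small simulation. The sketch below is not the experimental setup of the paper; it is a minimal illustration under assumptions chosen here: two one-dimensional Gaussian classes with unit variance, a single-threshold classifier fitted to maximise training accuracy, and recall on the minority (positive) class as the measure. All function names (`sample`, `fit_threshold`, `minority_recall`, `run_experiment`) and parameter values are hypothetical.

```python
import random


def sample(n_pos, n_neg, distance, rng):
    """Negatives ~ N(0, 1); positives ~ N(distance, 1). Small distance = heavy overlap."""
    pos = [rng.gauss(distance, 1.0) for _ in range(n_pos)]
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    return pos, neg


def fit_threshold(pos, neg):
    """Return the threshold t maximising training accuracy for 'positive if x > t'."""
    points = sorted([(x, 1) for x in pos] + [(x, 0) for x in neg])
    correct = len(pos)  # t = -inf: everything is predicted positive
    best_correct, best_t = correct, float("-inf")
    for x, label in points:
        # Raising t to x moves this point to the predicted-negative side.
        correct += 1 if label == 0 else -1
        if correct > best_correct:
            best_correct, best_t = correct, x
    return best_t


def minority_recall(threshold, pos):
    """Fraction of positive (minority) points that fall above the threshold."""
    return sum(x > threshold for x in pos) / len(pos)


def run_experiment(seed=0, n_train=4000, n_test=2000):
    """Vary the minority share and the class distance; report recall on fresh positives."""
    rng = random.Random(seed)
    results = {}
    for minority_share in (0.5, 0.05):      # balanced vs. imbalanced (assumed values)
        for distance in (0.5, 4.0):         # overlapping vs. well separated (assumed values)
            n_pos = round(n_train * minority_share)
            pos, neg = sample(n_pos, n_train - n_pos, distance, rng)
            t = fit_threshold(pos, neg)
            test_pos, _ = sample(n_test, 0, distance, rng)
            results[(minority_share, distance)] = minority_recall(t, test_pos)
    return results


if __name__ == "__main__":
    for (share, distance), recall in sorted(run_experiment().items()):
        print(f"minority share={share:.2f}  distance={distance:.1f}  recall={recall:.3f}")
```

Under these assumptions, minority recall should stay high when the classes are well separated, even at a 5% minority share, and should collapse only when imbalance and heavy overlap occur together. This is consistent with the abstract's claim but is no substitute for the paper's own experiments.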

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Prati, R.C., Batista, G.E.A.P.A., Monard, M.C. (2004). Class Imbalances versus Class Overlapping: An Analysis of a Learning System Behavior. In: Monroy, R., Arroyo-Figueroa, G., Sucar, L.E., Sossa, H. (eds) MICAI 2004: Advances in Artificial Intelligence. MICAI 2004. Lecture Notes in Computer Science, vol 2972. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24694-7_32

  • DOI: https://doi.org/10.1007/978-3-540-24694-7_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21459-5

  • Online ISBN: 978-3-540-24694-7

  • eBook Packages: Springer Book Archive
