Addressing the Class Imbalance Problem

Machine Learning Paradigms

Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 118))

Abstract

In this chapter, we investigate the effects of the class imbalance problem on standard classifier methodologies and present the methodologies that have been proposed as remedies. Within the context of Artificial Immune Systems, the most interesting approach is the one related to the machine learning paradigm of one-class classification. One-class classification problems may be thought of as degenerate binary classification problems in which the available training instances originate exclusively from the under-represented class of patterns.
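
The one-class setting described in the abstract can be illustrated with a minimal sketch. The abstract does not specify a particular algorithm, so the centroid-and-radius description below is a deliberately simple stand-in chosen for illustration, not the chapter's Artificial Immune System method. The names `fit_one_class`, `is_target`, and `accept_fraction` are hypothetical, and the training data are synthetic.

```python
import math
import random


def fit_one_class(train, accept_fraction=0.95):
    """Describe the single available class by a centroid and a radius.

    Only instances of one class are used; no counter-examples are needed.
    (Illustrative stand-in, not the method proposed in the chapter.)
    """
    dim = len(train[0])
    centroid = [sum(p[i] for p in train) / len(train) for i in range(dim)]
    dists = sorted(math.dist(p, centroid) for p in train)
    # Radius that encloses roughly `accept_fraction` of the training points.
    idx = min(len(dists) - 1, max(0, math.ceil(accept_fraction * len(dists)) - 1))
    return centroid, dists[idx]


def is_target(model, point):
    """Return True if `point` falls inside the learned class description."""
    centroid, radius = model
    return math.dist(point, centroid) <= radius


# Synthetic training set drawn from the one class we have examples of.
rng = random.Random(0)
train = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]

model = fit_one_class(train)
print(is_target(model, (0.1, -0.2)))  # point close to the training data
print(is_target(model, (8.0, 8.0)))   # point far from the training data
```

The decision boundary here is fitted without ever seeing an instance of the other class, which is the defining constraint of the one-class setting; richer data descriptions replace the single sphere with a more flexible boundary.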

Author information

Corresponding author

Correspondence to Dionisios N. Sotiropoulos.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Sotiropoulos, D.N., Tsihrintzis, G.A. (2017). Addressing the Class Imbalance Problem. In: Machine Learning Paradigms. Intelligent Systems Reference Library, vol 118. Springer, Cham. https://doi.org/10.1007/978-3-319-47194-5_4

  • DOI: https://doi.org/10.1007/978-3-319-47194-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47192-1

  • Online ISBN: 978-3-319-47194-5

  • eBook Packages: Engineering, Engineering (R0)
