Addressing the Class Imbalance Problem

Sotiropoulos, Dionisios N.; Tsihrintzis, George A.

doi:10.1007/978-3-319-47194-5_4

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 118)

Abstract

In this chapter, we investigate the effects of the class imbalance problem on standard classification methods and survey the remedies that have been proposed. Within the context of Artificial Immune Systems, the most interesting approach is the one based on the machine learning paradigm of one-class classification. One-class classification problems may be thought of as degenerate binary classification problems in which the available training instances originate exclusively from the under-represented class of patterns.
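
As a rough illustration of the one-class setting described in the abstract, the sketch below fits a model using instances of a single class only and then accepts or rejects new patterns. The centroid-and-radius rule, the function names `fit_one_class` and `predict`, the `quantile` parameter, and the synthetic Gaussian data are all assumptions made for this example; they are not the method developed in the chapter.

```python
import math
import random


def fit_one_class(train, quantile=0.95):
    """Describe a single class by a centroid and an acceptance radius.

    Illustrative stand-in only: no counter-examples are used for training.
    """
    dim = len(train[0])
    centroid = [sum(x[i] for x in train) / len(train) for i in range(dim)]
    dists = sorted(math.dist(x, centroid) for x in train)
    # Radius: distance within which roughly `quantile` of the training points fall.
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]


def predict(model, x):
    """Return True if x lies inside the learned description of the class."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius


# Synthetic training set drawn from one class only (hypothetical data).
rng = random.Random(0)
target = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]

model = fit_one_class(target)
near = predict(model, (0.1, -0.2))  # a pattern close to the training class
far = predict(model, (8.0, 8.0))    # a pattern far from anything seen in training
```

The design point is that the decision boundary is drawn around the one available class rather than between two classes, so a pattern is rejected simply because it lies outside that description.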

Author information

Authors and Affiliations

University of Piraeus, Piraeus, Greece
Dionisios N. Sotiropoulos & George A. Tsihrintzis

Corresponding author

Correspondence to Dionisios N. Sotiropoulos.

About this chapter

Cite this chapter

Sotiropoulos, D.N., Tsihrintzis, G.A. (2017). Addressing the Class Imbalance Problem. In: Machine Learning Paradigms. Intelligent Systems Reference Library, vol 118. Springer, Cham. https://doi.org/10.1007/978-3-319-47194-5_4

DOI: https://doi.org/10.1007/978-3-319-47194-5_4
Published: 27 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47192-1
Online ISBN: 978-3-319-47194-5
eBook Packages: Engineering, Engineering (R0)