Comparing the Behavior of Oversampling and Undersampling Approach of Class Imbalance Learning by Combining Class Imbalance Problem with Noise

Kaur, Prabhjot; Gosain, Anjana

doi:10.1007/978-981-10-6602-3_3


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 653)


Abstract

Class imbalance learning is a recent research topic concerned with detecting classes in imbalanced datasets. In many real-world scenarios that require finding exceptional cases, such as credit card fraud detection or brain tumor detection, traditional classification algorithms fail: their results are either dominated by the larger (majority) class, or they treat the smaller (minority) class as noise and ignore it. Recent studies have found that class imbalance by itself is not the main problem; rather, certain other data distribution complexities degrade classifier performance when they occur together with class imbalance. One of the major such complexities is noise, which is present in virtually all real data in one form or another. This paper compares the oversampling and undersampling approaches to class imbalance learning in a noisy environment and attempts to determine which approach performs better in that setting.
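The abstract does not name the specific resampling algorithms or noise model used in the experiments. As a rough illustration only, the Python sketch below shows the simplest variants of the two approaches being compared, random oversampling and random undersampling, applied to labels corrupted by an assumed uniform label-flipping noise model. All function names (`inject_label_noise`, `random_oversample`, `random_undersample`) and the toy data are hypothetical and are not the authors' implementation.

```python
import random
from collections import Counter


def inject_label_noise(labels, rate, classes, rng):
    """Return a copy of `labels` in which each label is flipped, with
    probability `rate`, to a different class chosen uniformly at random.
    (Uniform label flipping is an assumed noise model, for illustration.)"""
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


def random_oversample(X, y, rng):
    """Random oversampling: duplicate randomly chosen examples of each
    smaller class until every class matches the size of the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


def random_undersample(X, y, rng):
    """Random undersampling: keep a random subset of each class whose size
    equals that of the smallest class; all other examples are discarded."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for cls in counts:
        idx = [i for i, label in enumerate(y) if label == cls]
        for i in rng.sample(idx, target):
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: 10 minority (label 1) vs 90 majority (label 0).
    X = [[float(i)] for i in range(100)]
    y = [1 if i < 10 else 0 for i in range(100)]
    y_noisy = inject_label_noise(y, rate=0.1, classes=[0, 1], rng=rng)
    _, y_over = random_oversample(X, y_noisy, rng)
    _, y_under = random_undersample(X, y_noisy, rng)
    print("noisy:", Counter(y_noisy))
    print("oversampled:", Counter(y_over))
    print("undersampled:", Counter(y_under))
```

One intuition for why noise matters here, offered as a reading aid rather than a result from the paper: oversampling keeps every original example, including mislabeled ones, and may duplicate them, while undersampling discards data, which can remove noisy examples but also informative ones.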



Author information

Authors and Affiliations

Maharaja Surajmal Institute of Technology, Janakpuri, New Delhi, India
Prabhjot Kaur
USICT, Guru Gobind Singh Indraprastha University, New Delhi, India
Anjana Gosain


Corresponding author

Correspondence to Prabhjot Kaur.

Editor information

Editors and Affiliations

USMS, GGSIP University, New Delhi, Delhi, India
A. K. Saini
Indian Institute of Business Management, Patna, Bihar, India
A. K. Nayak
Institute of Life Long Learning (ILLL), New Delhi, Delhi, India
Ram Krishna Vyas


About this paper

Cite this paper

Kaur, P., Gosain, A. (2018). Comparing the Behavior of Oversampling and Undersampling Approach of Class Imbalance Learning by Combining Class Imbalance Problem with Noise. In: Saini, A., Nayak, A., Vyas, R. (eds) ICT Based Innovations. Advances in Intelligent Systems and Computing, vol 653. Springer, Singapore. https://doi.org/10.1007/978-981-10-6602-3_3


DOI: https://doi.org/10.1007/978-981-10-6602-3_3
Published: 01 October 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6601-6
Online ISBN: 978-981-10-6602-3
eBook Packages: Engineering, Engineering (R0)
