Abstract
Class imbalance learning is a recent research topic concerned with detecting classes in imbalanced datasets. In many real-world scenarios where exceptional cases must be found, such as credit card fraud or brain tumor detection, traditional classification algorithms fail: either their results are overwhelmed by the larger class, or they treat the smaller class as noise and ignore it. Recent studies have found that class imbalance is not a problem in itself; rather, certain other data distribution complexities degrade classifier performance when they are combined with class imbalance. One major such complexity is noise, which is present in every real dataset in one form or another. This paper compares the oversampling and undersampling approaches to class imbalance learning in a noisy environment and tries to determine which approach is better in that setting.
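As an illustration of the two approaches being compared, the sketch below applies plain random oversampling and random undersampling to a small binary dataset after injecting label noise. It is not the authors' experimental setup: the helper names (`flip_labels`, `random_oversample`, `random_undersample`), the 90/10 class split, the 10% noise rate, and the one-dimensional Gaussian features are all assumptions chosen for the example, and more elaborate samplers such as SMOTE are not shown.

```python
import random
from collections import Counter

def flip_labels(labels, noise_rate, rng):
    """Flip each binary label (0/1) with probability noise_rate to simulate class noise."""
    return [1 - y if rng.random() < noise_rate else y for y in labels]

def random_oversample(rows, labels, rng):
    """Duplicate randomly chosen minority rows until both classes have equal size."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [i for i, y in enumerate(labels) if y == minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]

def random_undersample(rows, labels, rng):
    """Discard randomly chosen majority rows until both classes have equal size."""
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    target = min(counts.values())
    maj_idx = [i for i, y in enumerate(labels) if y == majority]
    kept_majority = set(rng.sample(maj_idx, target))
    keep = [i for i, y in enumerate(labels) if y != majority or i in kept_majority]
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Hypothetical toy data: 90 majority rows (class 0) and 10 minority rows (class 1).
rng = random.Random(0)
rows = [[rng.gauss(0.0, 1.0)] for _ in range(90)] + [[rng.gauss(2.0, 1.0)] for _ in range(10)]
labels = [0] * 90 + [1] * 10

# Assumed noise model: each label is flipped independently with probability 0.1.
noisy = flip_labels(labels, noise_rate=0.1, rng=rng)

_, y_over = random_oversample(rows, noisy, rng)
_, y_under = random_undersample(rows, noisy, rng)
print("noisy:", Counter(noisy))
print("oversampled:", Counter(y_over))
print("undersampled:", Counter(y_under))
```

Both samplers balance the class counts, but they interact with noise differently: oversampling can duplicate mislabelled minority rows, while undersampling discards majority rows, some of which are correctly labelled. Which effect matters more in practice is the question the paper investigates; this sketch only shows the mechanics.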
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kaur, P., Gosain, A. (2018). Comparing the Behavior of Oversampling and Undersampling Approach of Class Imbalance Learning by Combining Class Imbalance Problem with Noise. In: Saini, A., Nayak, A., Vyas, R. (eds) ICT Based Innovations. Advances in Intelligent Systems and Computing, vol 653. Springer, Singapore. https://doi.org/10.1007/978-981-10-6602-3_3
DOI: https://doi.org/10.1007/978-981-10-6602-3_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6601-6
Online ISBN: 978-981-10-6602-3
eBook Packages: Engineering, Engineering (R0)