Abstract
The class imbalance problem has been widely studied in data mining. In this paper, we present a new filter approach to the problem that uses repeated under-sampling to create balanced data sets and then uses majority-voting ensemble learning to build a meta-classifier. We test our method on five imbalanced data sets and compare its performance with that of three other techniques. We show that our method significantly improves the prediction accuracy of the under-represented class while also reducing the gap in prediction accuracy between the two classes.
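The general idea described in the abstract (repeated under-sampling followed by majority voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the base learner, the number of sampling rounds, or how the under-sampling is controlled, so the nearest-centroid learner, the 11 rounds, the synthetic data, and all function names below are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter

def undersample(X, y, rng):
    """Keep every minority row and draw an equal number of majority rows (binary labels assumed)."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    chosen = min_idx + rng.sample(maj_idx, len(min_idx))
    return [X[i] for i in chosen], [y[i] for i in chosen]

def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def train_ensemble(X, y, n_models=11, seed=0):
    """Train one base learner per balanced sample; an odd n_models avoids tied votes."""
    rng = random.Random(seed)
    return [fit_centroids(*undersample(X, y, rng)) for _ in range(n_models)]

def predict_vote(models, x):
    """Meta-classifier: majority vote over the base learners' predictions."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Synthetic imbalanced data (assumed for illustration): 200 majority rows, 10 minority rows.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
X += [[data_rng.gauss(5, 1), data_rng.gauss(5, 1)] for _ in range(10)]
y = [0] * 200 + [1] * 10

models = train_ensemble(X, y)
minority_guess = predict_vote(models, [5.0, 5.0])
majority_guess = predict_vote(models, [0.0, 0.0])
```

Because each base learner sees a balanced sample, the minority class is not swamped during training, while voting across several samples reduces the information lost by discarding majority rows in any single round.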
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sikora, R., Raina, S. (2019). Controlled Under-Sampling with Majority Voting Ensemble Learning for Class Imbalance Problem. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Computing. SAI 2018. Advances in Intelligent Systems and Computing, vol 857. Springer, Cham. https://doi.org/10.1007/978-3-030-01177-2_3
DOI: https://doi.org/10.1007/978-3-030-01177-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01176-5
Online ISBN: 978-3-030-01177-2
eBook Packages: Intelligent Technologies and Robotics (R0)