A Review on Ensembles-Based Approach to Overcome Class Imbalance Problem
Predictive analytics incorporates various statistical techniques from predictive modelling, machine learning, and data mining to analyse large databases and make predictions about the future. Data mining is a powerful technology that helps organizations concentrate on their most important data by extracting useful information from large databases. As technology improves, large amounts of data are collected in raw form every day, and consequently the need for data mining techniques in various domains is increasing. Class imbalance is an open challenge in data mining and machine learning. It arises from imbalanced data sets: a data set is considered imbalanced when the number of instances in one class vastly outnumbers the number of instances in the other class. When traditional data mining algorithms are trained on imbalanced data sets, they produce suboptimal classification models. Recently, the class imbalance problem has gained significant attention from the data mining and machine learning research community because it appears in many real-world problems, such as remote sensing, pollution detection, risk management, fraud detection, and medical diagnosis. Several methods have been proposed to overcome the class imbalance problem. In this paper, our goal is to review the methods proposed to overcome the effect of imbalanced data on classification learning algorithms.
Keywords: Class imbalance · Bagging · Boosting · Classification · Ensemble · Sampling · Ensemble approach for class imbalance
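One common family of remedies mentioned above is data-level sampling, which rebalances the training set before a standard learner is applied. The following is a minimal sketch of random undersampling of the majority class; the function name and toy data are illustrative assumptions, not taken from any specific cited method.

```python
import random
from collections import Counter

def undersample(X, y, seed=None):
    """Randomly undersample every class down to the minority-class size,
    producing a balanced training set (an illustrative sketch)."""
    rng = random.Random(seed)
    minority_size = min(Counter(y).values())
    # Group instance indices by class label.
    idx_by_class = {}
    for i, label in enumerate(y):
        idx_by_class.setdefault(label, []).append(i)
    # Draw the same number of instances from each class.
    chosen = []
    for idx in idx_by_class.values():
        chosen.extend(rng.sample(idx, minority_size))
    return [X[i] for i in chosen], [y[i] for i in chosen]

# Toy imbalanced data: 9 majority (class 0) vs 3 minority (class 1) instances.
X = [[i] for i in range(12)]
y = [0] * 9 + [1] * 3
X_bal, y_bal = undersample(X, y, seed=42)
print(Counter(y_bal))  # equal counts per class after undersampling
```

Ensemble variants such as those surveyed in this paper typically repeat this resampling once per base learner (e.g. per bagging iteration), so that discarded majority instances still contribute to some ensemble members.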