Imbalanced Data Classification Method Based on Ensemble Learning
Imbalanced data classification is a recurring problem when classifier learning algorithms are applied in business and industry. This paper proposes a methodology to improve the performance of imbalanced data classification. Datasets are first balanced with the synthetic minority oversampling technique (SMOTE), and the noise introduced by the newly generated samples is removed with Tomek links (T-Links). A support vector machine (SVM), k-nearest neighbor (KNN), and logistic regression (LR) are then selected as base classifiers and combined by stacked generalization, and the final prediction is produced by weighted voting. Experiments on six UCI datasets show that the method is broadly applicable and can effectively improve classification performance.
Keywords: Imbalanced data · SMOTE–Tomek links · Ensemble learning · Stacked generalization · Weighted voting
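The pipeline the abstract describes (oversample the minority class with SMOTE, then stack SVM/KNN/LR base learners) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the SMOTE routine here is a bare-bones hand-rolled version, the Tomek-link cleaning step is omitted for brevity, the toy dataset and all parameter choices (`k=5`, a 9:1 class ratio, a logistic-regression meta-learner in place of the paper's weighted voting) are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE: each synthetic sample is interpolated between a
    random minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)   # distances within the minority class
        nn = np.argsort(d)[1:k + 1]                    # k nearest, skipping the point itself
        j = rng.choice(nn)
        out.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(out)

# Toy imbalanced data, roughly 9:1 (an assumption for illustration).
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority class (label 1) up to parity with the majority.
X_min = X_tr[y_tr == 1]
n_new = (y_tr == 0).sum() - (y_tr == 1).sum()
X_bal = np.vstack([X_tr, smote(X_min, n_new)])
y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

# Stacked generalization over the paper's three base classifiers.
stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True, random_state=0)),
                ("knn", KNeighborsClassifier()),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_bal, y_bal)
print("minority-class F1:", f1_score(y_te, stack.predict(X_te)))
```

In a production setting one would typically reach for `imblearn.combine.SMOTETomek` from the imbalanced-learn package, which implements both the oversampling and the Tomek-link cleaning steps together.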