
Imbalanced Data Classification Method Based on Ensemble Learning

  • Yu Xiang
  • Yongping Xie
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 517)

Abstract

Imbalanced data classification is one of the problems that emerge when classifier learning algorithms are applied in business and industry. This paper proposes a methodology to improve the performance of imbalanced data classification. We balance the data sets using the synthetic minority oversampling technique (SMOTE), and the noise introduced into the resampled data sets is eliminated with Tomek links (T-Links). Support vector machine (SVM), k-nearest neighbor (KNN), and logistic regression (LR) are selected as the base classifiers and combined by stacked generalization to improve classification, and the final result is generated by weighted voting. In the experiments, six UCI datasets are tested, and the results show that the method is representative and effectively improves classification performance on imbalanced data.
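The resampling and voting steps named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy data, the choice to drop both members of each Tomek link, and the example predictions and weights are all assumptions made for illustration. Training the SVM, KNN, and LR base classifiers and the stacking stage are omitted.

```python
import math
import random
from collections import defaultdict


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: a pair of opposite-class
    points that are each other's nearest neighbour (assumed variant)."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    linked = {i for i, j in enumerate(nn) if nn[j] == i and y[i] != y[j]}
    keep = [i for i in range(len(X)) if i not in linked]
    return [X[i] for i in keep], [y[i] for i in keep]


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the most weight."""
    score = defaultdict(float)
    for label, w in zip(predictions, weights):
        score[label] += w
    return max(score, key=score.get)


# Hypothetical toy data: 6 majority points (label 0), 3 minority points (label 1).
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.0), (2.0, 1.0),
     (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
y = [0] * 6 + [1] * 3

# Step 1: oversample the minority class until both classes have 6 points.
minority = [p for p, label in zip(X, y) if label == 1]
new_points = smote(minority, n_new=3, k=2)
X_bal, y_bal = X + new_points, y + [1] * len(new_points)

# Step 2: clean the balanced set by removing Tomek links.
X_clean, y_clean = remove_tomek_links(X_bal, y_bal)

# Step 3: combine three hypothetical classifier outputs by weighted voting.
# One classifier with weight 0.6 outvotes two with combined weight 0.5.
print(weighted_vote([1, 0, 0], [0.6, 0.25, 0.25]))  # prints 1
```

In this sketch the weights are fixed by hand. In practice they would more plausibly be derived from each classifier's validation performance, which the abstract does not specify.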

Keywords

Imbalanced data, SMOTE–Tomek links, Ensemble learning, Stacked generalization, Weighted voting


Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. School of Information and Communication Engineering, Dalian University of Technology, Dalian, China
