A Novel Approach to Solve Class Imbalance Problem Using Noise Filter Method

  • Gillala Rekha (email author)
  • Amit Kumar Tyagi
  • V. Krishna Reddy
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 940)

Abstract

Over-sampling is currently one of the most popular pre-processing techniques for handling class imbalance problems. It balances the dataset to achieve a high classification rate and avoids bias towards majority class samples. Over-sampling takes all minority samples in the training data into consideration while performing classification. However, the presence of noise (in both the minority and majority samples) may degrade classification performance. Hence, this work introduces a noise filter over-sampling approach with the Adaptive Boosting algorithm (AdaBoost) for effective classification. This work compares its performance against state-of-the-art ensemble learning methods such as AdaBoost, RUSBoost, and SMOTEBoost on 14 imbalanced binary-class datasets with various Imbalance Ratios (IR). The experimental results, measured with F-Measure and AUC, show that the approach is promising and effective for dealing with imbalanced datasets.
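
The general idea of filtering noise before over-sampling can be illustrated with a minimal sketch. The abstract does not specify the filter or the over-sampler used by the authors, so everything below is an assumption made for illustration: a k-nearest-neighbour rule that drops a sample when none of its neighbours shares its label, followed by SMOTE-style interpolation between minority samples. The function names (`imbalance_ratio`, `filter_noise`, `oversample`) and the toy data are hypothetical, and the AdaBoost training step is omitted.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """IR = majority-class count / minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean distance)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def filter_noise(points, labels, k=3):
    """Assumed filter rule: drop a sample if none of its k neighbours shares its label."""
    keep = [
        i for i in range(len(points))
        if any(labels[j] == labels[i] for j in knn_indices(points, i, k))
    ]
    return [points[i] for i in keep], [labels[i] for i in keep]


def oversample(points, labels, minority, k=3, seed=0):
    """Add SMOTE-style interpolated minority samples until both classes match.

    Assumes a binary problem with at least two minority samples.
    """
    rng = random.Random(seed)
    minority_pts = [p for p, y in zip(points, labels) if y == minority]
    needed = len(points) - 2 * len(minority_pts)  # majority count - minority count
    new_pts = []
    for _ in range(max(needed, 0)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(knn_indices(minority_pts, i, k))
        t = rng.random()
        a, b = minority_pts[i], minority_pts[j]
        # New point lies on the segment between a minority sample and a neighbour.
        new_pts.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return points + new_pts, labels + [minority] * len(new_pts)


# Toy data: 8 majority points near the origin, 3 minority points near (5, 5),
# and one mislabelled minority point sitting inside the majority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (1, 0.5), (0.5, 1), (0, 0.5),
     (5, 5), (5, 6), (6, 5), (0.4, 0.6)]
y = [0] * 8 + [1] * 4

X_clean, y_clean = filter_noise(X, y, k=3)
X_bal, y_bal = oversample(X_clean, y_clean, minority=1, k=3)
print(imbalance_ratio(y), imbalance_ratio(y_clean), imbalance_ratio(y_bal))
```

Filtering first means the mislabelled point is never used as a seed for synthetic samples, so the over-sampler does not amplify it. The balanced set would then be passed to a boosting classifier such as AdaBoost.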

Keywords

Class imbalance, Ensemble learning method, Noise filter, Boosting, Over-sampling

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Gillala Rekha (1) (email author)
  • Amit Kumar Tyagi (2)
  • V. Krishna Reddy (1)
  1. Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, India
  2. Lingayas Vidyapeeth, Faridabad, India
