A Review on Ensembles-Based Approach to Overcome Class Imbalance Problem

  • Sujit Kumar
  • J. N. Madhuri
  • Mausumi Goswami
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 906)

Abstract

Predictive analytics incorporates statistical techniques from predictive modelling, machine learning and data mining to analyse large databases and make predictions about future events. Data mining is a powerful technology that helps organizations concentrate on their most important data by extracting useful information from large databases. As technology improves, large amounts of raw data are collected every day, and the need for data mining techniques in various domains is therefore increasing. Class imbalance is an open challenge in data mining and machine learning. It arises from imbalanced data sets: a data set is considered imbalanced when the number of instances in one class vastly outnumbers the number of instances in the other class. When traditional data mining algorithms are trained on imbalanced data sets, they produce suboptimal classification models. Recently, the class imbalance problem has gained significant attention from the data mining and machine learning research community because of its presence in many real-world problems such as remote sensing, pollution detection, risk management, fraud detection and medical diagnosis. Several methods have been proposed to overcome the class imbalance problem. In this paper, our goal is to review the various methods proposed to overcome the effect of imbalanced data on classification learning algorithms.
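The combination of sampling and ensembles mentioned in the abstract (the idea behind methods such as UnderBagging) can be sketched in plain Python. This is a minimal illustration, not code from any of the reviewed papers: the toy one-dimensional dataset, the nearest-centroid base learner, and the ensemble size of 11 are all assumptions chosen for brevity. Each ensemble member is trained on its own randomly undersampled, class-balanced view of the data, and predictions are combined by majority vote.

```python
import random
from collections import Counter

random.seed(0)

# Toy imbalanced dataset: 95 majority-class points (label 0) near 0.0
# and only 5 minority-class points (label 1) near 5.0.
data = [(random.gauss(0.0, 1.0), 0) for _ in range(95)] + \
       [(random.gauss(5.0, 1.0), 1) for _ in range(5)]

def undersample(samples):
    """Randomly drop majority-class instances until all classes are equal-sized."""
    by_class = {}
    for x, y in samples:
        by_class.setdefault(y, []).append((x, y))
    n_min = min(len(v) for v in by_class.values())
    balanced = []
    for v in by_class.values():
        balanced.extend(random.sample(v, n_min))
    return balanced

def nearest_centroid(train):
    """A deliberately simple base learner: classify by the nearest class centroid."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))

# Each member sees a different balanced subsample, so the ensemble as a whole
# still benefits from most of the majority-class data.
members = [nearest_centroid(undersample(data)) for _ in range(11)]

def predict(x):
    votes = Counter(m(x) for m in members)
    return votes.most_common(1)[0][0]

print(predict(5.2), predict(-0.3))
```

A single classifier trained on the raw data could achieve 95% accuracy by always predicting the majority class; the undersampled ensemble avoids that degenerate solution while discarding less majority-class information than a single undersampled model would.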

Keywords

Class imbalance · Bagging · Boosting · Classification · Ensemble · Sampling · Ensemble approach for class imbalance

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of CSE, Christ (Deemed to Be University), Bengaluru, India