Benchmarking Swarm Rebalancing Algorithm for Relieving Imbalanced Machine Learning Problems

  • Jinyan Li
  • Simon Fong
Chapter
Part of the International Series on Computer Entertainment and Media Technology book series (ISCEMT)

Abstract

Imbalanced classification is a well-known NP-hard problem in data mining. Because an imbalanced dataset contains far more samples from the majority classes than from the minority classes, the resulting classifier tends to over-fit the former and under-fit the latter. Previous solutions focus on increasing the learning sensitivity to the minority classes and/or rebalancing the sample sizes before learning. Using swarm intelligence algorithms, we propose a series of unified pre-processing approaches to address the imbalanced classification problem. These methods use stochastic swarm heuristics to cooperatively optimize and fuse the distribution of an imbalanced training dataset. As shown in our previously published work, this series of algorithms has an edge in relieving the imbalance problem. In this book chapter we carry out an in-depth evaluation of the performance of the contemporary swarm rebalancing algorithms. The experimental results show that the proposed algorithms outperform the 17 existing algorithms used for comparison. Although some perform better than others, in general these algorithms exhibit superior computational speed, high accuracy and acceptable reliability of the classification model.
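
To make the idea of rebalancing sample sizes before learning concrete, the following is a minimal sketch of plain random oversampling. It is not the swarm-based method evaluated in this chapter; it only illustrates the simplest pre-processing baseline. The function name random_oversample and the toy dataset are assumptions introduced for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class.

    Illustrative baseline only (hypothetical helper, not the chapter's
    swarm rebalancing algorithm).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples that belong to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy imbalanced dataset (assumed): 8 majority samples, 2 minority samples.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

After the call, both classes in the toy dataset hold the same number of samples. Because the added samples are exact copies, a classifier may over-fit them, which is one motivation for more elaborate rebalancing schemes such as those studied in this chapter.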

Keywords

Imbalanced classification; Swarm intelligence; Swarm rebalancing

Notes

Acknowledgement

The authors are thankful for the financial support from two research grants: (1) MYRG2016-00069, titled ‘Nature-Inspired Computing and Metaheuristics Algorithms for Optimizing Data Mining Performance’, offered by RDAO/FST, University of Macau and the Macau SAR government; and (2) FDCT/126/2014/A3, titled ‘A Scalable Data Stream Mining Methodology: Stream-based Holistic Analytics and Reasoning in Parallel’, offered by FDCT of the Macau SAR government.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Big Data PDU, Huawei Software Technologies Co., Ltd., Nanjing, China
  2. Department of Computer and Information Science, University of Macau, Taipa, China
