Evolutionary under-sampling based bagging ensemble method for imbalanced data classification
In the class-imbalanced learning scenario, traditional machine learning algorithms that optimize overall accuracy tend to perform poorly on the minority class, which is usually the class of greatest interest. Many effective approaches have been proposed to address this problem. Among them, bagging ensemble methods that integrate under-sampling techniques have demonstrated better performance than alternatives such as bagging combined with over-sampling or cost-sensitive methods. Although these under-sampling techniques promote diversity among the generated base classifiers through random partitioning or sampling of the majority class, they take no measure to ensure the classification performance of the individual classifiers, which limits the achievable ensemble performance. On the other hand, evolutionary under-sampling (EUS), a novel under-sampling technique, has been successfully applied to search for the best majority-class subset for training a well-performing nearest neighbor classifier. Inspired by EUS, in this paper we introduce it into the under-sampling bagging framework and propose an EUS-based bagging ensemble method (EUS-Bag), designing a new fitness function that considers three factors to make EUS better suited to the framework. With this fitness function, EUS-Bag can generate a set of accurate and diverse base classifiers. To verify the effectiveness of EUS-Bag, we conduct a series of comparison experiments on 22 two-class imbalanced classification problems. Experimental results measured by recall, geometric mean, and AUC all demonstrate its superior performance.
Keywords: class imbalanced problem, under-sampling bagging, evolutionary under-sampling, ensemble learning, machine learning, data mining
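The approach the abstract describes can be sketched roughly as follows: a genetic algorithm searches, per base classifier, for a majority-class subset (encoded as a bit mask) whose resulting balanced training set yields an accurate classifier, while also rewarding diversity with respect to subsets chosen earlier. This is a minimal illustrative sketch, not the paper's exact design: the fitness weights, the balance term, the GA operators, and the 1-NN base learner are all assumptions made here for the example.

```python
import math
import random

def one_nn_predict(train, x):
    # train: list of (features, label); return the label of the nearest point
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def gmean(train, val):
    # geometric mean of the two per-class recalls (a standard imbalanced metric)
    recalls = []
    for c in (0, 1):
        pts = [p for p in val if p[1] == c]
        hit = sum(one_nn_predict(train, x) == c for x, _ in pts)
        recalls.append(hit / len(pts))
    return math.sqrt(recalls[0] * recalls[1])

def eus_bag(maj, minr, n_base=5, pop=10, gens=15, seed=0):
    # For each base classifier, evolve a majority-subset mask whose 1-NN
    # classifier scores well on g-mean and differs from masks chosen so far.
    rng = random.Random(seed)
    k = len(minr)          # target subset size: balance against the minority
    chosen = []            # masks already selected, used by the diversity term
    val = maj + minr       # toy setup: evaluate fitness on all the data

    def fitness(mask):
        sub = [maj[i] for i in range(len(maj)) if mask[i]]
        if not sub:
            return 0.0
        acc = gmean(sub + minr, val)
        if chosen:  # diversity: mean normalized Hamming distance to past masks
            div = sum(sum(a != b for a, b in zip(mask, m)) for m in chosen)
            div /= len(chosen) * len(mask)
        else:
            div = 1.0
        bal = 1.0 - abs(sum(mask) - k) / len(maj)  # keep size near |minority|
        return 0.6 * acc + 0.2 * div + 0.2 * bal   # weights are assumptions

    ensemble = []
    for _ in range(n_base):
        popn = [[rng.randint(0, 1) for _ in maj] for _ in range(pop)]
        for _ in range(gens):
            popn.sort(key=fitness, reverse=True)
            elite = popn[: pop // 2]
            kids = []
            for _ in range(pop - len(elite)):
                a, b = rng.sample(elite, 2)
                cut = rng.randrange(1, len(maj))
                child = a[:cut] + b[cut:]          # one-point crossover
                if rng.random() < 0.2:             # bit-flip mutation
                    j = rng.randrange(len(maj))
                    child[j] ^= 1
                kids.append(child)
            popn = elite + kids
        best = max(popn, key=fitness)
        chosen.append(best)
        sub = [maj[i] for i in range(len(maj)) if best[i]]
        ensemble.append(sub + minr)                # balanced base training set

    def predict(x):
        # majority vote over the base 1-NN classifiers
        votes = [one_nn_predict(t, x) for t in ensemble]
        return max(set(votes), key=votes.count)
    return predict
```

On a toy two-cluster problem with a 30:5 imbalance, the resulting ensemble separates the classes; the point of the sketch is only to show how the three fitness factors (accuracy, diversity, balance) interact within the bagging loop.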
We would like to express our gratitude to both the associate editor and the anonymous reviewers for their constructive comments that improved the quality of our manuscript to a large extent. This work was supported by the National Natural Science Foundation of China (Grant No.61501229) and the Fundamental Research Funds for the Central Universities (NS2015091, NS2014067, NJ20160013).