Abstract
Neural networks are often ineffective on imbalanced datasets in text classification because most of them are designed under the assumption that classes are balanced. In this paper, we present BSIL, a novel framework built on brain storm optimization (BSO) that improves the ability of neural networks to handle imbalanced text classification. In BSIL, BSO's simulation of the human brainstorming process is used to sample imbalanced datasets in a reasonable way. First, we present an approach that generates multiple relatively balanced subsets of an imbalanced dataset by applying scrambling segmentation and global random sampling. Second, we introduce a parallel method that efficiently trains one classifier per subset. Finally, we propose a decision-making layer that accepts the “suggestions” of all classifiers in order to produce the most reliable prediction. Experimental results show that BSIL combined with CNN, RNN, and self-attention models performs better than those models alone in imbalanced text classification.
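The three steps summarized in the abstract (balanced subset generation, per-subset training, and a decision layer) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not define "scrambling segmentation" or "global random sampling" precisely, so here we assume the former means shuffling each class and cutting it into k segments, and the latter means topping up short segments by sampling with replacement from that class's whole pool. A multinomial naive Bayes model stands in for the CNN, RNN, and self-attention base models, a plain majority vote stands in for the decision-making layer, and the BSO clustering and idea-generation steps are not modeled. All function names (`balanced_subsets`, `train_nb`, `train_ensemble`, `decide`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def balanced_subsets(data, seed=0):
    """Split imbalanced (text, label) pairs into k class-balanced subsets.

    Assumed interpretation, not the paper's exact procedure:
    - scrambling segmentation: shuffle each class, cut it into k segments;
    - global random sampling: top up short segments with replacement from
      that class's whole pool so every class has t items per subset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in data:
        by_class[label].append(text)
    sizes = [len(v) for v in by_class.values()]
    k = math.ceil(max(sizes) / min(sizes))  # number of subsets
    t = math.ceil(max(sizes) / k)           # per-class size in each subset
    subsets = [[] for _ in range(k)]
    for label, texts in by_class.items():
        pool = texts[:]
        rng.shuffle(pool)                   # scramble the class
        for i in range(k):
            seg = pool[i::k]                # i-th segment (round-robin cut)
            seg = seg + rng.choices(pool, k=t - len(seg))  # global top-up
            subsets[i].extend((x, label) for x in seg)
    return subsets


def train_nb(subset):
    """Multinomial naive Bayes with add-one smoothing (stand-in base model)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in subset:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    total = sum(class_counts.values())

    def predict(text):
        best, best_score = None, -math.inf
        for label in class_counts:
            n = sum(word_counts[label].values())
            score = math.log(class_counts[label] / total)
            for tok in text.lower().split():
                score += math.log((word_counts[label][tok] + 1) / (n + len(vocab)))
            if score > best_score:
                best, best_score = label, score
        return best

    return predict


def train_ensemble(data, seed=0):
    """Train one classifier per balanced subset concurrently.

    Threads only illustrate the structure; neural models would normally be
    trained in separate processes or on separate devices.
    """
    subsets = balanced_subsets(data, seed)
    with ThreadPoolExecutor() as executor:
        return list(executor.map(train_nb, subsets))


def decide(classifiers, text):
    """Hypothetical decision layer: majority vote over the base classifiers."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = [("great fun film", "pos")] * 2 + [("dull boring plot", "neg")] * 8
    models = train_ensemble(data)
    print(len(models), decide(models, "boring plot"))  # prints: 4 neg
```

With an 8:2 class ratio, the sketch builds four subsets, each holding two examples per class: every majority example is used exactly once, while the minority class is resampled. A learned weighting of classifier outputs could replace the simple vote if the decision layer is meant to be trainable.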
Acknowledgments
This work is supported by the National Key Research and Development Program of China (2017YFB1401200, 2017YFC0908401) and the National Natural Science Foundation of China (61672377). Xiaowang Zhang is supported by the Peiyang Young Scholars in Tianjin University (2019XRX-0032).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tian, J., Chen, S., Zhang, X., Feng, Z. (2019). BSIL: A Brain Storm-Based Framework for Imbalanced Text Classification. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_5
DOI: https://doi.org/10.1007/978-3-030-32236-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science, Computer Science (R0)