Oversample Based Large Scale Support Vector Machine for Online Class Imbalance Problem
Dealing with online class imbalance in an evolving data stream is a more critical issue than the conventional class imbalance problem. Class imbalance occurs when one class of data severely outnumbers the other classes, which leads to skewed class boundaries. In the online class imbalance problem, the degree of class imbalance changes over time, and the current state of imbalance is not known a priori to the learner. To address this problem, we present an Oversampling based Online Large Scale Support Vector Machine (OOLASVM) algorithm, a hybrid of active sample selection and oversampling of support vectors, so that oversampling and undersampling coexist while the new boundary is learned. Further, OOLASVM maintains a balanced boundary throughout the learning process. Results on simulated and real-world datasets demonstrate that the proposed OOLASVM yields better performance than existing approaches such as Generalized Oversampling based Online Imbalanced Learners and Over Online Bagging.
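The general idea of an online learner that tracks a changing imbalance and oversamples the currently rare class can be illustrated with a minimal sketch. This is not the OOLASVM algorithm: it omits active sample selection and support-vector oversampling, and it substitutes a plain linear SVM trained by stochastic gradient descent on the hinge loss. The class name `OversamplingOnlineSVM`, the time-decayed class-fraction estimate, and all parameter values are assumptions made for illustration only.

```python
class OversamplingOnlineSVM:
    """Illustrative online linear SVM with rate-based oversampling.

    Hypothetical sketch, NOT the paper's OOLASVM: no active sample
    selection and no support-vector oversampling are performed here.
    Labels are assumed to be -1 or +1.
    """

    def __init__(self, n_features, lr=0.05, reg=1e-3, decay=0.99, max_repeat=10):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr                  # SGD step size (assumed value)
        self.reg = reg                # L2 regularization strength (assumed value)
        self.decay = decay            # forgetting factor for class fractions
        self.max_repeat = max_repeat  # cap on the oversampling rate
        # Time-decayed estimate of each class's share of the stream.
        self.frac = {-1: 0.5, 1: 0.5}

    def decision(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

    def repeats(self, y):
        # Replay a sample more often when its class is currently rare.
        ratio = self.frac[-y] / max(self.frac[y], 1e-9)
        return max(1, min(self.max_repeat, round(ratio)))

    def _sgd_step(self, x, y):
        # One subgradient step on the L2-regularized hinge loss.
        violated = y * self.decision(x) < 1
        for i in range(len(self.w)):
            grad = self.reg * self.w[i] - (y * x[i] if violated else 0.0)
            self.w[i] -= self.lr * grad
        if violated:
            self.b += self.lr * y

    def partial_fit(self, x, y):
        # Update the imbalance estimate first, since it is not known in advance.
        for label in self.frac:
            seen = 1.0 if label == y else 0.0
            self.frac[label] = self.decay * self.frac[label] + (1 - self.decay) * seen
        # Oversample by repeating the update for the current minority class.
        for _ in range(self.repeats(y)):
            self._sgd_step(x, y)


model = OversamplingOnlineSVM(n_features=2)
for _ in range(100):                  # a run dominated by the negative class
    model.partial_fit([-1.0, -1.0], -1)
model.partial_fit([1.0, 1.0], 1)      # a rare positive is replayed several times
```

Because the class fractions are exponentially decayed, the replay rate adapts if the minority and majority classes swap roles later in the stream.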
Keywords: Online learning, Dynamic class imbalance, Active learning, Support Vector Machines, Oversampling
This work is supported by the Defense Research and Development Organization (DRDO), India, under the sanction code ERIPR/GIA/17-18/038. The Center for Artificial Intelligence and Robotics (CAIR) is acting as the reviewing lab for this work.