A Membership Probability–Based Undersampling Algorithm for Imbalanced Data
Classifiers trained on a highly imbalanced dataset tend to be biased toward the majority class and, as a result, minority class samples are often misclassified as belonging to the majority class. One way to overcome this is an undersampling technique that removes some of the majority class samples. We propose a simple and efficient undersampling method for imbalanced datasets and show, through several illustrative experiments, that it outperforms other methods with respect to four different performance measures, especially on highly imbalanced datasets.
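To make the general idea of undersampling concrete, the following is a minimal sketch of plain random undersampling, which drops majority class samples until the classes are the same size. It is a generic baseline and not the membership probability-based method proposed here, whose details are not given in the abstract. The function name `random_undersample`, the `seed` parameter, and the toy data are assumptions made only for illustration.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples from larger classes until every class has as
    many samples as the smallest class.

    Generic random-undersampling baseline (an illustrative assumption),
    not the membership probability-based method of the paper.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is reduced to the size of the smallest class.
    target = min(len(xs) for xs in by_class.values())

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Keep the smallest class whole; subsample the others without replacement.
        kept = xs if len(xs) == target else rng.sample(xs, target)
        out_samples.extend(kept)
        out_labels.extend([y] * len(kept))
    return out_samples, out_labels


# Toy data: 5 minority samples (label 1) and 95 majority samples (label 0).
samples = list(range(100))
labels = [1] * 5 + [0] * 95

X_bal, y_bal = random_undersample(samples, labels)
print(Counter(y_bal))  # class counts after undersampling
```

Random removal discards majority samples without regard to how informative they are, which is the kind of information loss that more selective undersampling methods try to reduce.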
Keywords: Imbalanced class problem, undersampling, membership probability, information loss
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2019R1A2C1088255), and by the Research Project Program for Newly-Recruited Personnel funded by the Ministry of Science and Technology of Taiwan, R.O.C. (MOST 108-2218-E-027-008-MY2).