Distribution-Sensitive Unbalanced Data Oversampling Method for Medical Diagnosis
To address the low accuracy of classification learning algorithms caused by severely imbalanced sample sets in medical diagnostic applications, this paper proposes a distribution-sensitive oversampling algorithm for imbalanced data. The algorithm divides the minority samples into noise samples, unstable samples, boundary samples, and stable samples according to their location. Each category is processed differently so that the most suitable samples are selected for synthesizing new samples. For sample synthesis, a distribution-sensitive method is adopted: the synthesis method is chosen according to each sample's distance from the surrounding minority samples, so that the newly synthesized samples share the characteristics of the original minority samples. Tests on real medical diagnostic data show that the algorithm improves the accuracy of classification learning algorithms compared with existing sampling algorithms, especially the accuracy and recall for minority classes.
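The abstract does not give the exact criteria used to separate the four sample categories, so the following is only a minimal sketch under stated assumptions. It labels each minority sample by how many of its k nearest neighbours are also minority samples, and it generates a new sample by SMOTE-style linear interpolation. The function names `categorize_minority` and `interpolate`, the neighbourhood size `k`, and the thresholds (no minority neighbours, fewer than half, fewer than all, all) are hypothetical choices for illustration and are not the paper's definitions. The distance-dependent choice of synthesis method described above is not reproduced here.

```python
import math
import random


def categorize_minority(X, y, minority_label=1, k=5):
    """Label each minority sample by its local neighbourhood.

    Assumption (not from the paper): the category depends only on how
    many of the k nearest neighbours belong to the minority class.
    Requires at least two samples in X.
    """
    categories = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Euclidean distance from sample i to every other sample.
        others = [(math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i]
        neighbours = [j for _, j in sorted(others)[:k]]
        n = len(neighbours)
        minority_count = sum(y[j] == minority_label for j in neighbours)
        # Hypothetical thresholds, chosen only for illustration.
        if minority_count == 0:
            categories[i] = "noise"
        elif 2 * minority_count < n:
            categories[i] = "unstable"
        elif minority_count < n:
            categories[i] = "boundary"
        else:
            categories[i] = "stable"
    return categories


def interpolate(x, neighbour, rng=random):
    """SMOTE-style synthesis: a random point on the segment x -> neighbour."""
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(x, neighbour)]


if __name__ == "__main__":
    # Toy data: a tight minority cluster plus one isolated minority point
    # surrounded by majority samples.
    X = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
         [10, 10], [10, 10.1], [10.1, 10], [9.9, 10]]
    y = [1, 1, 1, 1, 1, 0, 0, 0]
    print(categorize_minority(X, y, minority_label=1, k=3))
    print(interpolate(X[0], X[3], random.Random(0)))
```

In this sketch, a sample labelled as noise would typically be excluded from synthesis, while the other categories would serve as seeds with different priorities; how the paper actually treats each category is not stated in the abstract.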
Keywords: Medical diagnosis; Imbalanced data; Data resampling; Oversampling; Undersampling; Classification learning
Funded by the NSFC (No. 61672020) and the National Key Research and Development Program (No. 2016YFB0800303); supported by the DongGuan Innovative Research Team Program.
Compliance with Ethical Standards
Declaration of Conflict of Interest
Weihong Han, Zizhong Huang, Shudong Li and Yan Jia declare no conflict of interest directly related to the submitted work.
This article does not contain any studies with human participants performed by any of the authors.