Learning Misclassification Costs for Imbalanced Datasets, Application in Gene Expression Data Classification
Cost-sensitive algorithms are widely used to solve imbalanced classification problems. However, the misclassification costs are usually set empirically, which makes performance unpredictable. An effective method for automatically calculating the optimal cost weights is therefore desirable. Targeting the highest weighted classification accuracy (WCA), we propose two approaches to search for the optimal cost weights: grid searching and function fitting. In the experiments, we classify imbalanced gene expression data with an extreme learning machine to test the cost weights obtained by the two approaches. Comprehensive experimental results show that function fitting is the more efficient approach: it finds the optimal cost weights while achieving acceptable WCA.
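As an illustration of the grid-searching idea only, the following minimal Python sketch scans candidate cost weights and keeps the one with the highest WCA. It rests on several assumptions that are not taken from the paper: WCA is assumed to be a weighted average of minority-class and majority-class recall (weight `alpha`), fixed classifier scores stand in for an extreme learning machine, and the cost weight is applied as a shift of the decision threshold. All function names and the toy data are hypothetical. The function-fitting approach is not shown.

```python
def weighted_classification_accuracy(y_true, y_pred, alpha=0.5):
    """Assumed WCA: alpha * minority recall + (1 - alpha) * majority recall."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(1 for p in pos if p == 1) / len(pos)  # recall on minority class
    tnr = sum(1 for p in neg if p == 0) / len(neg)  # recall on majority class
    return alpha * tpr + (1 - alpha) * tnr


def predict_with_cost(scores, cost):
    """Predict minority (1) when cost * s >= 1 - s, i.e. s >= 1 / (1 + cost).

    `cost` is the assumed penalty for missing a minority sample, relative to
    a penalty of 1 for misclassifying a majority sample.
    """
    threshold = 1.0 / (1.0 + cost)
    return [1 if s >= threshold else 0 for s in scores]


def grid_search_cost(y_true, scores, costs, alpha=0.5):
    """Return (best_cost, best_wca) over the candidate grid `costs`.

    Ties are resolved in favour of the first (smallest) cost in the grid.
    """
    best_cost, best_wca = None, -1.0
    for c in costs:
        y_pred = predict_with_cost(scores, c)
        wca = weighted_classification_accuracy(y_true, y_pred, alpha)
        if wca > best_wca:
            best_cost, best_wca = c, wca
    return best_cost, best_wca


# Toy imbalanced data (hypothetical): 3 minority samples, 9 majority samples.
# The scores play the role of a classifier's minority-class confidence.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.30, 0.60, 0.10, 0.05, 0.21, 0.15, 0.35, 0.08, 0.12, 0.26, 0.02]

best_cost, best_wca = grid_search_cost(y_true, scores, costs=[1, 2, 3, 4, 5])
print(best_cost, round(best_wca, 4))
```

With a cost of 1 the rule reduces to the usual 0.5 threshold and misses most minority samples. Larger costs lower the threshold, trading majority-class recall for minority-class recall, and the grid search picks the point where the assumed WCA peaks.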
Keywords: Cost-sensitive; Misclassification cost; Correct classification rate; Parameter fitting
This study is supported by the National Natural Science Foundation of China (Nos. 61272315, 61402417, 61602431 and 61701468), the Zhejiang Provincial Natural Science Foundation (Nos. Y1110342, LY15F020037) and the International Cooperation Project of the Zhejiang Provincial Science and Technology Department (No. 2017C34003).