Learning Misclassification Costs for Imbalanced Datasets, Application in Gene Expression Data Classification

  • Huijuan Lu
  • Yige Xu
  • Minchao Ye (email author)
  • Ke Yan
  • Qun Jin
  • Zhigang Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10954)


Cost-sensitive algorithms have been widely used to solve imbalanced classification problems. However, the misclassification costs are usually determined empirically, which leads to uncertain performance. An effective method is therefore needed to calculate the optimal cost weights automatically. Targeting the highest weighted classification accuracy (WCA), we propose two approaches to search for the optimal cost weights: grid searching and function fitting. In experiments, we classify imbalanced gene expression data using an extreme learning machine to test the cost weights obtained by the two approaches. Comprehensive experimental results show that function fitting is the more efficient of the two and finds cost weights that yield acceptable WCA.
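The grid-searching idea can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: WCA is taken to be a weighted average of the per-class recalls, the classifier is a small cost-weighted regularized extreme learning machine written directly in NumPy, the data are synthetic, and names such as `weighted_accuracy`, `train_weighted_elm`, and `grid_search_cost` are hypothetical. The function-fitting approach is not sketched because the abstract does not describe its form.

```python
import numpy as np

def weighted_accuracy(y_true, y_pred, w_pos=0.5):
    # Assumed WCA definition: weighted average of per-class recalls.
    pos = y_true == 1
    neg = ~pos
    recall_pos = np.mean(y_pred[pos] == 1)
    recall_neg = np.mean(y_pred[neg] == 0)
    return w_pos * recall_pos + (1.0 - w_pos) * recall_neg

def train_weighted_elm(X, y, cost_pos, n_hidden=20, reg=1e-2, seed=0):
    # Random, fixed hidden layer; only the output weights are learned.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    t = np.where(y == 1, 1.0, -1.0)        # targets in {-1, +1}
    c = np.where(y == 1, cost_pos, 1.0)    # per-sample misclassification cost
    # Cost-weighted ridge regression: (H^T C H + reg I) beta = H^T C t
    A = H.T @ (c[:, None] * H) + reg * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ (c * t))
    return W, b, beta

def predict(model, X):
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta > 0).astype(int)

def grid_search_cost(X_tr, y_tr, X_val, y_val, grid):
    # Evaluate every candidate cost weight and keep the one with highest WCA.
    scores = []
    for cost in grid:
        model = train_weighted_elm(X_tr, y_tr, cost)
        scores.append(weighted_accuracy(y_val, predict(model, X_val)))
    return grid[int(np.argmax(scores))], scores

# Synthetic imbalanced data (10:1), standing in for gene expression features.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 5)),
               rng.normal(1.0, 1.0, size=(20, 5))])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(20, dtype=int)])
val = np.arange(len(y)) % 3 == 0           # deterministic split, both classes present
grid = [1, 2, 4, 8, 16]
best_cost, scores = grid_search_cost(X[~val], y[~val], X[val], y[val], grid)
print("best cost weight:", best_cost)
```

The exhaustive loop makes the cost of grid searching visible: one full training run per candidate weight, which is the expense that a fitted function of the cost weight would aim to avoid.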


Cost-sensitive · Misclassification cost · Correct classification rate · Parameter fitting



This study is supported by the National Natural Science Foundation of China (Nos. 61272315, 61402417, 61602431 and 61701468), the Zhejiang Provincial Natural Science Foundation (Nos. Y1110342, LY15F020037) and the International Cooperation Project of the Zhejiang Provincial Science and Technology Department (No. 2017C34003).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Huijuan Lu (1)
  • Yige Xu (1)
  • Minchao Ye (1, email author)
  • Ke Yan (1)
  • Qun Jin (2)
  • Zhigang Gao (3)
  1. College of Information Engineering, China Jiliang University, Hangzhou, China
  2. Faculty of Human Sciences, Waseda University, Tokorozawa, Japan
  3. College of Computer Science, Hangzhou Dianzi University, Hangzhou, China
