Learning from Imbalanced Data Sets with Weighted Cross-Entropy Function

  • Yuri Sousa AurelioEmail author
  • Gustavo Matheus de Almeida
  • Cristiano Leite de Castro
  • Antonio Padua Braga


This paper presents a novel approach to deal with the imbalanced data set problem in neural networks by incorporating prior probabilities into a cost-sensitive cross-entropy error function. Several classical benchmarks were tested for performance evaluation using different metrics, namely G-Mean, area under the ROC curve (AUC), adjusted G-Mean, Accuracy, True Positive Rate, True Negative Rate and F1-score. The obtained results were compared to well-known algorithms and showed the effectiveness and robustness of the proposed approach, which results in well-balanced classifiers given different imbalance scenarios.


Multilayer perceptron Imbalanced data Classification problem Back-propagation Cost-sensitive function 



The authors would like to thank the funding agencies CNPq, FAPEMIG and CAPES for their financial support.


  1. 1.
    Chawla NV, Japkowicz N, Kotcz A (2004a) Special issue on learning from imbalanced data sets. SIGKDD Explor 6(1):1–6CrossRefGoogle Scholar
  2. 2.
    Chawla N, Japkowicz N, Kolcz A (2004b) Special issue on learning from imbalanced data sets. In: Editorial of the ACM SIGKDD explorations newsletterGoogle Scholar
  3. 3.
    He H, Garcia E (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284CrossRefGoogle Scholar
  4. 4.
    López V, Fernández A, García S, Palade V, Herrera F (2013) An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf Sci 250:113–141CrossRefGoogle Scholar
  5. 5.
    Bhowan U, Johnston M, Zhang M, Yao X (2013) Evolving diverse ensembles using genetic programming for classification with unbalanced data. IEEE Trans Evol Comput 17(3):368–386CrossRefGoogle Scholar
  6. 6.
    Frasca M, Bertoni A, Re M, Valentini G (2013) A neural network algorithm for semi-supervised node label learning from unbalanced data. Neural Netw 43:84–98CrossRefzbMATHGoogle Scholar
  7. 7.
    Wang L, Yang B, Chen Y, Zhang X, Orchard J (2017) Improving neural-network classifiers using nearest neighbor partitioning. IEEE Trans Neural Netw Learn Syst 28(10):2255–2267MathSciNetCrossRefGoogle Scholar
  8. 8.
    Castro CL, Braga AP (2013) Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data. IEEE Trans Neural Netw Learn Syst 24(6):888–899CrossRefGoogle Scholar
  9. 9.
    Oh SH (2011) A statistical perspective of neural networks for imbalanced data problems. Int J Contents 7(3):1–5CrossRefGoogle Scholar
  10. 10.
    Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New YorkzbMATHGoogle Scholar
  11. 11.
    Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 321–357Google Scholar
  12. 12.
    Barandela R, Valdovinos RM, Sánchez JS, Ferri FJ (2004) The imbalanced training sample problem: under or over sampling? In: Structural, syntactic, and statistical pattern recognition. Springer, pp 806–814Google Scholar
  13. 13.
    He H, Bai Y, Garcia EA, Li S (2008) Adasyn: adaptive synthetic sampling approach for imbalanced learning. In: IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). IEEE, pp 1322–1328Google Scholar
  14. 14.
    Khoshgoftaar TM, Van Hulse J, Napolitano A (2010) Supervised neural network modeling: an empirical investigation into learning from imbalanced data with labeling errors. IEEE Trans Neural Netw 21(5):813–830CrossRefGoogle Scholar
  15. 15.
    Chen S, He H, Garcia EA (2010) Ramoboost: ranked minority oversampling in boosting. IEEE Trans Neural Netw 21(10):1624–1642CrossRefGoogle Scholar
  16. 16.
    Sun Y, Kamel MS, Wong AK, Wang Y (2007) Cost-sensitive boosting for classification of imbalanced data. Pattern Recognit 40(12):3358–3378CrossRefzbMATHGoogle Scholar
  17. 17.
    Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336CrossRefzbMATHGoogle Scholar
  18. 18.
    Kukar M, Kononenko I (1998) Cost-sensitive learning with neural networks. In: ECAI, pp 445–449Google Scholar
  19. 19.
    Elkan C (2001) The foundations of cost-sensitive learning. In: International joint conference on artificial intelligence. Lawrence Erlbaum Associates Ltd, pp 973–978Google Scholar
  20. 20.
    Alejo R, García V, Sotoca JM, Mollineda RA, Sánchez JS (2007) Improving the performance of the rbf neural networks trained with imbalanced samples. In: Computational and ambient intelligence. Springer, pp 162–169Google Scholar
  21. 21.
    Kline DM, Berardi VL (2005) Revisiting squared-error and cross-entropy functions for training neural network classifiers. Neural Comput Appl 14(4):310–318CrossRefGoogle Scholar
  22. 22.
    Berger JO (2010) Statistical decision theory and Bayesian analysis, 2nd edn. Springer, New YorkGoogle Scholar
  23. 23.
    Riedmiller M, Braun H (1993) A direct adaptive method for faster back propagation learning: the rprop algorithm. In: IEEE international conference on neural networks. IEEE, pp 586–591Google Scholar
  24. 24.
    Zhu C, Wang Z (2017) Entropy-based matrix learning machine for imbalanced data sets. Pattern Recognit Lett 88:72–80CrossRefGoogle Scholar
  25. 25.
    Tomek I (1976) Two modifications of cnn. IEEE Trans Syst Man Cybern 6:769–772MathSciNetzbMATHGoogle Scholar
  26. 26.
    Provost F, Fawcett T (2001) Robust classification for imprecise environments. Mach Learn 42(3):203–231CrossRefzbMATHGoogle Scholar
  27. 27.
    Kubat M, Matwin S (1997) Addressing the curse of imbalanced trainingsets: one-sided selection. In: ICML, Nashville, USA, vol 97, pp 179–186Google Scholar
  28. 28.
    Fawcett T (2006) An introduction to roc analysis. Pattern Recognit Lett 27(8):861–874MathSciNetCrossRefGoogle Scholar
  29. 29.
    Batuwita R, Palade V (2012) Adjusted geometric-mean: a novel performance measure for imbalanced bioinformatics datasets learning. J Bioinform Comput Biol 10(04):1250003CrossRefGoogle Scholar
  30. 30.
    Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(Jan):1–30MathSciNetzbMATHGoogle Scholar
  31. 31.
    Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701CrossRefzbMATHGoogle Scholar
  32. 32.
    Dunn OJ (1961) Multiple comparisons among means. J Am Stat Assoc 56(293):52–64MathSciNetCrossRefzbMATHGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. 1.Federal University of Minas GeraisBelo HorizonteBrazil

Personalised recommendations