
An Efficient Over-sampling Approach Based on Mean Square Error Back-propagation for Dealing with the Multi-class Imbalance Problem

Published in: Neural Processing Letters

Abstract

In this paper a new dynamic over-sampling method is proposed. It is a hybrid method that combines a well-known over-sampling technique (SMOTE) with the sequential back-propagation algorithm. The method relies on the back-propagation mean square error (MSE) to automatically identify the over-sampling rate, i.e., it generates only the training samples needed to deal with the class imbalance problem, thereby avoiding an excessive increase in neural network (NN) training time. The main aim of the proposed method is to achieve a trade-off between NN classification performance and NN training time in scenarios where the training data set represents a multi-class classification problem, is highly imbalanced, and may require a long NN training time. Experimental results on fifteen multi-class imbalanced data sets show that the proposed method is promising.
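The abstract does not give implementation details, but its core building block — generating SMOTE-style synthetic samples for a minority class, so that more can be requested whenever the network's MSE for that class remains high — can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `smote_samples`, its parameters, and the neighbour-selection details are all assumptions for the sketch.

```python
import numpy as np

def smote_samples(X_min, n_new, k=5, rng=None):
    """SMOTE-style interpolation (a sketch, not the paper's exact method):
    create n_new synthetic samples, each on the segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Assumes the minority class has at least two samples."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]         # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_new)     # random base samples
    neigh = nn[base, rng.integers(0, k, size=n_new)]  # random neighbour of each base
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1]
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

A dynamic variant in the spirit of the paper would call such a generator inside the sequential back-propagation loop, adding synthetic samples only for the classes whose MSE stays high and stopping once the per-class errors even out — which is what bounds the growth of the training set and, in turn, the NN training time.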


Notes

  1. https://engineering.purdue.edu/biehl/MultiSpec/hyperspectral.html


Acknowledgments

This work has been partially supported by the Mexican SEP under grants PROMEP/103.5/11/3796 and PROMEP/103.5/12/4783, the TESJo under grant SDMAIA-010, and the Mexican Science and Technology Council (CONACYT, Mexico) through Postdoctoral Fellowship [223351].


Corresponding author

Correspondence to R. Alejo.


About this article


Cite this article

Alejo, R., García, V. & Pacheco-Sánchez, J.H. An Efficient Over-sampling Approach Based on Mean Square Error Back-propagation for Dealing with the Multi-class Imbalance Problem. Neural Process Lett 42, 603–617 (2015). https://doi.org/10.1007/s11063-014-9376-3

