An Efficient Over-sampling Approach Based on Mean Square Error Back-propagation for Dealing with the Multi-class Imbalance Problem

Alejo, R.; García, V.; Pacheco-Sánchez, J. H.

doi:10.1007/s11063-014-9376-3

Published: 17 August 2014

Volume 42, pages 603–617, (2015)

Neural Processing Letters

Abstract

This paper proposes a new dynamic over-sampling method: a hybrid that combines a well-known over-sampling technique (SMOTE) with the sequential back-propagation algorithm. The method uses the back-propagation mean square error (MSE) to identify the over-sampling rate automatically, i.e., it uses only the training samples needed to deal with the class imbalance problem and so avoids an excessive increase in neural network (NN) training time. The main aim of the proposed method is to obtain a trade-off between NN classification performance and NN training time in scenarios where the training data set represents a multi-class classification problem, is highly imbalanced, and might require a long NN training time. Experimental results on fifteen multi-class imbalanced data sets show that the proposed method is promising.
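The abstract does not state the exact rule that maps the MSE to an over-sampling rate, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes, purely for illustration, that a fixed budget of synthetic samples is split among classes in proportion to each class's MSE, and that new samples are created by SMOTE-style interpolation between same-class neighbours. The function names, the `budget` parameter, and the proportional rule are all hypothetical.

```python
import random


def per_class_mse(targets, outputs, labels):
    """Mean squared error of the network outputs, computed separately per class."""
    sums, counts = {}, {}
    for t, o, y in zip(targets, outputs, labels):
        err = sum((ti - oi) ** 2 for ti, oi in zip(t, o)) / len(t)
        sums[y] = sums.get(y, 0.0) + err
        counts[y] = counts.get(y, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def oversampling_counts(class_mse, budget):
    """Assumed rule (not from the paper): share `budget` in proportion to class MSE."""
    total = sum(class_mse.values())
    if total == 0:
        return {c: 0 for c in class_mse}
    return {c: int(round(budget * m / total)) for c, m in class_mse.items()}


def smote_like(samples, n_new, k=2, rng=None):
    """SMOTE-style interpolation between a sample and one of its k nearest neighbours."""
    rng = rng or random.Random(0)
    if len(samples) < 2 or n_new <= 0:
        return []
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(samples)
        # Nearest same-class neighbours by squared Euclidean distance.
        neighbours = sorted(
            (s for s in samples if s is not x),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy usage: class 1 is badly fitted, so it receives the whole budget.
targets = [[1, 0], [1, 0], [0, 1]]
outputs = [[1, 0], [1, 0], [0.5, 0.5]]
labels = [0, 0, 1]
mse = per_class_mse(targets, outputs, labels)
counts = oversampling_counts(mse, budget=10)
new_points = smote_like([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]], n_new=4)
```

In a sequential training loop, a step like this would be repeated between epochs, so that classes the network already fits well stop receiving synthetic samples and the training set grows only where the error remains high.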

Notes

https://engineering.purdue.edu/biehl/MultiSpec/hyperspectral.html

Acknowledgments

This work has partially been supported by the Mexican SEP under grants PROMEP/103.5/11/3796 and PROMEP/103.5/12/4783, the TESJo under grant SDMAIA-010, and the Mexican Science and Technology Council (CONACYT-Mexico) through a Postdoctoral Fellowship [223351].

Author information

Authors and Affiliations

Tecnológico de Estudios Superiores de Jocotitlán, Carretera Toluca-Atlacomulco KM. 44.8, Ejido de San Juan y San Agustín, 50700, Jocotitlán, Mexico
R. Alejo
Department of Electrical and Computer Engineering, Instituto de Ingeniería y Tecnología, Universidad Autónoma de Ciudad Juárez, Av. del Charro 450 Norte, 32310, Ciudad Juárez, Chihuahua, Mexico
V. García
Instituto Tecnológico de Toluca, Av. Tecnológico s/n Ex-Rancho La Virgen, 52140, Metepec, Mexico
J. H. Pacheco-Sánchez

Corresponding author

Correspondence to R. Alejo.

About this article

Cite this article

Alejo, R., García, V. & Pacheco-Sánchez, J.H. An Efficient Over-sampling Approach Based on Mean Square Error Back-propagation for Dealing with the Multi-class Imbalance Problem. Neural Process Lett 42, 603–617 (2015). https://doi.org/10.1007/s11063-014-9376-3

Issue Date: December 2015
