Genetic Algorithm-Based Oversampling Technique to Learn from Imbalanced Data

Saladi, Puneeth Srinivas Mohan; Dash, Tirtharaj

doi:10.1007/978-981-13-1592-3_30

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 816)

Abstract

The volume of data available from applications such as surveillance systems, security appliances, and finance has been expanding continuously. Many machine learning (ML) and data mining models have shown promise in learning from such data. However, learning an ML classifier from imbalanced data remains challenging; this is often referred to as the imbalanced learning problem. In this setting, far more information is available for the majority classes than for the minority classes. As a result, a classifier tends to over-fit to the majority classes and under-fit to the minority classes during training. Distance-based strategies such as SMOTE, which generate new samples from the nearest neighbors of the available samples, have been quite useful for oversampling the minority classes. In this paper, we propose employing a genetic algorithm (GA) that learns the probability distribution of the available data in order to generate minority-class samples for binary classification problems. We validate and test the proposed oversampling strategy by training three different kinds of classifiers. A comparative analysis of SMOTE-based oversampling and the proposed GA-based oversampling shows promising results on ten popular imbalanced datasets.
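To make the nearest-neighbor idea behind SMOTE-style oversampling concrete, the following is a minimal sketch in plain Python. It is an illustration only: the function name `smote_like_oversample`, its parameters `n_new`, `k`, and `seed`, and the toy data are assumptions made for this example and do not come from the chapter or from any library. It illustrates the distance-based baseline, not the proposed GA-based method.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.

    Hypothetical helper for illustration; not the chapter's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two existing minority samples, the generated points always stay inside the region already spanned by the minority class. An approach that instead models the distribution of the data, as the abstract proposes, is not restricted to such line segments.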

Notes

1. We intentionally avoid common GA terminology such as population, chromosome, crossover, and mutation; this causes no loss of generality in understanding the methodology.

Author information

Authors and Affiliations

Data Science Research Group, Department of Computer Science, Birla Institute of Technology and Science Pilani, K.K. Birla Goa Campus, Zuarinagar, 403726, Goa, India
Puneeth Srinivas Mohan Saladi & Tirtharaj Dash

Corresponding author

Correspondence to Tirtharaj Dash.

Editor information

Editors and Affiliations

Department of Mathematics, South Asian University, New Delhi, India
Jagdish Chand Bansal
Department of Mathematics, National Institute of Technology Silchar, Silchar, Assam, India
Kedar Nath Das
Department of Mathematics and Computer Science, Faculty of Science, Liverpool Hope University, Liverpool, UK
Atulya Nagar
Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Kusum Deep
School of Basic Sciences, Indian Institute of Technology Bhubaneswar, Bhubaneswar, Odisha, India
Akshay Kumar Ojha

About this paper

Cite this paper

Saladi, P.S.M., Dash, T. (2019). Genetic Algorithm-Based Oversampling Technique to Learn from Imbalanced Data. In: Bansal, J., Das, K., Nagar, A., Deep, K., Ojha, A. (eds) Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 816. Springer, Singapore. https://doi.org/10.1007/978-981-13-1592-3_30

DOI: https://doi.org/10.1007/978-981-13-1592-3_30
Published: 14 December 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1591-6
Online ISBN: 978-981-13-1592-3
eBook Packages: Intelligent Technologies and Robotics (R0)
