Experimental Comparison of Sampling Techniques for Imbalanced Datasets Using Various Classification Models

Pattanayak, Sanjibani Sudha; Rout, Minakhi

doi:10.1007/978-981-10-6875-1_2

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 564)

Abstract

An imbalanced dataset is one in which the number of samples per class is highly uneven. This makes classification challenging, because the result may be biased toward the dominating class, while misclassifying a minority-class sample, which is usually the sample of interest, is far more costly. Of the various approaches studied to address this problem, sampling techniques have been successfully adopted to preprocess imbalanced datasets. In this paper, two pioneering sampling techniques, SMOTE and MWMOTE, are compared experimentally using the classification models SVM, RBF, and MLP.
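To make the idea of oversampling concrete, the following is a minimal sketch of SMOTE-style interpolation, not the authors' experimental code. It assumes numeric feature vectors and Euclidean distance, and it simplifies the published procedure by drawing base samples at random rather than iterating over every minority sample. The function name `smote_sketch` and its parameters are hypothetical, chosen for illustration only. MWMOTE, which additionally weights minority samples before generating synthetic ones, is not shown.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length numeric tuples belonging to the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point (a simplification).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours and interpolate toward it.
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with four two-dimensional samples (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, so the new samples stay inside the region already occupied by the minority class. In practice the synthetic samples would be appended to the training set before fitting a classifier such as SVM or MLP.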

Author information

Authors and Affiliations

ITER, Siksha O Anusandhan University, Bhubaneswar, 751030, Odisha, India
Sanjibani Sudha Pattanayak & Minakhi Rout

Corresponding author

Correspondence to Sanjibani Sudha Pattanayak.

Editor information

Editors and Affiliations

Faculty of Computer Science, Bialystok University of Technology, Białystok, Poland
Khalid Saeed
Dept. of Computer Science & Engg., University of Calcutta, Kolkata, West Bengal, India
Nabendu Chaki
C. V. Raman College of Engineering, Bhubaneswar, Odisha, India
Bibudhendu Pati
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Odisha, India
Sambit Bakshi
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Odisha, India
Durga Prasad Mohapatra

Cite this paper

Pattanayak, S.S., Rout, M. (2018). Experimental Comparison of Sampling Techniques for Imbalanced Datasets Using Various Classification Models. In: Saeed, K., Chaki, N., Pati, B., Bakshi, S., Mohapatra, D. (eds) Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing, vol 564. Springer, Singapore. https://doi.org/10.1007/978-981-10-6875-1_2

DOI: https://doi.org/10.1007/978-981-10-6875-1_2
Published: 22 December 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6874-4
Online ISBN: 978-981-10-6875-1
eBook Packages: Engineering, Engineering (R0)
