Experimental Comparison of Sampling Techniques for Imbalanced Datasets Using Various Classification Models

  • Conference paper
Progress in Advanced Computing and Intelligent Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 564)

Abstract

An imbalanced dataset is one in which the number of samples in the different classes is highly uneven. This makes classification challenging, because the classifier's output may be biased toward the dominant (majority) class. At the same time, misclassifying minority-class samples, which are usually the samples of interest, is far more costly. Various approaches have been studied to address this problem, and among them sampling techniques have been successfully adopted to preprocess imbalanced datasets. In this paper, two pioneering sampling techniques, SMOTE and MWMOTE, are compared experimentally using three classification models: support vector machine (SVM), radial basis function (RBF) network, and multilayer perceptron (MLP).
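
SMOTE generates new minority samples by interpolation, as described by Chawla et al. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal sketch of that step in plain Python, for illustration only. Several details are assumptions rather than the experimental code used in this paper: the helper names `smote` and `k_nearest`, the parameters `n_synthetic` and `k`, the Euclidean distance, and the round-robin choice of base samples. MWMOTE's majority-weighted selection of informative minority samples is not shown.

```python
import math
import random
from typing import List, Optional, Sequence

Point = List[float]


def k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Return indices of the k samples closest to points[i] (Euclidean), excluding i itself."""
    by_distance = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return [j for _, j in by_distance[:k]]


def smote(minority: Sequence[Point], n_synthetic: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Point]:
    """Hypothetical helper: create n_synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random()
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Point] = []
    for s in range(n_synthetic):
        i = s % len(minority)          # round-robin over the minority samples (assumption)
        j = rng.choice(neighbours[i])  # pick one of the k nearest minority neighbours
        gap = rng.random()             # interpolation factor in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nb)])
    return synthetic


# Usage: four 2-D minority samples, four synthetic samples, fixed seed for repeatability.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_samples = smote(minority, n_synthetic=4, k=2, rng=random.Random(0))
print(len(new_samples))  # 4
```

Every synthetic point lies between two existing minority samples, so SMOTE fills in the minority region instead of duplicating points. The trade-off is that it can also place samples in noisy or class-overlap areas. This is a known limitation that MWMOTE-style weighting aims to reduce.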

References

  1. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

  2. Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.W.: SMOTEBoost: improving prediction of the minority class in boosting. In: The 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), pp. 107–119. Springer (2003)

  3. Hu, S., Liang, Y., Ma, L., He, Y.: Improving classification performance when training data is imbalanced. IEEE (2005)

  4. Maciejewski, T., Stefanowski, J.: Local neighborhood extension of SMOTE for mining imbalanced data. In: IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pp. 978-1-4244-99 (2011)

  5. Han, H., Wang, W.Y., Mao, B.H.: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Proceedings of the International Conference on Intelligent Computing, pp. 878–887 (2005)

  6. He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: Proceedings of the International Joint Conference on Neural Networks, pp. 1322–1328 (2008)

  7. Barua, S., Islam, M.M., Yao, X., Murase, K.: MWMOTE: majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 26(2) (2014)

  8. Jayashree, S., Alice Gavya, A.: Classification of imbalanced problem by MWMOTE and SSO. IJMTES 2(5) (2015)

  9. Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. 6(5), 429–449 (2002)

  10. He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)

  11. Buckland, M., Gey, F.: The relationship between recall and precision. J. Am. Soc. Inf. Sci. 45(1), 12–19 (1994)

  12. Ramentol, E., Caballero, Y., Bello, R., Herrera, F.: SMOTE-RSB: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory. Knowl. Inf. Syst. 33(2), 245–265. Springer (2012)

  13. Han, H., Wang, W.Y., Mao, B.H.: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Proceedings of the International Conference on Intelligent Computing, pp. 878–887 (2005)

  14. Tang, Y., Zhang, Y.Q., Chawla, N.V., Krasser, S.: SVMs modeling for highly imbalanced classification. IEEE Trans. Syst. Man Cybern. Part B 39(1), 281–288 (2009)

  15. Imam, T., Ting, K.M., Kamruzzaman, J.: z-SVM: an SVM for improved classification of imbalanced data. In: Advances in Artificial Intelligence, vol. 4304, pp. 264–273 (2006)

  16. Pérez-Godoy, M.D., Rivera, A.J., Carmona, C.J., del Jesus, M.J.: Training algorithms for radial basis function networks to tackle learning processes with imbalanced data-sets. Appl. Soft Comput. 25, 26–39 (2014)

  17. Haddad, L., Morris, C.W., Boddy, L.: Training radial basis function neural networks: effects of training set size and imbalanced training sets. J. Microbiol. Methods 43(1), 33–44 (2000)

Author information

Corresponding author

Correspondence to Sanjibani Sudha Pattanayak.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Pattanayak, S.S., Rout, M. (2018). Experimental Comparison of Sampling Techniques for Imbalanced Datasets Using Various Classification Models. In: Saeed, K., Chaki, N., Pati, B., Bakshi, S., Mohapatra, D. (eds) Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing, vol 564. Springer, Singapore. https://doi.org/10.1007/978-981-10-6875-1_2

  • DOI: https://doi.org/10.1007/978-981-10-6875-1_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6874-4

  • Online ISBN: 978-981-10-6875-1

  • eBook Packages: Engineering, Engineering (R0)
