Advertisement

SMOTE Algorithm Variations in Balancing Data Streams

  • Bogdan GulowatyEmail author
  • Paweł Ksieniewicz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11872)

Abstract

From one year to another, more and more vast amounts of data is being created in different fields of application. Great deal of those sources require real-time processing and analyzing, which leads to increased interest in streaming data classification field of machine learning. It is not rare, that many of those applications deal with somehow skewed or imbalanced data. In this paper, we analyze usage of smote oversampling algorithm variations in learning patterns from imbalanced data streams using different incremental learning ensemble algorithms.

Keywords

Data streams Imbalanced learning Synthetic oversampling Classifier ensembles 

Notes

Acknowledgments

This work is supported by the Polish National Science Center under the Grant no. UMO-2015/19/B/ST6/01597 as well the statutory funds of the Department of Systems and Computer Networks, Faculty of Electronics, Wrocław University of Science and Technology.

References

  1. 1.
    Alpaydin, E.: Introduction to Machine Learning, 2nd edn. The MIT Press, Cambridge (2010)zbMATHGoogle Scholar
  2. 2.
    Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. J. Mach. Learn. Res. 11(May), 1601–1604 (2010)Google Scholar
  3. 3.
    Brzezinski, D., Stefanowski, J.: Reacting to different types of concept drift: the accuracy updated ensemble algorithm. IEEE Trans. Neural Netw. Learn. Syst. 25(1), 81–94 (2013)CrossRefGoogle Scholar
  4. 4.
    Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C.: Safe-level-SMOTE: safe-level-synthetic minority over-sampling TEchnique for handling the class imbalanced problem. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS (LNAI), vol. 5476, pp. 475–482. Springer, Heidelberg (2009).  https://doi.org/10.1007/978-3-642-01307-2_43CrossRefGoogle Scholar
  5. 5.
    Capgemini, BNP Paribas: World payments report 2018, October 2018. https://worldpaymentsreport.com/wp-content/uploads/sites/5/2018/10/World-Payments-Report-WPR18-2018.pdf. Accessed 12 Feb 2019
  6. 6.
    Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)CrossRefGoogle Scholar
  7. 7.
    Facebook Inc.: Facebook reports fourth quarter and full year 2018 results, January 2019. https://investor.fb.com/investor-news/press-release-details/2019/Facebook-Reports-Fourth-Quarter-and-Full-Year-2018-Results/default.aspx. Accessed 21 Feb 2019
  8. 8.
    Gama, J., Sebastião, R., Rodrigues, P.P.: Issues in evaluation of stream learning algorithms. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 329–338. ACM (2009)Google Scholar
  9. 9.
    Han, H., Wang, W.-Y., Mao, B.-H.: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 878–887. Springer, Heidelberg (2005).  https://doi.org/10.1007/11538059_91CrossRefGoogle Scholar
  10. 10.
    He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 9, 1263–1284 (2008)Google Scholar
  11. 11.
    Hoens, T.R., Polikar, R., Chawla, N.V.: Learning from streaming data with concept drift and imbalance: an overview. Prog. Artif. Intell. 1(1), 89–101 (2012).  https://doi.org/10.1007/s13748-011-0008-0CrossRefGoogle Scholar
  12. 12.
    Krawczyk, B., Minku, L.L., Gama, J., Stefanowski, J., Woźniak, M.: Ensemble learning for data stream analysis: a survey. Inf. Fusion 37, 132–156 (2017)CrossRefGoogle Scholar
  13. 13.
    Krikorian, R.: New tweets per second record, and how!, August 2013. https://blog.twitter.com/engineering/en_us/a/2013/new-tweets-per-second-record-and-how.html. Accessed 12 Feb 2019
  14. 14.
    Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)MathSciNetzbMATHGoogle Scholar
  15. 15.
    Street, W.N., Kim, Y.: A streaming ensemble algorithm (sea) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 377–382. ACM (2001)Google Scholar
  16. 16.
    Woźniak, M., Kasprzak, A.: Data stream classification using classifier ensemble. Schedae Informaticae 23, 21–32 (2015)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.Department of Systems and Computer NetworksWroclaw University of Science and TechnologyWroclawPoland

Personalised recommendations