Skip to main content

Multi Sampling Random Subspace Ensemble for Imbalanced Data Stream Classification

  • Conference paper
  • First Online:

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 977))

Abstract

The classification of data streams is a frequently considered problem. The data coming in over time has a tendency to change its characteristics over time and usually we also encounter some difficulties in data distributions as inequality of the number of learning examples from considered classes. The combination of these two phenomena is an additional challenge. In this article, we propose a novel MSRS (Multi Sampling Random Subspace Ensemble) a chunk-based ensemble method for imbalanced non-stationary data stream classification. The proposed algorithm employs random subspace approach and balancing data using various sampling methods to ensure an appropriate diversity of the classifier ensemble. MSRS has been evaluated on the basis of the computer experiments carried out on the diverse pool of the non-stationary imbalanced data streams.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   129.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    https://github.com/JakubKlik/msrs.

References

  1. Phua C, Alahakoon D, Lee V (2004) Minority report in fraud detection: classification of skewed data. Acm Sigkdd Explor Newslett 6(1):50–59

    Article  Google Scholar 

  2. Krawczyk B, Galar M, Jeleń Ł, Herrera F (2016) Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy. Appl Soft Comput 38:714–726

    Article  Google Scholar 

  3. Alqatawna J, Faris H, Jaradat K, Al-Zewairi M, Adwan O (2015) Improving knowledge based spam detection methods: the effect of malicious related features in imbalance data distribution. Int J Commun Netw Syst Sci 8(05):118

    Google Scholar 

  4. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284

    Article  Google Scholar 

  5. Branco P, Torgo L, Ribeiro R (2015) A survey of predictive modelling under imbalanced distributions. arXiv preprint arXiv:1505.01658

  6. Visa S, Ralescu A (2005) Issues in mining imbalanced data sets-a review paper. In: Proceedings of the sixteen midwest artificial intelligence and cognitive science conference, vol 2005, pp 67–73

    Google Scholar 

  7. Brodersen KH, Ong CS, Stephan KE, Buhmann JM (2010) The balanced accuracy and its posterior distribution. In: 2010 20th international conference on Pattern recognition (ICPR), pp 3121–3124. IEEE

    Google Scholar 

  8. Bifet A, de Francisci Morales G, Read J, Holmes G, Pfahringer B (2015) Efficient online evaluation of big data stream classifiers. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 59–68. ACM

    Google Scholar 

  9. Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Measur 20(1):37–46

    Article  Google Scholar 

  10. Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Progress Artif Intell 5(4):221–232

    Article  Google Scholar 

  11. Krawczyk B, Minku LL, Gama J, Stefanowski J, Woźniak M (2017) Ensemble learning for data stream analysis: a survey. Inf Fusion 37:132–156

    Article  Google Scholar 

  12. Sun Y, Wong AK, Kamel MS (2009) Classification of imbalanced data: a review. Int J Pattern Recogn Artif Intell 23(04):687–719

    Article  Google Scholar 

  13. Hart P (1968) The condensed nearest neighbor rule (corresp.). IEEE Trans Inf Theory 14(3):515–516

    Article  Google Scholar 

  14. Yen SJ, Lee YS (2009) Cluster-based under-sampling approaches for imbalanced data distributions. Expert Syst Appl 36(3):5718–5727

    Article  MathSciNet  Google Scholar 

  15. He H, Bai Y, Garcia EA, Li S (2008) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: IEEE international joint conference on neural networks, IJCNN 2008. (IEEE World Congress on Computational Intelligence), pp 1322–1328. IEEE

    Google Scholar 

  16. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

    Article  Google Scholar 

  17. Batista GE, Prati RC, Monard MC (2004) A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor Newslett 6(1):20–29

    Article  Google Scholar 

  18. Batista GE, Bazzan AL, Monard MC (2003) Balancing training data for automated annotation of keywords: a case study. In: WOB, pp 10–18

    Google Scholar 

  19. Liu XY, Wu J, Zhou ZH (2009) Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B (Cybern) 39(2):539–550

    Article  Google Scholar 

  20. Gao J, Ding B, Fan W, Han J, Philip SY (2008) Classifying data streams with skewed class distributions and concept drifts. IEEE Internet Comput 12(6):37–49

    Article  Google Scholar 

  21. Ditzler G, Polikar R (2013) Incremental learning of concept drift from streaming imbalanced data. IEEE Trans Knowl Data Eng 25(10):2283–2301

    Article  Google Scholar 

  22. Polikar R, Upda L, Upda SS, Honavar V (2001) Learn++: an incremental learning algorithm for supervised neural networks. IEEE Trans Syst Man Cybern Part C (Appl Rev) 31(4):497–508

    Article  Google Scholar 

  23. Elwell R, Polikar R (2009) Incremental learning of variable rate concept drift. In: International workshop on multiple classifier systems, pp 142–151. Springer

    Google Scholar 

  24. Wang Y, Zhang Y, Wang Y (2009) Mining data streams with skewed distribution by static classifier ensemble. In: Opportunities and challenges for next-generation applied intelligence, pp 65–71. Springer

    Google Scholar 

  25. Chen S, He H (2011) Towards incremental learning of nonstationary imbalanced data stream: a multiple selectively recursive approach. Evolving Syst 2(1):35–50

    Article  Google Scholar 

  26. Chen S, He H (2009) Sera: selectively recursive approach towards nonstationary imbalanced stream data mining. In: International joint conference on neural networks, IJCNN 2009, pp 522–529. IEEE

    Google Scholar 

  27. Chen S, He H, Li K, Desai S (2010) Musera: multiple selectively recursive approach towards imbalanced stream data mining. In: 2010 international joint conference on neural networks (IJCNN), pp 1–8. IEEE

    Google Scholar 

  28. Branco P, Torgo L, Ribeiro RP (2017) Relevance-based evaluation metrics for multi-class imbalanced domains. In: Proceedings of advances in knowledge discovery and data mining - 21st Pacific-Asia conference, Part I, PAKDD 2017, Jeju, South Korea, 23–26 May 2017, pp 698–710

    Chapter  Google Scholar 

  29. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

    MathSciNet  MATH  Google Scholar 

  30. Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res. 11:1601–1604

    Google Scholar 

  31. Alcalá-Fdez J, Sánchez L, Garcia S, del Jesus MJ, Ventura S, Garrell JM, Otero J, Romero C, Bacardit J, Rivas VM et al (2009) KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput 13(3):307–318

    Article  Google Scholar 

Download references

Acknowledgement

This work was supported by the Polish National Science Centre under the grant No. 2017/27/B/ST6/01325.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Jakub Klikowski or Michał Woźniak .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Klikowski, J., Woźniak, M. (2020). Multi Sampling Random Subspace Ensemble for Imbalanced Data Stream Classification. In: Burduk, R., Kurzynski, M., Wozniak, M. (eds) Progress in Computer Recognition Systems. CORES 2019. Advances in Intelligent Systems and Computing, vol 977. Springer, Cham. https://doi.org/10.1007/978-3-030-19738-4_36

Download citation

Publish with us

Policies and ethics