Multi Sampling Random Subspace Ensemble for Imbalanced Data Stream Classification

Klikowski, Jakub; Woźniak, Michał

doi:10.1007/978-3-030-19738-4_36

Conference paper
First Online: 08 May 2019

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 977)

Abstract

Data stream classification is a frequently studied problem. Data arriving over time tend to change their characteristics, and the class distributions are often imbalanced, i.e., the numbers of learning examples from the considered classes are unequal. The combination of these two phenomena poses an additional challenge. In this article, we propose MSRS (Multi Sampling Random Subspace Ensemble), a novel chunk-based ensemble method for imbalanced non-stationary data stream classification. The proposed algorithm combines the random subspace approach with data balancing by various sampling methods to ensure appropriate diversity of the classifier ensemble. MSRS has been evaluated in computer experiments carried out on a diverse pool of non-stationary imbalanced data streams.
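The abstract names the ingredients of the method (chunk-wise training, random feature subspaces, several sampling methods) but not its details. The following is a minimal illustrative sketch of how these ingredients might fit together, not the authors' MSRS implementation. Everything in it is an assumption made for illustration: the names `ChunkEnsemble`, `undersample`, `oversample` and `NearestCentroid`, the choice of random under- and oversampling as the "various sampling methods", the nearest-centroid base classifier, binary labels 0/1, unweighted majority voting, and the keep-the-newest pruning rule.

```python
import numpy as np


class NearestCentroid:
    """Tiny stand-in base classifier (an assumption, used to avoid dependencies)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared Euclidean distance from every sample to every class centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def undersample(X, y, rng):
    """Randomly drop examples until every class has the minority-class size."""
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), counts.min(), replace=False)
        for c in classes
    ])
    return X[idx], y[idx]


def oversample(X, y, rng):
    """Randomly duplicate examples until every class has the majority-class size."""
    classes, counts = np.unique(y, return_counts=True)
    parts = []
    for c, k in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = members[rng.integers(0, len(members), size=counts.max() - k)]
        parts.append(np.concatenate([members, extra]))
    idx = np.concatenate(parts)
    return X[idx], y[idx]


class ChunkEnsemble:
    """Hypothetical chunk-based ensemble: one member per sampler per chunk."""

    def __init__(self, samplers, subspace_size, max_members=10, seed=0):
        self.samplers = samplers
        self.subspace_size = subspace_size
        self.max_members = max_members
        self.rng = np.random.default_rng(seed)
        self.members = []  # list of (feature indices, fitted classifier)

    def partial_fit(self, X, y):
        # Assumes the chunk contains examples of both classes.
        for sampler in self.samplers:
            # Each member gets its own random feature subspace ...
            feats = self.rng.choice(X.shape[1], self.subspace_size, replace=False)
            # ... and its own balanced view of the chunk.
            Xb, yb = sampler(X[:, feats], y, self.rng)
            self.members.append((feats, NearestCentroid().fit(Xb, yb)))
        # Keep only the newest members (a simple, assumed forgetting rule).
        self.members = self.members[-self.max_members:]
        return self

    def predict(self, X):
        votes = np.stack([clf.predict(X[:, f]) for f, clf in self.members])
        # Unweighted majority vote over binary 0/1 labels; ties go to class 1.
        return (votes.mean(axis=0) >= 0.5).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    model = ChunkEnsemble([undersample, oversample], subspace_size=3)
    for chunk in range(5):
        # Synthetic imbalanced chunk: roughly 10% minority (label 1).
        y = (rng.random(200) < 0.1).astype(int)
        y[:2] = [0, 1]  # guarantee both classes are present
        X = rng.normal(size=(200, 6)) + y[:, None] * 2.0
        if model.members:  # test-then-train: predict before learning the chunk
            pred = model.predict(X)
            recalls = [(pred[y == c] == c).mean() for c in (0, 1)]
            print(f"chunk {chunk}: balanced accuracy = {np.mean(recalls):.3f}")
        model.partial_fit(X, y)
```

Pairing a different sampler with a different feature subset for each member is one plausible reading of how sampling and subspaces could jointly promote ensemble diversity; the repository listed in the Notes is the authoritative source for the actual procedure.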

Notes

1. https://github.com/JakubKlik/msrs

Acknowledgement

This work was supported by the Polish National Science Centre under grant No. 2017/27/B/ST6/01325.

Author information

Authors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Jakub Klikowski & Michał Woźniak

Corresponding authors

Correspondence to Jakub Klikowski or Michał Woźniak.

Editor information

Editors and Affiliations

Faculty of Electronics, Wroclaw University of Science and Technology, Wrocław, Poland
Robert Burduk
Faculty of Electronics, Wroclaw University of Science and Technology, Wrocław, Poland
Marek Kurzynski
Faculty of Electronics, Wroclaw University of Science and Technology, Wrocław, Poland
Michał Wozniak

Cite this paper

Klikowski, J., Woźniak, M. (2020). Multi Sampling Random Subspace Ensemble for Imbalanced Data Stream Classification. In: Burduk, R., Kurzynski, M., Wozniak, M. (eds) Progress in Computer Recognition Systems. CORES 2019. Advances in Intelligent Systems and Computing, vol 977. Springer, Cham. https://doi.org/10.1007/978-3-030-19738-4_36

DOI: https://doi.org/10.1007/978-3-030-19738-4_36
Published: 08 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19737-7
Online ISBN: 978-3-030-19738-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
