A novel approach using incremental oversampling for data stream mining

  • Original Paper
Evolving Systems

Abstract

Data stream mining has become very popular in recent years, as advanced electronic devices generate continuous data streams. The performance of standard learning algorithms is compromised by the class imbalance present in real-world data streams. In this paper we propose a novel algorithm, incremental oversampling for data streams (IOSDS), which uses a unique oversampling technique to nearly balance the data sets and so minimize the effect of imbalance on the stream mining process. The experimental analysis is conducted on 15 data chunks of data streams with varied sizes and different imbalance ratios. The results suggest that the proposed IOSDS algorithm improves knowledge discovery over benchmark algorithms such as C4.5 and Hoeffding tree in terms of standard performance measures, namely accuracy, AUC, precision, recall and F-measure.
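The abstract does not spell out the oversampling step used by IOSDS, so the following is only a minimal sketch of the general idea of nearly balancing each incoming chunk. It substitutes plain random duplication of minority-class instances for the paper's own technique. The function names, the `target_ratio` parameter and the toy stream are all assumptions made for illustration.

```python
import random
from collections import Counter


def oversample_chunk(chunk, target_ratio=0.9, rng=None):
    """Return a copy of `chunk` with minority-class rows duplicated at random.

    chunk: list of (features, label) pairs.
    target_ratio: hypothetical knob, the desired minority/majority count ratio.
    NOTE: random duplication is a stand-in, not the IOSDS technique itself.
    """
    if rng is None:
        rng = random.Random(0)
    counts = Counter(label for _, label in chunk)
    if len(counts) != 2:
        return list(chunk)  # only two-class chunks are handled in this sketch
    (_, n_major), (minority, n_minor) = counts.most_common(2)
    minority_rows = [row for row in chunk if row[1] == minority]
    # Number of duplicates needed to reach the target ratio (never negative).
    needed = max(0, int(target_ratio * n_major) - n_minor)
    extra = [rng.choice(minority_rows) for _ in range(needed)]
    return list(chunk) + extra


def balanced_stream(chunks, target_ratio=0.9, seed=0):
    """Yield each incoming chunk after oversampling, one chunk at a time."""
    rng = random.Random(seed)
    for chunk in chunks:
        yield oversample_chunk(chunk, target_ratio, rng)


# Toy stream (made-up data): two chunks with 9:1 and 8:2 class imbalance.
stream = [
    [((i,), 0) for i in range(9)] + [((99,), 1)],
    [((i,), 0) for i in range(8)] + [((98,), 1), ((97,), 1)],
]
balanced = list(balanced_stream(stream))
class_counts = [Counter(label for _, label in c) for c in balanced]
```

Each balanced chunk could then be passed to an incremental learner such as a Hoeffding tree. Processing one chunk at a time keeps memory bounded, which is the usual constraint in stream mining.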

Notes

  1. http://moa.cs.waikato.ac.nz/.

  2. https://moa.cms.waikato.ac.nz/datasets/n

Author information

Corresponding author

Correspondence to N. Anupama.

About this article

Cite this article

Anupama, N., Jena, S. A novel approach using incremental oversampling for data stream mining. Evolving Systems 10, 351–362 (2019). https://doi.org/10.1007/s12530-018-9249-5

  • DOI: https://doi.org/10.1007/s12530-018-9249-5
