A Selectively Re-train Approach Based on Clustering to Classify Concept-Drifting Data Streams with Skewed Distribution

Zhang, Dandan; Shen, Hong; Hui, Tian; Li, Yidong; Wu, Jun; Sang, Yingpeng

doi:10.1007/978-3-319-06605-9_34

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8444)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Classification is an important and practical tool that uses a model built on historical data to predict class labels for newly arriving data. In the last few years, there have been many interesting studies on classification in data streams. However, most such studies assume that the data streams are relatively balanced and stable. In practice, skewed data streams (e.g., few positives but many negatives) are both important and typical, and they appear in many real-world applications. Concept drift and skewed distributions, two common properties of data streams, make learning in streams particularly difficult, and traditional data mining algorithms no longer work. In this paper, we propose a method (Selectively Re-train Approach Based on Clustering) that deals with concept drift and skewed distributions simultaneously. We evaluate our algorithm on both synthetic and real data sets simulating skewed data streams. Empirical results show that the proposed method yields better performance than previous work.
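To make the skew problem concrete, the following minimal Python sketch shows why plain accuracy is misleading on a chunk with few positives and many negatives. It is an illustration only and is not the method proposed in the paper: the helper names (`make_skewed_chunk`, `accuracy_and_recall`), the chunk size of 1000, and the 2% positive rate are all assumptions chosen for the example.

```python
from typing import List, Tuple


def make_skewed_chunk(n: int, positive_every: int) -> List[int]:
    """Hypothetical helper: n labels, every `positive_every`-th one is positive (1)."""
    return [1 if i % positive_every == 0 else 0 for i in range(n)]


def accuracy_and_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (accuracy, recall on the positive class) for binary labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall


# One chunk of an assumed skewed stream: 20 positives, 980 negatives.
chunk = make_skewed_chunk(1000, 50)

# A trivial classifier that always predicts the majority (negative) class.
always_negative = [0] * len(chunk)

acc, rec = accuracy_and_recall(chunk, always_negative)
print(f"accuracy={acc:.3f} recall={rec:.3f}")
```

Under these assumptions the trivial classifier is right on every negative yet finds none of the positives, which is why skewed streams are usually evaluated with measures that focus on the minority class rather than overall accuracy.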

Author information

Authors and Affiliations

School of Computer and Information Tech., Beijing Jiaotong University, China
Dandan Zhang, Yidong Li, Jun Wu & Yingpeng Sang
School of Information Science and Technology, Sun Yat-sen University, China
Hong Shen
School of Computer Science, University of Adelaide, Australia
Hong Shen
School of Electronics and Info. Engineering, Beijing Jiaotong University, China
Tian Hui

Editor information

Editors and Affiliations

National Cheng Kung University, Tainan, Taiwan, R.O.C.
Vincent S. Tseng & Hung-Yu Kao
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Tu Bao Ho
Nanjing University, China
Zhi-Hua Zhou
National Chengchi University, Taipei, Taiwan, R.O.C.
Arbee L. P. Chen

About this paper

Cite this paper

Zhang, D., Shen, H., Hui, T., Li, Y., Wu, J., Sang, Y. (2014). A Selectively Re-train Approach Based on Clustering to Classify Concept-Drifting Data Streams with Skewed Distribution. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_34

DOI: https://doi.org/10.1007/978-3-319-06605-9_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06604-2
Online ISBN: 978-3-319-06605-9
eBook Packages: Computer Science; Computer Science (R0)
