Semi-supervised Classification of Concept Drift Data Stream Based on Local Component Replacement

Qin, Keke; Wen, Yimin

doi:10.1007/978-981-13-2122-1_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 888)

Included in the following conference series:

International CCF Conference on Artificial Intelligence

Abstract

Compared with the data handled in traditional data mining, data streams have three distinct characteristics that pose new challenges to machine learning and data mining. These challenges become more serious when only a few instances in the data stream are labeled. In this paper, building on the SPASC algorithm, a local component replacement strategy for updating the classifier pool is proposed. The strategy defines a vector based on local accuracy to evaluate how well each “component” of a cluster-based classifier adapts to a new chunk, so that the trained cluster-based classifiers in the pool adapt to the current concept better and faster while retaining as much learned knowledge as possible. The proposed algorithm is compared with state-of-the-art baseline methods on multiple datasets, and the experimental results illustrate its effectiveness.
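
The idea of a local-accuracy vector can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact definition, so the sketch assumes that each component is a cluster centroid with a stored class label, that every labeled instance of the new chunk is routed to its nearest centroid, and that a component is flagged for replacement when its accuracy falls below a fixed threshold. The names `local_accuracy_vector`, `components_to_replace`, and `threshold` are hypothetical.

```python
import numpy as np

def local_accuracy_vector(centroids, component_labels, X_labeled, y_labeled):
    """Per-component accuracy on the labeled instances of a new chunk.

    Assumption (not from the paper): an instance belongs to its nearest
    centroid, and a component "predicts" its own stored class label.
    """
    # Squared Euclidean distances, shape (n_instances, n_components).
    d2 = ((X_labeled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    # NaN marks components that received no labeled instance.
    acc = np.full(len(centroids), np.nan)
    for k in range(len(centroids)):
        mask = nearest == k
        if mask.any():
            acc[k] = np.mean(y_labeled[mask] == component_labels[k])
    return acc

def components_to_replace(acc, threshold=0.5):
    """Indices of components whose local accuracy is below the threshold.

    Hypothetical rule: components with no evidence (NaN) are kept.
    """
    filled = np.where(np.isnan(acc), 1.0, acc)
    return np.flatnonzero(filled < threshold)

# Toy chunk: component 0 still fits the data, component 1 has drifted.
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
component_labels = np.array([0, 1])
X_labeled = np.array([[0.1, 0.0], [0.2, -0.1], [5.0, 5.1], [4.9, 5.0]])
y_labeled = np.array([0, 0, 0, 0])

acc = local_accuracy_vector(centroids, component_labels, X_labeled, y_labeled)
stale = components_to_replace(acc)
```

In this toy chunk only the second component is flagged, so the first component, and the knowledge it encodes, would be kept. This mirrors the stated goal of adapting to the current concept while retaining as much learned knowledge as possible.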

Acknowledgment

This work was partially supported by the National Natural Science Foundation of China (61363029, 61662014, 61763007), the Guangxi Key Laboratory of Trusted Software (KX201721), the Collaborative Innovation Center of Cloud Computing and Big Data (YD16E12), and the Image Intelligent Processing Project of Key Laboratory Fund (GIIP201505).

Author information

Authors and Affiliations

Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, 541004, China
Yimin Wen
School of Computer Science and Information Safety, Guilin University of Electronic Technology, Guilin, 541004, China
Keke Qin & Yimin Wen

Corresponding author

Correspondence to Yimin Wen.

Editor information

Editors and Affiliations

Nanjing University, Nanjing, China
Zhi-Hua Zhou
Hong Kong University of Science and Technology, Hong Kong SAR, China
Qiang Yang
Nanjing University, Nanjing, China
Yang Gao
JD Finance, Beijing, China
Yu Zheng

About this paper

Cite this paper

Qin, K., Wen, Y. (2018). Semi-supervised Classification of Concept Drift Data Stream Based on Local Component Replacement. In: Zhou, ZH., Yang, Q., Gao, Y., Zheng, Y. (eds) Artificial Intelligence. ICAI 2018. Communications in Computer and Information Science, vol 888. Springer, Singapore. https://doi.org/10.1007/978-981-13-2122-1_8

DOI: https://doi.org/10.1007/978-981-13-2122-1_8
Published: 02 August 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2121-4
Online ISBN: 978-981-13-2122-1
eBook Packages: Computer Science, Computer Science (R0)
