Abstract
Compared with traditional data mining, data streams have three distinct characteristics that pose new challenges to machine learning and data mining. These challenges become more serious when only a few instances in the stream are labeled. In this paper, building on the SPASC algorithm, a local component replacement strategy for updating the classifier pool is proposed. The strategy defines a vector based on local accuracy to evaluate how well each "component" of a cluster-based classifier adapts to a new chunk. This lets the trained cluster-based classifiers in the pool adapt to the current concept better and faster while retaining as much learned knowledge as possible. The proposed algorithm is compared with state-of-the-art baseline methods on multiple datasets, and the experimental results illustrate its effectiveness.
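The abstract does not spell out how the local-accuracy vector is computed or how a weak component is rebuilt, so the following is only a minimal sketch of the general idea, not the authors' SPASC-based algorithm. It assumes that a "component" is a cluster centroid with a single class label, that each labeled instance of the new chunk is assigned to its nearest centroid, and that a component whose local accuracy falls below a threshold is replaced while all other components are kept. The names `Component`, `local_accuracy_vector`, `replace_local_components`, the `threshold` value of 0.5, and the rebuild rule (mean of assigned points, majority label) are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Component:
    """Assumed model of one component of a cluster-based classifier: a centroid plus a class label."""
    centroid: tuple
    label: int


def nearest(components, x):
    """Index of the component whose centroid is closest to x (Euclidean distance)."""
    return min(range(len(components)), key=lambda i: math.dist(components[i].centroid, x))


def local_accuracy_vector(components, X, y):
    """Accuracy of each component on the labeled chunk instances assigned to it.
    A component that receives no labeled instance gets None (no evidence either way)."""
    hits = [0] * len(components)
    seen = [0] * len(components)
    for x, label in zip(X, y):
        i = nearest(components, x)
        seen[i] += 1
        hits[i] += int(components[i].label == label)
    return [h / s if s else None for h, s in zip(hits, seen)]


def replace_local_components(components, X, y, threshold=0.5):
    """Rebuild only components whose local accuracy is below `threshold` (hypothetical value);
    every other component, and the knowledge it encodes, is kept unchanged."""
    acc = local_accuracy_vector(components, X, y)
    updated = []
    for i, (comp, a) in enumerate(zip(components, acc)):
        if a is None or a >= threshold:
            updated.append(comp)
            continue
        # Hypothetical rebuild rule: move the centroid to the mean of the labeled points
        # assigned to this component and relabel it with their majority class.
        members = [(x, lab) for x, lab in zip(X, y) if nearest(components, x) == i]
        centroid = tuple(sum(c) / len(members) for c in zip(*(x for x, _ in members)))
        majority = Counter(lab for _, lab in members).most_common(1)[0][0]
        updated.append(Component(centroid, majority))
    return updated


if __name__ == "__main__":
    pool_member = [Component((0, 0), 0), Component((10, 10), 1), Component((50, 50), 1)]
    X_chunk = [(0, 1), (1, 0), (10, 11), (11, 10)]
    y_chunk = [0, 0, 0, 0]  # the region around (10, 10) has drifted to class 0
    print(local_accuracy_vector(pool_member, X_chunk, y_chunk))  # [1.0, 0.0, None]
    print(replace_local_components(pool_member, X_chunk, y_chunk))
```

In this toy chunk only the second component is replaced (its centroid moves to (10.5, 10.5) and its label becomes 0). The first component is confirmed by the new labels, and the third receives no labeled evidence, so both are retained. This illustrates the trade-off the abstract describes between adapting to the current concept and preserving learned knowledge.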
Acknowledgment
This work was partially supported by the National Natural Science Foundation of China (61363029, 61662014, 61763007), the Guangxi Key Laboratory of Trusted Software (KX201721), the Collaborative Innovation Center of Cloud Computing and Big Data (YD16E12), and the Image Intelligent Processing Project of the Key Laboratory Fund (GIIP201505).
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Qin, K., Wen, Y. (2018). Semi-supervised Classification of Concept Drift Data Stream Based on Local Component Replacement. In: Zhou, ZH., Yang, Q., Gao, Y., Zheng, Y. (eds) Artificial Intelligence. ICAI 2018. Communications in Computer and Information Science, vol 888. Springer, Singapore. https://doi.org/10.1007/978-981-13-2122-1_8
DOI: https://doi.org/10.1007/978-981-13-2122-1_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2121-4
Online ISBN: 978-981-13-2122-1
eBook Packages: Computer Science, Computer Science (R0)