Abstract
Clustering data streams is an active research area that has recently emerged to discover knowledge from large amounts of continuously generated data. Several data stream clustering algorithms have been proposed to perform unsupervised learning. Nevertheless, data stream clustering poses several challenges: handling dynamic data that arrive in an online fashion, processing data objects quickly and incrementally, and working within time and memory limits. In this paper, we propose a semi-supervised clustering algorithm that extends Affinity Propagation (AP) to handle evolving data streams. We combine a set of labeled data items with a set of exemplars to detect a change in the generative process underlying the data stream, which requires the stream model to be updated as soon as possible. Experimental comparisons with state-of-the-art data stream clustering methods demonstrate the effectiveness and efficiency of the proposed method.
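For orientation, the following is a minimal sketch of standard batch Affinity Propagation, the message-passing procedure of Frey and Dueck that the proposed method builds on. It is not the semi-supervised streaming extension described above, whose details are not given in this abstract. The function name, parameter defaults, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, n_iter=200):
    """Plain batch AP (illustrative sketch, not the paper's streaming variant).

    S is an n x n similarity matrix whose diagonal holds the preferences.
    Returns (exemplar indices, label array mapping each point to an exemplar).
    """
    n = S.shape[0]
    idx = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        first_idx = np.argmax(AS, axis=1)
        first_max = AS[idx, first_idx]
        AS[idx, first_idx] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[idx, first_idx] = S[idx, first_idx] - second_max
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0)
        A_new[idx, idx] = diag
        A = damping * A + (1 - damping) * A_new

    # Points with positive self-evidence become exemplars.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels

# Toy data (assumed for illustration): two well-separated groups of points.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.1], [0.4, 0.3],
              [10.0, 10.0], [11.2, 10.0], [10.0, 10.9], [10.5, 10.4]])
# Similarity = negative squared Euclidean distance.
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
# Preference (diagonal) = median of the off-diagonal similarities.
off_diag = S[~np.eye(len(X), dtype=bool)]
S[np.arange(len(X)), np.arange(len(X))] = np.median(off_diag)
exemplars, labels = affinity_propagation(S)
```

In a streaming setting, a batch routine like this would typically be re-run or updated on a compact summary of the stream (for example, the current exemplars plus newly arrived items) rather than on all data seen so far; how the proposed algorithm does this is described in the full paper, not here.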
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Atwa, W., Li, K. (2014). Clustering Evolving Data Stream with Affinity Propagation Algorithm. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8644. Springer, Cham. https://doi.org/10.1007/978-3-319-10073-9_38
DOI: https://doi.org/10.1007/978-3-319-10073-9_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10072-2
Online ISBN: 978-3-319-10073-9
eBook Packages: Computer Science, Computer Science (R0)