Clustering Evolving Data Stream with Affinity Propagation Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8644)

Abstract

Clustering data streams is an active research area that has recently emerged to discover knowledge from large amounts of continuously generated data. Several data stream clustering algorithms have been proposed to perform unsupervised learning. Nevertheless, data stream clustering poses several challenges, such as dealing with dynamic data that arrive in an online fashion, performing fast and incremental processing of data objects, and suitably addressing time and memory limitations. In this paper, we propose a semi-supervised clustering algorithm that extends Affinity Propagation (AP) to handle evolving data streams. We incorporate a set of labeled data items with a set of exemplars to detect a change in the generative process underlying the data stream, which requires the stream model to be updated as soon as possible. Experimental comparisons with state-of-the-art data stream clustering methods demonstrate the effectiveness and efficiency of the proposed method.
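
For readers unfamiliar with the base method, the following is a minimal sketch of plain, batch Affinity Propagation (the message-passing clustering of Frey and Dueck that the abstract says is being extended). It is not the semi-supervised streaming algorithm proposed in the paper, whose details the abstract does not give. The function name `affinity_propagation`, the toy two-blob data, the preference value of -50, and the fixed iteration count are all assumptions chosen for illustration.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, n_iter=200):
    """Plain (batch) Affinity Propagation on a similarity matrix S.

    S[i, k] is the similarity of point i to candidate exemplar k;
    the diagonal S[k, k] holds the "preference" of k to be an exemplar.
    Returns (exemplar_indices, labels); labels index into exemplar_indices.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)

    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag  # self-availability is not clipped at 0
        A = damping * A + (1 - damping) * A_new

    # Points whose self-responsibility plus self-availability is positive
    # are taken as exemplars.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


# Toy data (assumed for illustration): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])

# Similarity = negative squared Euclidean distance; diagonal = preference.
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
np.fill_diagonal(S, -50.0)

exemplars, labels = affinity_propagation(S)
print("exemplar indices:", exemplars)
print("cluster labels:", labels)
```

The preference on the diagonal controls how many exemplars emerge: lower values yield fewer clusters. A streaming variant would additionally need some way to summarize past data and to react when the underlying distribution changes, which is the part the paper addresses and this sketch does not.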

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Atwa, W., Li, K. (2014). Clustering Evolving Data Stream with Affinity Propagation Algorithm. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8644. Springer, Cham. https://doi.org/10.1007/978-3-319-10073-9_38

  • DOI: https://doi.org/10.1007/978-3-319-10073-9_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10072-2

  • Online ISBN: 978-3-319-10073-9

  • eBook Packages: Computer Science, Computer Science (R0)
