Abstract
Clustering is a challenging problem in data stream mining. A clustering algorithm must read a data stream in a single pass with limited time and memory, while the stream itself may change over time. Different clustering algorithms have been developed for data streams. Density-based algorithms are a notable group of clustering algorithms: they can find clusters of arbitrary shape and also handle outliers. In recent years, density-based clustering algorithms have been adapted to data streams. However, when clustering a data stream, it is impossible to store the entire stream. Micro-clustering is a summarization method used to record synopsis information about a data stream, and various algorithms apply it to cluster data streams. In this paper, we concentrate on density-based clustering algorithms that use micro-clustering, and we refer to them as density-micro clustering algorithms. We review these algorithms in detail and compare them across several characteristics.
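The abstract describes micro-clustering only as a way to keep synopsis information about a stream in a single pass. As an illustration, the following is a minimal sketch that assumes a BIRCH-style cluster feature: a weight, a per-dimension linear sum (LS), and a per-dimension sum of squares (SS). It also includes an exponential fading factor 2^(-λ·Δt) of the kind used by damped-window stream algorithms such as DenStream. The class and method names are hypothetical, and this is not the exact data structure of any particular algorithm reviewed in the chapter.

```python
import math
from typing import List


class MicroCluster:
    """Hypothetical micro-cluster summary in the style of a BIRCH cluster feature.

    Only (weight, LS, SS) are kept; raw points are discarded after insertion.
    """

    def __init__(self, dim: int) -> None:
        self.weight = 0.0        # (possibly faded) number of points absorbed
        self.ls = [0.0] * dim    # per-dimension linear sum of the points
        self.ss = [0.0] * dim    # per-dimension sum of squared coordinates

    def insert(self, x: List[float]) -> None:
        """Absorb one point in O(d) time and memory, in a single pass."""
        self.weight += 1.0
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def fade(self, lam: float, dt: float) -> None:
        """Scale every statistic by 2**(-lam * dt) so older points count less (assumed decay model)."""
        f = 2.0 ** (-lam * dt)
        self.weight *= f
        self.ls = [v * f for v in self.ls]
        self.ss = [v * f for v in self.ss]

    def merge(self, other: "MicroCluster") -> None:
        """The statistics are additive, so two summaries merge by element-wise sums."""
        self.weight += other.weight
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def center(self) -> List[float]:
        """Weighted mean of the absorbed points, recovered from LS alone."""
        return [v / self.weight for v in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the absorbed points from the center."""
        var = sum(sq / self.weight - (lin / self.weight) ** 2
                  for lin, sq in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors


if __name__ == "__main__":
    mc = MicroCluster(dim=2)
    for p in ([1.0, 1.0], [3.0, 1.0], [2.0, 4.0]):
        mc.insert(p)
    print(mc.center())            # [2.0, 2.0]
    print(round(mc.radius(), 4))  # 1.633
```

The design point this illustrates is that additivity lets a summary absorb points or merge with another summary without revisiting the raw data. Fading all three statistics by the same factor leaves the center and radius unchanged while reducing the weight, so stale micro-clusters can later be recognized by their low weight.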
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Amini, A., Wah, T.Y. (2012). A Comparative Study of Density-based Clustering Algorithms on Data Streams: Micro-clustering Approaches. In: Ao, S., Castillo, O., Huang, X. (eds) Intelligent Control and Innovative Computing. Lecture Notes in Electrical Engineering, vol 110. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1695-1_21
DOI: https://doi.org/10.1007/978-1-4614-1695-1_21
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1694-4
Online ISBN: 978-1-4614-1695-1
eBook Packages: Engineering, Engineering (R0)