Abstract
Many applications require the clustering of high-dimensional data streams. We propose a subspace clustering algorithm that finds clusters in different subspaces in a single pass over a data stream. The algorithm combines bottom-up and top-down grid-based methods. A uniformly partitioned grid data structure is used to summarize the data stream online. The top-down grid partition method is used to find the subspaces in which clusters are located. Errors introduced by the top-down partition procedure are eliminated by a merging step in our algorithm. Our performance study with real and synthetic datasets demonstrates the efficiency and effectiveness of the proposed algorithm.
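As an illustration of the online summarization step only, the following is a minimal sketch of how a uniformly partitioned grid might count stream points in one pass. It is not the authors' implementation: the function names (`cell_of`, `summarize`, `dense_cells`), the `bins` and `threshold` parameters, and the fixed per-dimension bounds are all assumptions made for this example, and the top-down partition and merging steps described in the abstract are not shown.

```python
from collections import Counter
from typing import Iterable, Sequence, Set, Tuple

Cell = Tuple[int, ...]


def cell_of(point: Sequence[float], lows: Sequence[float],
            highs: Sequence[float], bins: int) -> Cell:
    """Map a point to the index tuple of the uniform grid cell containing it.

    Assumes each dimension d is bounded by [lows[d], highs[d]] and is split
    into `bins` equal-width intervals (hypothetical parameters).
    """
    index = []
    for x, lo, hi in zip(point, lows, highs):
        width = (hi - lo) / bins
        i = int((x - lo) / width)
        # Clamp so that values on or beyond the bounds fall in an edge cell.
        index.append(min(max(i, 0), bins - 1))
    return tuple(index)


def summarize(stream: Iterable[Sequence[float]], lows: Sequence[float],
              highs: Sequence[float], bins: int) -> Counter:
    """Single pass over the stream: keep only a count per non-empty cell."""
    counts: Counter = Counter()
    for point in stream:
        counts[cell_of(point, lows, highs, bins)] += 1
    return counts


def dense_cells(counts: Counter, threshold: int) -> Set[Cell]:
    """Return the cells whose count reaches an assumed density threshold."""
    return {cell for cell, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    stream = [(0.10, 0.10), (0.15, 0.12), (0.90, 0.90), (1.00, 0.00)]
    counts = summarize(stream, lows=(0.0, 0.0), highs=(1.0, 1.0), bins=4)
    print(dense_cells(counts, threshold=2))
```

Storing counts only for non-empty cells keeps memory proportional to the number of occupied cells rather than to the full grid, which matters because the number of cells grows exponentially with the number of dimensions.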
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sun, Y., Lu, Y. (2006). A Grid-Based Subspace Clustering Algorithm for High-Dimensional Data Streams. In: Feng, L., Wang, G., Zeng, C., Huang, R. (eds) Web Information Systems – WISE 2006 Workshops. WISE 2006. Lecture Notes in Computer Science, vol 4256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11906070_4
DOI: https://doi.org/10.1007/11906070_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47663-4
Online ISBN: 978-3-540-47664-1
eBook Packages: Computer Science, Computer Science (R0)