A Grid-Based Subspace Clustering Algorithm for High-Dimensional Data Streams

Sun, Yufen; Lu, Yansheng

doi:10.1007/11906070_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4256)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Many applications require the clustering of high-dimensional data streams. We propose a subspace clustering algorithm that can find clusters in different subspaces in a single pass over a data stream. The algorithm combines the bottom-up and top-down grid-based methods. A uniformly partitioned grid data structure is used to summarize the data stream online. The top-down grid partition method is used to find the subspaces in which clusters are located. The errors made by the top-down partition procedure are eliminated by a merging step in our algorithm. Our performance study with real datasets and a synthetic dataset demonstrates the efficiency and effectiveness of the proposed algorithm.
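
To make the online summarization step concrete, the following is a minimal sketch of a uniformly partitioned grid that counts stream points per cell in one pass. It is an illustration only, not the authors' implementation: the class name UniformGridSummary, the parameters lows, highs, and bins_per_dim, and the fixed density threshold min_count are all assumptions introduced here, and the top-down partition and merging steps of the paper are not reproduced.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, ...]


class UniformGridSummary:
    """Hypothetical one-pass summary: point counts per uniform grid cell."""

    def __init__(self, lows: Sequence[float], highs: Sequence[float],
                 bins_per_dim: int) -> None:
        if len(lows) != len(highs):
            raise ValueError("lows and highs must have the same length")
        if bins_per_dim < 1:
            raise ValueError("bins_per_dim must be at least 1")
        if any(h <= l for l, h in zip(lows, highs)):
            raise ValueError("each high bound must exceed its low bound")
        self.lows = list(lows)
        self.bins = bins_per_dim
        # Every dimension is split into the same number of equal-width bins.
        self.widths = [(h - l) / bins_per_dim for l, h in zip(lows, highs)]
        self.counts: Counter = Counter()
        self.n = 0

    def cell_of(self, point: Sequence[float]) -> Cell:
        """Map a point to its cell index, clamping values outside the bounds."""
        idx = []
        for x, lo, w in zip(point, self.lows, self.widths):
            i = math.floor((x - lo) / w)
            idx.append(min(max(i, 0), self.bins - 1))
        return tuple(idx)

    def update(self, point: Sequence[float]) -> None:
        """Process one stream element; the raw point is not stored."""
        self.counts[self.cell_of(point)] += 1
        self.n += 1

    def dense_cells(self, min_count: int) -> Dict[Cell, int]:
        """Cells whose count reaches an assumed fixed threshold."""
        return {c: k for c, k in self.counts.items() if k >= min_count}

    def project(self, dims: Sequence[int]) -> Counter:
        """Aggregate the cell counts onto a subset of dimensions (a subspace)."""
        projected: Counter = Counter()
        for cell, k in self.counts.items():
            projected[tuple(cell[d] for d in dims)] += k
        return projected
```

Only non-empty cells are stored, so memory grows with the number of occupied cells rather than with the full grid size, which matters in high dimensions. The project method hints at how a subspace search could reuse the same summary without revisiting the stream.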

Author information

Authors and Affiliations

College of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan, 430074, China
Yufen Sun & Yansheng Lu

Editor information

Editors and Affiliations

Dept. of Computer Science & Technology, Tsinghua University, Beijing, China
Ling Feng
Northeastern University, 110004, Shenyang, Liaoning, China
Guoren Wang
State Key Lab of Software Engineering, Wuhan University, 430072, Wuhan, P.R. China
Cheng Zeng
School of Information Management, Wuhan University, 430072, Wuhan, China
Ruhua Huang

About this paper

Cite this paper

Sun, Y., Lu, Y. (2006). A Grid-Based Subspace Clustering Algorithm for High-Dimensional Data Streams. In: Feng, L., Wang, G., Zeng, C., Huang, R. (eds) Web Information Systems – WISE 2006 Workshops. WISE 2006. Lecture Notes in Computer Science, vol 4256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11906070_4

DOI: https://doi.org/10.1007/11906070_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47663-4
Online ISBN: 978-3-540-47664-1
eBook Packages: Computer Science, Computer Science (R0)
