A Statistical μ-Partitioning Method for Clustering Data Streams

  • Nam Hun Park
  • Won Suk Lee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2869)


A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate. For this reason, most algorithms for data streams sacrifice some correctness of their results in exchange for fast processing. This paper proposes a method for clustering a data stream based on statistical μ-partitioning. The multi-dimensional space of a data domain is divided into a set of mutually exclusive, equal-size initial cells. Each cell maintains the distribution statistics of the data elements in its range. Based on these statistics, a dense cell is dynamically split into two mutually exclusive smaller cells, called intermediate cells. The dense sub-range of an initial cell is partitioned recursively in this way until it reaches the smallest cell size, called a unit cell. A cluster of a data stream is a group of adjacent dense unit cells. The smaller the unit cell, the more accurately the resulting clusters are identified. The performance of the proposed algorithm is analyzed comparatively through a series of experiments.
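
The partitioning described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: the statistics kept per cell (count, sum, and sum of squares), the choice of the highest-variance dimension as the split dimension, the split point at the cell mean, the density threshold `split_at`, and the fact that the two new cells start with empty statistics. The names `Cell`, `split`, and `process` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    lo: tuple                                # inclusive lower bound per dimension
    hi: tuple                                # exclusive upper bound per dimension
    n: int = 0                               # elements seen in this range
    s: list = field(default_factory=list)    # per-dimension sum
    ss: list = field(default_factory=list)   # per-dimension sum of squares

    def __post_init__(self):
        d = len(self.lo)
        self.s = [0.0] * d
        self.ss = [0.0] * d

    def contains(self, x):
        return all(l <= v < h for l, v, h in zip(self.lo, x, self.hi))

    def add(self, x):
        # Update the distribution statistics in one pass; x itself is not stored.
        self.n += 1
        for k, v in enumerate(x):
            self.s[k] += v
            self.ss[k] += v * v

    def mean(self, k):
        return self.s[k] / self.n

    def var(self, k):
        return max(self.ss[k] / self.n - self.mean(k) ** 2, 0.0)


def split(cell, unit):
    """Split a cell in two at its mean, or return None for a unit cell."""
    # A dimension can be split only if both halves stay at least `unit` wide.
    dims = [k for k in range(len(cell.lo))
            if cell.hi[k] - cell.lo[k] >= 2 * unit]
    if not dims:
        return None
    k = max(dims, key=cell.var)  # assumed rule: highest-variance dimension
    # Assumed rule: cut at the mean, clamped so neither child is below unit size.
    cut = min(max(cell.mean(k), cell.lo[k] + unit), cell.hi[k] - unit)
    left_hi = cell.hi[:k] + (cut,) + cell.hi[k + 1:]
    right_lo = cell.lo[:k] + (cut,) + cell.lo[k + 1:]
    # Assumption: the children start with empty statistics.
    return Cell(cell.lo, left_hi), Cell(right_lo, cell.hi)


def process(stream, cells, split_at, unit):
    """Route each element to its cell and split cells that become dense."""
    for x in stream:
        for i, c in enumerate(cells):
            if c.contains(x):
                c.add(x)
                if c.n >= split_at:
                    parts = split(c, unit)
                    if parts:
                        cells[i:i + 1] = parts  # replace parent by two children
                break
    return cells


# One-dimensional usage: a single initial cell over [0, 8).
cells = process([(2.0,), (2.5,), (3.0,), (2.5,)],
                [Cell((0.0,), (8.0,))], split_at=4, unit=1.0)
```

In this sketch a cell that can no longer be split plays the role of a unit cell, and a unit cell whose count reaches `split_at` would be treated as dense. Grouping adjacent dense unit cells into clusters is left out.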





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Nam Hun Park (1)
  • Won Suk Lee (1)
  1. Department of Computer Science, Yonsei University, Seoul, Korea
