Continuously Extracting High-Quality Representative Set from Massive Data Streams

  • Xiaokang Ji
  • Xiuli Ma
  • Ting Huang
  • Shiwei Tang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8346)


In many large-scale real-time monitoring applications, hundreds or thousands of streams must be monitored continuously. To ease the monitoring, a small set of representatives can be extracted to represent all the streams. A high-quality representative set must guarantee not only representativeness but also stability. In this paper, we propose a method to continuously extract a high-quality representative set from massive streams. First, we cluster streams based on the core clustering model. The tightness of a core set, meaning that any two streams in it are highly correlated, ensures high representativeness of the representative set. Second, we use topological relationships to force each cluster to be connected in the network from which the streams are generated. Because streams in one cluster are then driven by similar underlying mechanisms, the representative set becomes much more stable. By exploiting the tightness of core sets, we can obtain the representative set immediately. Moreover, with local optimization strategies, our method adjusts core clusters very efficiently, which enables real-time response. Experiments on real applications show that our method is efficient and produces a high-quality representative set.
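The two constraints described in the abstract (pairwise high correlation inside a core set, and connectivity of each cluster in the underlying network) can be illustrated with a minimal sketch. This is not the authors' algorithm: the greedy growth strategy, the use of Pearson correlation as the similarity measure, the rule for choosing a representative, and the names `pearson`, `extract_representatives`, `neighbors`, and `threshold` are all assumptions made for illustration, and the online adjustment of clusters is not covered.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def extract_representatives(streams, neighbors, threshold=0.9):
    """Greedy illustration (assumed strategy, not the paper's method).

    streams:   dict mapping stream id -> list of recent values
    neighbors: dict mapping stream id -> set of adjacent ids in the network
    Returns a dict mapping each representative id -> members of its cluster.
    """
    unassigned = set(streams)
    clusters = {}
    for seed in sorted(streams):
        if seed not in unassigned:
            continue
        core = [seed]
        unassigned.discard(seed)
        grew = True
        while grew:
            grew = False
            # Only streams adjacent to a current member are candidates,
            # so the cluster stays connected in the network.
            candidates = sorted(
                n for n in {n for m in core for n in neighbors.get(m, ())}
                if n in unassigned
            )
            for c in candidates:
                # Tightness: the candidate must correlate with every member.
                if all(pearson(streams[c], streams[m]) >= threshold for m in core):
                    core.append(c)
                    unassigned.discard(c)
                    grew = True
        # Assumed rule: the member most correlated with the rest represents the cluster.
        rep = max(core, key=lambda m: sum(pearson(streams[m], streams[o]) for o in core))
        clusters[rep] = core
    return clusters


# Hypothetical toy data: "d" resembles "a" but is not adjacent to it in the network.
demo_streams = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [5, 4, 3, 2, 1],
    "d": [1, 2, 3, 4, 5.1],
}
demo_neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
demo_clusters = extract_representatives(demo_streams, demo_neighbors, threshold=0.9)
```

In this toy input, the stream `d` is strongly correlated with `a` but can only be reached through the anti-correlated stream `c`, so the connectivity constraint keeps it out of the cluster containing `a`.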


Keywords: Data streams; Representative set; Online clustering





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Xiaokang Ji (1, 2)
  • Xiuli Ma (1, 2)
  • Ting Huang (1, 2)
  • Shiwei Tang (1, 2)
  1. Key Laboratory of Machine Perception, Peking University, Ministry of Education, China
  2. School of Electronics Engineering and Computer Science, Peking University, Beijing, China
