Continuously Identifying Representatives Out of Massive Streams

Li, Qiong; Ma, Xiuli; Tang, Shiwei; Xie, Shuiyuan

doi:10.1007/978-3-642-25853-4_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7120)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

A growing number of applications monitor multiple data streams concurrently, with data flowing continuously from many sources at once. In such large-scale real-time monitoring applications, continuously identifying representatives out of massive streams is an important task: it captures key trends to support online monitoring and analysis. In this paper, we present a framework for continuously extracting representatives out of massive streams. Our framework identifies and traces representatives based on the core clustering technique. We adapt the core clustering model to the streaming setting and propose a method of extracting representatives that exploits the tightness of the core set of each core cluster. To identify the representatives continuously and efficiently, we apply the online representative adjustment process only when significant clustering evolution happens. Our experimental studies show that the algorithm is effective and efficient.
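
The abstract does not spell out the core clustering model, so the following Python sketch is only a rough illustration of the general idea, not the authors' algorithm. It assumes a sliding window per stream, Pearson correlation as the similarity measure, a greedy threshold grouping in place of core clustering, and a medoid-like choice of representative. All names (`RepresentativeTracker`, `window`, `tau`) are hypothetical. The one point it tries to mirror from the abstract is that re-clustering runs only when the clustering has visibly evolved, here taken to mean that a member drifts away from its representative or two representatives become correlated.

```python
from collections import deque
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def cluster(windows, tau):
    """Greedy grouping (an assumption, not core clustering): each unassigned
    stream seeds a cluster of the streams correlated >= tau with it."""
    unassigned = sorted(windows)
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [s for s in unassigned
                            if pearson(windows[seed], windows[s]) >= tau]
        unassigned = [s for s in unassigned if s not in members]
        clusters.append(members)
    return clusters


def representative(members, windows):
    """Pick the member with the highest total correlation to the rest of its cluster."""
    return max(members, key=lambda m: sum(
        pearson(windows[m], windows[o]) for o in members if o != m))


class RepresentativeTracker:
    """Hypothetical tracker; expects one reading per stream on every update."""

    def __init__(self, stream_ids, window=8, tau=0.9):
        self.windows = {s: deque(maxlen=window) for s in stream_ids}
        self.tau = tau
        self.assignment = {}      # stream id -> representative id
        self.reclusterings = 0    # how often the costly step actually ran

    def _recluster(self):
        self.assignment = {}
        for members in cluster(self.windows, self.tau):
            rep = representative(members, self.windows)
            for m in members:
                self.assignment[m] = rep
        self.reclusterings += 1

    def _evolved(self):
        """True if a member left its representative or two representatives merged."""
        w, tau = self.windows, self.tau
        reps = set(self.assignment.values())
        split = any(pearson(w[s], w[r]) < tau
                    for s, r in self.assignment.items() if s != r)
        merge = any(pearson(w[p], w[q]) >= tau
                    for p in reps for q in reps if p < q)
        return split or merge

    def update(self, readings):
        """Consume one reading per stream and return the current representatives."""
        for s, v in readings.items():
            self.windows[s].append(v)
        if len(next(iter(self.windows.values()))) < 3:
            return set()          # too few points for a meaningful correlation
        if not self.assignment or self._evolved():
            self._recluster()
        return set(self.assignment.values())
```

In this sketch the per-tick check costs one correlation per non-representative stream plus one per pair of representatives, while the full grouping is deferred until that check fails. A real system would presumably use incremental statistics rather than recomputing correlations from the raw window.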

This work is supported by the National Natural Science Foundation of China under Grant No. 61103025.


Author information

Authors and Affiliations

School of Electronics Engineering and Computer Science, Peking University, Beijing, China, 100871
Qiong Li, Xiuli Ma, Shiwei Tang & Shuiyuan Xie
Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China, 100871
Qiong Li, Shiwei Tang & Shuiyuan Xie

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Jie Tang & Jianyong Wang
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, SAR, China
Irwin King
Faculty of Engineering and Information Technology, University of Technology, 2007, Sydney, NSW, Australia
Ling Chen

Cite this paper

Li, Q., Ma, X., Tang, S., Xie, S. (2011). Continuously Identifying Representatives Out of Massive Streams. In: Tang, J., King, I., Chen, L., Wang, J. (eds) Advanced Data Mining and Applications. ADMA 2011. Lecture Notes in Computer Science, vol 7120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25853-4_18

DOI: https://doi.org/10.1007/978-3-642-25853-4_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25852-7
Online ISBN: 978-3-642-25853-4
eBook Packages: Computer Science, Computer Science (R0)
