A Fast Clustering Algorithm for Massive Short Message

Huang, Ya; Zhang, Wenzhi; Zhang, Haiyang; Xu, Saihong

doi:10.1007/978-3-319-31854-7_17


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9567)

Included in the following conference series:

International Conference on Human Centered Computing

Abstract

With the rapid development of mobile communication technology, short messages play an increasingly important role in daily life. Most existing clustering algorithms are difficult to apply to massive short-message collections because of the huge scale of the data and the similarity among messages. This paper presents an efficient clustering algorithm that relies on a special method for building feature strings and a reasonable selection of the number of clusters. Experiments show that a clustering system based on this algorithm can process millions of short messages per hour with high precision and recall.

Acknowledgement

This work was supported by the open project of the Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory (ITD-U14002/KX142600009).

Author information

Authors and Affiliations

School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, 100876, People’s Republic of China
Ya Huang, Haiyang Zhang & Saihong Xu
Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory, Beijing, China
Ya Huang & Wenzhi Zhang

Corresponding author

Correspondence to Ya Huang.

Editor information

Editors and Affiliations

Wuhan, Hubei, China
Qiaohong Zu
Fujitsu Laboratories of Europe Ltd., Middlesex, United Kingdom
Bo Hu

About this paper

Cite this paper

Huang, Y., Zhang, W., Zhang, H., Xu, S. (2016). A Fast Clustering Algorithm for Massive Short Message. In: Zu, Q., Hu, B. (eds) Human Centered Computing. HCC 2016. Lecture Notes in Computer Science, vol 9567. Springer, Cham. https://doi.org/10.1007/978-3-319-31854-7_17

DOI: https://doi.org/10.1007/978-3-319-31854-7_17
Published: 01 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31853-0
Online ISBN: 978-3-319-31854-7
eBook Packages: Computer Science, Computer Science (R0)
