Abstract
With the rapid development of mobile communication technology, short messages play an increasingly important role in daily life. Most existing clustering algorithms are difficult to apply to massive short-message collections because of the scale of the data and the similarity among messages. This paper presents an efficient clustering algorithm that uses a special method to build feature strings and a reasonable selection of the number of clusters. Experiments show that a clustering system based on this algorithm can process millions of short messages per hour with high precision and recall.
Acknowledgement
This work was supported by the Open Project of the Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory (ITD-U14002/KX142600009).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Huang, Y., Zhang, W., Zhang, H., Xu, S. (2016). A Fast Clustering Algorithm for Massive Short Message. In: Zu, Q., Hu, B. (eds) Human Centered Computing. HCC 2016. Lecture Notes in Computer Science, vol 9567. Springer, Cham. https://doi.org/10.1007/978-3-319-31854-7_17
DOI: https://doi.org/10.1007/978-3-319-31854-7_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31853-0
Online ISBN: 978-3-319-31854-7
eBook Packages: Computer Science, Computer Science (R0)