A Fast Clustering Algorithm for Massive Short Message

  • Conference paper
Human Centered Computing (HCC 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9567)

Abstract

With the rapid development of mobile communication technology, short messages play an increasingly important role in daily life. Most existing clustering algorithms are difficult to apply to massive short-message collections, owing to the huge scale of the data and of the similarity computation. This paper presents an efficient clustering algorithm that uses a special method to build feature strings and a reasonable selection of the number of clusters. Experiments show that a clustering system based on this algorithm can process millions of short messages per hour with high precision and recall.

Acknowledgement

This work was supported by the Open Project of the Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory (ITD-U14002/KX142600009).

Author information

Corresponding author

Correspondence to Ya Huang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Huang, Y., Zhang, W., Zhang, H., Xu, S. (2016). A Fast Clustering Algorithm for Massive Short Message. In: Zu, Q., Hu, B. (eds) Human Centered Computing. HCC 2016. Lecture Notes in Computer Science, vol 9567. Springer, Cham. https://doi.org/10.1007/978-3-319-31854-7_17

  • DOI: https://doi.org/10.1007/978-3-319-31854-7_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31853-0

  • Online ISBN: 978-3-319-31854-7

  • eBook Packages: Computer Science, Computer Science (R0)
