Instant Message Clustering Based on Extended Vector Space Model

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4683)

Abstract

Instant communication technologies such as Instant Messaging (IM) are now widely used. For such large-scale mass communication media, clustering the text content is a practical way to analyze the characteristics of instant messages and to find or track hot social topics. However, a single instant message usually contains few keywords, and those it does contain may be latent; moreover, a single message cannot describe the conversational context. This differs sharply from general documents and makes common clustering algorithms unsuitable. A novel method called WR-KMeans is proposed. It merges related instant messages into a conversation and enriches the conversation's vector with words that do not occur in the conversation but are closely related to words that do. WR-KMeans then clusters the conversations in this extended vector space in the manner of k-means. Experiments on public datasets show that WR-KMeans outperforms the traditional k-means and bisecting k-means algorithms.
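The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the word-relatedness measure, the weighting scheme, or how messages are grouped into conversations, so the `RELATED` table, the `threshold` parameter, the max-based weighting rule, the pre-grouped `conversations` list, and all function names below are assumptions made only for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical relatedness scores in [0, 1]; the measure used in the paper
# is not given in the abstract.
RELATED = {
    ("goal", "match"): 0.8,
    ("striker", "referee"): 0.6,
    ("stock", "market"): 0.8,
    ("shares", "price"): 0.7,
}

def relatedness(a, b):
    """Symmetric lookup in the hypothetical relatedness table."""
    return RELATED.get((a, b)) or RELATED.get((b, a)) or 0.0

def conversation_vector(messages, vocab):
    """Merge the messages of one conversation and count term frequencies."""
    counts = Counter(w for msg in messages for w in msg.lower().split())
    return [float(counts[w]) for w in vocab]

def extend_vector(vec, vocab, threshold=0.5):
    """Weight each absent word by its most related present word (assumed rule)."""
    present = [(i, x) for i, x in enumerate(vec) if x > 0]
    out = list(vec)
    for j, word in enumerate(vocab):
        if vec[j] > 0:
            continue  # word already occurs in the conversation
        best = 0.0
        for i, x in present:
            r = relatedness(vocab[i], word)
            if r >= threshold:
                best = max(best, r * x)
        out[j] = best
    return out

def kmeans(vectors, k, iterations=10):
    """Plain Euclidean k-means, seeded with the first k vectors for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Each inner list is one conversation, assumed to be already grouped.
conversations = [
    ["goal", "striker goal"],
    ["stock", "shares"],
    ["match referee"],
    ["market price"],
]
vocab = sorted({w for conv in conversations for msg in conv for w in msg.split()})
plain = [conversation_vector(conv, vocab) for conv in conversations]
extended = [extend_vector(vec, vocab) for vec in plain]
labels = kmeans(extended, k=2)
```

In this toy data no two conversations share a word, so their plain term vectors are mutually orthogonal and carry no topical signal. After extension, the first and third conversations overlap on the sports-related terms and the second and fourth on the finance-related terms, which is the effect the abstract attributes to the extended vector space.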

This project is sponsored by the national 863 high technology development foundation (No. 2006AA01Z451, No. 2006AA10Z237).

Editor information

Lishan Kang, Yong Liu, Sanyou Zeng

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, L., Jia, Y., Han, W. (2007). Instant Message Clustering Based on Extended Vector Space Model. In: Kang, L., Liu, Y., Zeng, S. (eds) Advances in Computation and Intelligence. ISICA 2007. Lecture Notes in Computer Science, vol 4683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74581-5_48

  • DOI: https://doi.org/10.1007/978-3-540-74581-5_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74580-8

  • Online ISBN: 978-3-540-74581-5

  • eBook Packages: Computer Science, Computer Science (R0)
