Improving Short Text Clustering Performance with Keyword Expansion

The Sixth International Symposium on Neural Networks (ISNN 2009)

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 56)

Abstract

Most traditional text clustering methods are based on the bag-of-words representation, which ignores important information about the semantic relationships between key terms. To overcome this problem, researchers have recently proposed several methods that improve short text clustering accuracy by enriching the short text representation. However, these methods, which expand the words that appear in short texts, are usually computationally expensive. In this paper, we improve on previous work by enriching the short text representation with keyword expansion. Empirical results show that the proposed method greatly reduces running time without sacrificing clustering accuracy.
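The idea of expanding only keywords, rather than all the words in a short text, can be illustrated with a minimal sketch. This is not the authors' implementation: the tf-idf keyword ranking, the `top_k` cutoff, the function names, and the toy `RELATED_TERMS` lexicon (standing in for an external semantic resource) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for an external semantic resource.
# The entries are invented for illustration only.
RELATED_TERMS = {
    "stock": ["share", "equity"],
    "doctor": ["physician"],
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest-scoring tf-idf tokens of each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each token occurs.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    keywords = []
    for doc in docs:
        term_freq = Counter(doc)
        scores = {
            tok: (count / len(doc)) * math.log(n_docs / doc_freq[tok])
            for tok, count in term_freq.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
        keywords.append(ranked[:top_k])
    return keywords


def expand_with_keywords(docs, top_k=2, lexicon=RELATED_TERMS):
    """Append related terms for each document's keywords only.

    Looking up top_k keywords per document, instead of every word,
    bounds the number of lexicon lookups per document.
    """
    expanded = []
    for doc, keys in zip(docs, tfidf_keywords(docs, top_k)):
        extra = [rel for key in keys for rel in lexicon.get(key, [])]
        expanded.append(doc + extra)
    return expanded


docs = [
    tokenize("stock prices fall"),
    tokenize("doctor visits rise"),
    tokenize("prices rise again"),
]
expanded = expand_with_keywords(docs, top_k=2)
```

In this sketch the enriched token lists would then be vectorized and passed to any standard clustering algorithm; that step is omitted because the abstract does not specify which algorithm is used.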

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wang, J., Zhou, Y., Li, L., Hu, B., Hu, X. (2009). Improving Short Text Clustering Performance with Keyword Expansion. In: Wang, H., Shen, Y., Huang, T., Zeng, Z. (eds) The Sixth International Symposium on Neural Networks (ISNN 2009). Advances in Intelligent and Soft Computing, vol 56. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01216-7_31

  • DOI: https://doi.org/10.1007/978-3-642-01216-7_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01215-0

  • Online ISBN: 978-3-642-01216-7

  • eBook Packages: Engineering, Engineering (R0)
