
Word Clustering for Collocation-Based Word Sense Disambiguation

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Abstract

The main disadvantage of collocation-based word sense disambiguation is that its recall is low, even though its precision is relatively high. How can recall be improved without decreasing precision? In this paper, we investigate a word-class approach to extending the collocation list, which is constructed from a manually sense-tagged corpus. The word classes, in contrast, are obtained from a larger corpus that is not sense-tagged. The experimental results show that the F-measure improves to 71%, compared with 54% for the baseline system in which word classes are not considered, although precision decreases slightly. Further study reveals the relationship between the F-measure and the number of word classes trained from corpora of various sizes.

Supported by National Grant Fundamental Research 973 Program of China under Grant No. 2004CB318102.
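
The word-class idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the classes are induced or how class matches are ranked, so the functions `build_collocation_list`, `extend_with_classes`, and `disambiguate`, the majority-vote rule, and the toy English data below are all assumptions made for illustration. The sketch shows why recall can rise: a context word that never appeared in the sense-tagged corpus can still match through its class. The `f_measure` helper uses the standard balanced harmonic mean of precision and recall, which is assumed to be the measure reported.

```python
from collections import defaultdict

def build_collocation_list(tagged_examples):
    """Map (target word, collocate) -> sense, from sense-tagged examples."""
    return {(target, collocate): sense
            for target, collocate, sense in tagged_examples}

def extend_with_classes(collocations, word_to_class):
    """Lift each collocate to its word class (hypothetical majority vote)."""
    votes = defaultdict(lambda: defaultdict(int))
    for (target, collocate), sense in collocations.items():
        cls = word_to_class.get(collocate)
        if cls is not None:
            votes[(target, cls)][sense] += 1
    # Keep the most frequent sense for each (target, class) pair.
    return {key: max(counts, key=counts.get) for key, counts in votes.items()}

def disambiguate(target, context, collocations, class_collocations, word_to_class):
    """Try exact collocations first, then fall back to class-level matches."""
    for word in context:
        if (target, word) in collocations:
            return collocations[(target, word)]
    for word in context:
        cls = word_to_class.get(word)
        if cls is not None and (target, cls) in class_collocations:
            return class_collocations[(target, cls)]
    return None  # no decision; this is what keeps recall low

def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy data, invented for illustration only.
tagged = [("bank", "river", "shore"), ("bank", "loan", "finance")]
# Word classes would come from clustering a large untagged corpus;
# here they are written by hand.
word_to_class = {"river": 1, "stream": 1, "loan": 2, "mortgage": 2}

collocations = build_collocation_list(tagged)
class_collocations = extend_with_classes(collocations, word_to_class)

# "mortgage" was never seen with "bank" in the tagged data.
baseline = disambiguate("bank", ["mortgage"], collocations, {}, word_to_class)
extended = disambiguate("bank", ["mortgage"], collocations,
                        class_collocations, word_to_class)
```

In this toy run the baseline lookup makes no decision for the unseen collocate, while the class-extended lookup assigns the sense shared by the other members of its class. A wrongly clustered word would propagate a wrong sense in the same way, which is consistent with the slight precision drop the abstract reports.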



Editor information

Alexander Gelbukh

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jin, P., Sun, X., Wu, Y., Yu, S. (2007). Word Clustering for Collocation-Based Word Sense Disambiguation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_24

  • DOI: https://doi.org/10.1007/978-3-540-70939-8_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70938-1

  • Online ISBN: 978-3-540-70939-8

  • eBook Packages: Computer Science, Computer Science (R0)
