Measuring Similarity of Word Meaning in Context with Lexical Substitutes and Translations

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6608)

Abstract

Representation of word meaning has been a topic of considerable debate within the field of computational linguistics, and particularly in the subfield of word sense disambiguation. While word senses enumerated in manually produced inventories have been very useful as a starting point for research, we know that the inventory should be selected for the purposes of the application. Unfortunately, we have no clear understanding of how to determine the appropriateness of an inventory for monolingual applications, or for cross-lingual applications when the target language is unknown. In this paper we examine datasets which have paraphrases or translations as alternative annotations of lexical meaning on the same underlying corpus data. We demonstrate that overlap in lexical paraphrases (substitutes) between two uses of the same lemma correlates with overlap in translations. We compare the degree of overlap with annotations of usage similarity on the same data and show that the overlaps in paraphrases or translations also correlate with the similarity judgements. This bodes well for using any of these methods to evaluate unsupervised representations of lexical semantics. We do, however, find that the relationship breaks down for some lemmas, but this behaviour on a lemma-by-lemma basis itself correlates with low inter-tagger agreement and higher proportions of mid-range points on a usage similarity dataset. Lemmas which have many inter-related usages might potentially be predicted from such data.
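
The comparison described in the abstract can be illustrated with a small sketch. The abstract does not state which overlap measure or correlation statistic the paper uses, so the Jaccard overlap and Spearman rank correlation below are assumptions chosen for illustration, and the annotations for the lemma "bright" are invented toy data rather than items from the datasets examined in the paper.

```python
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Set overlap in [0, 1]; an assumed measure, not necessarily the paper's."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else float("nan")


# Invented toy annotations for three uses (s1..s3) of the lemma "bright".
substitutes = {
    "s1": {"clever", "smart", "intelligent"},
    "s2": {"smart", "intelligent", "gifted"},
    "s3": {"shiny", "luminous", "vivid"},
}
translations = {
    "s1": {"inteligente", "listo"},
    "s2": {"inteligente", "brillante"},
    "s3": {"brillante", "luminoso"},
}

# One overlap score per pair of uses, for each annotation type.
pairs = list(combinations(sorted(substitutes), 2))
sub_overlap = [jaccard(substitutes[a], substitutes[b]) for a, b in pairs]
trans_overlap = [jaccard(translations[a], translations[b]) for a, b in pairs]

# Correlate the two overlap series across the sentence pairs.
rho = spearman(sub_overlap, trans_overlap)
print(f"Spearman rho over {len(pairs)} sentence pairs: {rho:.2f}")
```

In this toy case the pair (s2, s3) shares the translation "brillante" but no substitutes, which is the kind of divergence the abstract reports for some lemmas. Three pairs are far too few for a meaningful correlation; the sketch only shows the shape of the computation.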

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McCarthy, D. (2011). Measuring Similarity of Word Meaning in Context with Lexical Substitutes and Translations. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_19

  • DOI: https://doi.org/10.1007/978-3-642-19400-9_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19399-6

  • Online ISBN: 978-3-642-19400-9

  • eBook Packages: Computer Science, Computer Science (R0)
