Abstract
The representation of word meaning has been a topic of considerable debate within computational linguistics, particularly in the subfield of word sense disambiguation. While word senses enumerated in manually produced inventories have been very useful as a starting point for research, we know that the inventory should be selected for the purposes of the application. Unfortunately, we have no clear understanding of how to determine the appropriateness of an inventory for monolingual applications, or for cross-lingual applications when the target language is unknown. In this paper we examine datasets which have paraphrases or translations as alternative annotations of lexical meaning on the same underlying corpus data. We demonstrate that overlap in lexical paraphrases (substitutes) between two uses of the same lemma correlates with overlap in translations. We compare the degree of overlap with annotations of usage similarity on the same data and show that the overlaps in paraphrases or translations also correlate with the similarity judgements. This bodes well for using any of these methods to evaluate unsupervised representations of lexical semantics. We do, however, find that the relationship breaks down for some lemmas, and that this lemma-by-lemma behaviour itself correlates with low inter-tagger agreement and with higher proportions of mid-range points on a usage similarity dataset. Lemmas which have many inter-related usages might potentially be predicted from such data.
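As an illustration of the comparison the abstract describes, the sketch below scores every pair of uses of one lemma by how much their substitute sets overlap and by how much their translation sets overlap, then rank-correlates the two lists of scores. The abstract names neither the overlap measure nor the correlation statistic, so Jaccard overlap and Spearman's rho are assumptions made here for concreteness. The sentence identifiers, substitutes and translations are invented toy data, not taken from the datasets used in the paper.

```python
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Set overlap in [0, 1]. Assumed stand-in for the paper's overlap measure."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_overlaps(annotations):
    """Overlap score for every pair of uses, in a fixed (sorted) pair order."""
    ids = sorted(annotations)
    return [jaccard(annotations[p], annotations[q]) for p, q in combinations(ids, 2)]


# Hypothetical annotations for four uses of the lemma "bank" (invented data).
substitutes = {
    "s1": {"lender", "institution"},
    "s2": {"institution", "lender", "branch"},
    "s3": {"shore", "riverside"},
    "s4": {"shore", "edge", "riverside"},
}
translations = {
    "s1": {"banco"},
    "s2": {"banco", "sucursal"},
    "s3": {"orilla", "ribera"},
    "s4": {"orilla"},
}

sub_overlap = pairwise_overlaps(substitutes)
trans_overlap = pairwise_overlaps(translations)
rho = spearman(sub_overlap, trans_overlap)
print(f"Spearman rho between substitute and translation overlap: {rho:.3f}")
```

Both dictionaries must be keyed by the same use identifiers so that the two score lists line up pair by pair. The same routine could be applied to a third list of usage similarity judgements for the same pairs, which is the other comparison the abstract mentions.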
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McCarthy, D. (2011). Measuring Similarity of Word Meaning in Context with Lexical Substitutes and Translations. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_19
DOI: https://doi.org/10.1007/978-3-642-19400-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19399-6
Online ISBN: 978-3-642-19400-9
eBook Packages: Computer Science, Computer Science (R0)