Abstract
Word Sense Disambiguation (WSD), the task of computationally identifying the meaning of words in context, has long been a central problem in computational linguistics. Statistical and supervised approaches require large amounts of labeled data for training. Unlike English, Persian has neither a semantically tagged corpus to support machine learning approaches nor suitable parallel corpora. However, as the number of Persian Wikipedia pages continues to grow, Wikipedia can serve as a comparable corpus for English-Persian texts.
In this paper, we propose a cross-lingual approach to tagging word senses in Persian texts. The approach makes use of English sense disambiguators, Wikipedia articles in both English and Persian, and a newly developed lexical ontology, FarsNet, to overcome the lack of knowledge resources and NLP tools for Persian. We demonstrate its effectiveness by comparing it with a direct sense disambiguation approach for Persian. The evaluation results indicate performance comparable to that of the English sense tagger used.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sarrafzadeh, B., Yakovets, N., Cercone, N., An, A. (2011). Cross-Lingual Word Sense Disambiguation for Languages with Scarce Resources. In: Butz, C., Lingras, P. (eds) Advances in Artificial Intelligence. Canadian AI 2011. Lecture Notes in Computer Science, vol 6657. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21043-3_42
DOI: https://doi.org/10.1007/978-3-642-21043-3_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21042-6
Online ISBN: 978-3-642-21043-3
eBook Packages: Computer Science (R0)