Cross-Lingual Word Sense Disambiguation for Languages with Scarce Resources

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6657)

Abstract

Word Sense Disambiguation (WSD), the task of computationally identifying the meaning of a word in context, has long been a central problem in computational linguistics. Statistical and supervised approaches require large amounts of labeled data for training. Unlike English, Persian has neither a semantically tagged corpus to support machine learning approaches nor suitable parallel corpora. However, as the number of Persian pages in Wikipedia keeps growing, this resource can act as a comparable corpus for English-Persian texts.

In this paper, we propose a cross-lingual approach to tagging word senses in Persian texts. The approach makes use of English sense disambiguators, Wikipedia articles in both English and Persian, and a newly developed lexical ontology, FarsNet. It thereby compensates for the lack of knowledge resources and NLP tools for the Persian language. We demonstrate the effectiveness of the proposed approach by comparing it to a direct sense disambiguation approach for Persian. The evaluation results indicate performance comparable to that of the English sense tagger used.
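
The general idea of transferring senses across languages can be pictured with a small sketch. This is only an illustration under stated assumptions, not the pipeline used in the paper: the sense ids, glosses, the gloss-overlap stand-in for an English tagger, and the Persian-to-English sense links (playing the role an ontology such as FarsNet might play) are all hypothetical toy data. The example tags the Persian homograph "شیر" (lion, milk, or tap) using senses found in a comparable English text.

```python
from collections import Counter

# Toy English sense inventory (hypothetical ids and gloss words).
ENGLISH_SENSES = {
    "lion": {"lion.animal": {"cat", "predator", "mane", "africa"}},
    "milk": {"milk.drink": {"white", "liquid", "cow", "drink"}},
    "tap": {
        "tap.faucet": {"water", "pipe", "valve", "sink"},
        "tap.touch": {"light", "touch", "finger", "knock"},
    },
}

# Hypothetical links from Persian candidate senses to English sense ids,
# standing in for a Persian lexical ontology aligned with an English one.
PERSIAN_SENSES = {
    "شیر": {
        "fa.shir.lion": "lion.animal",
        "fa.shir.milk": "milk.drink",
        "fa.shir.tap": "tap.faucet",
    },
}


def tag_english(tokens):
    """Stand-in English tagger: for each known word, choose the sense
    whose gloss shares the most words with the surrounding tokens."""
    context = set(tokens)
    tagged = []
    for token in tokens:
        senses = ENGLISH_SENSES.get(token)
        if senses:
            best = max(senses, key=lambda s: len(senses[s] & context))
            tagged.append(best)
    return tagged


def tag_persian(word, english_tokens):
    """Choose the Persian sense whose linked English sense occurs most
    often in the tagged comparable English text; None if no evidence."""
    counts = Counter(tag_english(english_tokens))
    candidates = PERSIAN_SENSES.get(word, {})
    if not candidates:
        return None
    best = max(candidates, key=lambda s: counts[candidates[s]])
    return best if counts[candidates[best]] > 0 else None


english_article = "the lion is a large cat and a predator with a mane".split()
print(tag_persian("شیر", english_article))
```

The sketch returns no tag when the English side offers no evidence, rather than guessing; a real system would need a fallback and a far richer sense inventory.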

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sarrafzadeh, B., Yakovets, N., Cercone, N., An, A. (2011). Cross-Lingual Word Sense Disambiguation for Languages with Scarce Resources. In: Butz, C., Lingras, P. (eds) Advances in Artificial Intelligence. Canadian AI 2011. Lecture Notes in Computer Science, vol 6657. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21043-3_42

  • DOI: https://doi.org/10.1007/978-3-642-21043-3_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21042-6

  • Online ISBN: 978-3-642-21043-3

  • eBook Packages: Computer Science (R0)
