
Multilingual Projections

Part of the book series: Text, Speech and Language Technology (TLTB, volume 48)

Abstract

The languages of the world, though different, share structures and vocabulary. Today’s NLP depends crucially on annotation, which is costly: it demands expertise, money and time. Most of the world's languages fall far behind English in annotated resources. Because of this cost, there has been a worldwide effort to leverage multilinguality in the development and use of annotated corpora. The key idea is to project annotation from one language to another and utilize it there: parameters learnt from the annotated corpus of one language are used in the NLP of another language. We illustrate multilingual projection through a case study of word sense disambiguation (WSD), whose goal is to obtain the correct meaning of a word in its context. The correct meaning is usually denoted by an appropriate sense id from a sense repository, usually a wordnet. In this paper we show how two languages can help each other in their WSD even when neither language has any sense-marked corpus. The two languages chosen are Hindi and Marathi. The sense repository is the IndoWordnet, a linked structure of wordnets of 19 major Indian languages from the Indo-Aryan, Dravidian and Sino-Tibetan families. These wordnets have been created by following the expansion approach from the Hindi wordnet. The WSD algorithm is reminiscent of expectation maximization: the sense distribution of either language is estimated through the mediation of the sense distribution of the other language, in an iterative fashion. The WSD accuracy arrived at is better than any state-of-the-art accuracy for all-words, general-purpose unsupervised WSD.
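
The iterative estimation described in the abstract can be illustrated with a minimal sketch. It is not the chapter's actual algorithm or data. It assumes a toy linked wordnet in which each synset id lists its member words in two languages, plus raw (untagged) word counts for each language. The synset ids, the placeholder words (h1, m1, ...), the counts and the function names are all hypothetical. Each pass re-estimates P(sense | word) in one language from the current sense distribution and word counts of the other language, then swaps roles.

```python
from collections import defaultdict

# Hypothetical toy linked wordnet: synset id -> member words per language.
# "hi" and "mr" stand for Hindi and Marathi; the words are placeholders.
SYNSETS = {
    "S1": {"hi": ["h1"], "mr": ["m1"]},
    "S2": {"hi": ["h1"], "mr": ["m2"]},
    "S3": {"hi": ["h2"], "mr": ["m1"]},
}

# Hypothetical raw word counts from untagged corpora (no sense marking).
COUNTS = {"hi": {"h1": 10, "h2": 2}, "mr": {"m1": 8, "m2": 2}}


def senses_by_word(lang):
    """Map each word of `lang` to the synset ids it belongs to."""
    table = defaultdict(list)
    for sid, members in SYNSETS.items():
        for word in members[lang]:
            table[word].append(sid)
    return dict(table)


def uniform_init(lang):
    """Start with P(sense | word) uniform over each word's senses."""
    return {
        word: {sid: 1.0 / len(sids) for sid in sids}
        for word, sids in senses_by_word(lang).items()
    }


def reestimate(target, source, source_dist):
    """Re-estimate P(sense | word) for `target` from the `source` language.

    A sense's score is the expected count of that sense in the source
    language: sum over cross-linked source words v of P(sense | v) * count(v).
    """
    new_dist = {}
    for word, sids in senses_by_word(target).items():
        scores = {
            sid: sum(source_dist[v][sid] * COUNTS[source][v]
                     for v in SYNSETS[sid][source])
            for sid in sids
        }
        total = sum(scores.values())
        if total == 0:
            # No evidence from the other language: fall back to uniform.
            new_dist[word] = {sid: 1.0 / len(sids) for sid in sids}
        else:
            new_dist[word] = {sid: sc / total for sid, sc in scores.items()}
    return new_dist


def bilingual_em(iterations=10):
    """Alternate the two re-estimation steps for a fixed number of passes."""
    dist = {"hi": uniform_init("hi"), "mr": uniform_init("mr")}
    for _ in range(iterations):
        dist["hi"] = reestimate("hi", "mr", dist["mr"])
        dist["mr"] = reestimate("mr", "hi", dist["hi"])
    return dist


if __name__ == "__main__":
    for lang, words in bilingual_em().items():
        for word, probs in words.items():
            print(lang, word, {s: round(p, 3) for s, p in probs.items()})
```

In this toy setup the ambiguous word h1 ends up preferring S1 over S2, because S1 is linked to the more frequent word m1 in the other language. A real system would also need smoothing and context features, which this sketch leaves out.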

Notes

  1. http://en.wikipedia.org/wiki/Ethnologue.

  2. http://www.cfilt.iitb.ac.in/wordnet/webhwn.

  3. https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html.

  4. The PP “with a telescope” can get attached to either “saw” (‘I have the telescope’) or “the boy” (‘the boy has the telescope’).

  5. http://www.cfilt.iitb.ac.in/wsd/annotated_corpus.

  6. http://wordnet.princeton.edu/.

  7. http://globalwordnet.org/.

  8. http://globalwordnet.org/global-wordnet-grid/.

  9. http://www.cfilt.iitb.ac.in/indowordnet/.

  10. http://wordnetweb.princeton.edu/perl/webwn.

  11. http://www.cfilt.iitb.ac.in/wsd/annotated_corpus.

  12. http://babelnet.org/.

Author information

Corresponding author

Correspondence to Pushpak Bhattacharyya.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Bhattacharyya, P. (2015). Multilingual Projections. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_11

  • DOI: https://doi.org/10.1007/978-3-319-08043-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08042-0

  • Online ISBN: 978-3-319-08043-7

  • eBook Packages: Computer Science, Computer Science (R0)
