Abstract
Languages of the world, though different, share structures and vocabulary. Today’s NLP depends crucially on annotation, which is costly: it requires expertise, money, and time. Most of the world's languages fall far behind English in annotated resources. Because annotation is costly, there has been a worldwide effort to leverage multilinguality in the development and use of annotated corpora. The key idea is to project annotation from one language to another and utilize it there: parameters learnt from the annotated corpus of one language are used in the NLP of another language. We illustrate multilingual projection through a case study of word sense disambiguation (WSD), whose goal is to obtain the correct meaning of a word in its context. The correct meaning is usually denoted by an appropriate sense id from a sense repository, usually a wordnet. In this paper we show how two languages can help each other in their WSD, even when neither language has a sense-marked corpus. The two languages chosen are Hindi and Marathi. The sense repository is the IndoWordnet, a linked structure of wordnets of 19 major Indian languages from the Indo-Aryan, Dravidian, and Sino-Tibetan families. These wordnets were created by following the expansion approach from the Hindi wordnet. The WSD algorithm is reminiscent of expectation maximization: the sense distribution of each language is estimated iteratively through the mediation of the sense distribution of the other language. The resulting WSD accuracy is better than any state-of-the-art accuracy for all-words, general-purpose unsupervised WSD.
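The iterative idea in the abstract can be illustrated with a small sketch. It is not the chapter's actual algorithm; it only assumes that both languages share synset ids (as linked wordnets do) and that each language has raw, untagged word counts. All names here (`bilingual_em`, `project`, the words `a1`, `b1`, the synsets `S1` to `S3`, and the counts) are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def uniform_init(senses_of):
    """Start with a uniform sense distribution for every word."""
    return {w: {s: 1.0 / len(ss) for s in ss} for w, ss in senses_of.items()}


def project(senses_tgt, counts_src, p_src):
    """Re-estimate P(sense | word) in the target language from the
    expected synset counts of the source language (assumed update rule)."""
    expected = defaultdict(float)
    for word, dist in p_src.items():
        for sense, prob in dist.items():
            expected[sense] += counts_src.get(word, 0) * prob

    p_tgt = {}
    for word, senses in senses_tgt.items():
        total = sum(expected[s] for s in senses)
        if total == 0:
            # No evidence from the other language: fall back to uniform.
            p_tgt[word] = {s: 1.0 / len(senses) for s in senses}
        else:
            p_tgt[word] = {s: expected[s] / total for s in senses}
    return p_tgt


def bilingual_em(senses_a, counts_a, senses_b, counts_b, iterations=10):
    """Alternate the two projections for a fixed number of iterations."""
    p_a = uniform_init(senses_a)
    p_b = uniform_init(senses_b)
    for _ in range(iterations):
        p_a = project(senses_a, counts_b, p_b)  # language B informs A
        p_b = project(senses_b, counts_a, p_a)  # language A informs B
    return p_a, p_b


# Toy data: invented words, synsets, and counts; no sense-tagged corpus.
senses_a = {"a1": ["S1", "S2"], "a2": ["S3"]}
counts_a = {"a1": 80, "a2": 20}
senses_b = {"b1": ["S1", "S3"], "b2": ["S2"]}
counts_b = {"b1": 90, "b2": 10}

p_a, p_b = bilingual_em(senses_a, counts_a, senses_b, counts_b)
```

In this toy setting the ambiguous word `a1` ends up preferring `S1`, because the words expressing `S1` in the other language are far more frequent than those expressing `S2`. A real system would also need context features and a convergence check, which this sketch omits.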
Notes
- 4. The PP “with a telescope” can get attached to either “saw” (‘I have the telescope’) or “the boy” (‘the boy has the telescope’).
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Bhattacharyya, P. (2015). Multilingual Projections. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_11
DOI: https://doi.org/10.1007/978-3-319-08043-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08042-0
Online ISBN: 978-3-319-08043-7
eBook Packages: Computer Science, Computer Science (R0)