Implementation of Word Sense Disambiguation on Hadoop Using Map-Reduce

  • Anuja Nair
  • Kaushik Kyada
  • Neel Zadafiya
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 106)


In Natural Language Processing, it is essential to identify the correct sense in which a sentence or document is written; this is called the word sense disambiguation (WSD) problem. Many machine learning applications based on natural language processing currently need to solve it. To identify the correct sense, pywsd (Python implementations of word sense disambiguation technologies) is used; it provides several Lesk algorithms, similarity-maximization tools, supervised WSD, and vector space models. Using the simple Lesk algorithm from pywsd, WSD is performed on a given document, and the application is deployed in a Hadoop environment. Running it on a multinode Hadoop cluster considerably reduces the complexity of the application. In addition, because Map-Reduce is a parallel programming model, it reduces the response time of the implemented application.
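As a minimal sketch (not the authors' code), the pipeline described above could be structured as a Hadoop Streaming job written in Python. Several assumptions apply. The paper calls pywsd's simple Lesk over WordNet glosses, but here a hypothetical two-word gloss dictionary `SENSES` stands in so the example runs without NLTK data. The reduce step, which counts how often each sense is chosen across the document, is an illustrative choice, since the abstract does not say what the reducer computes. The names `SENSES`, `simplified_lesk`, `mapper`, `reducer`, and `wsd_mr.py` are all hypothetical.

```python
import re
import sys
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical toy sense inventory standing in for WordNet glosses
# (the paper uses pywsd's simple Lesk over WordNet instead).
SENSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank.finance": "a financial institution that accepts deposits of money and makes loans",
        "bank.river": "sloping land beside a body of water such as a river",
    },
    "plant": {
        "plant.factory": "a building where goods are manufactured in an industrial process",
        "plant.organism": "a living organism such as a tree or flower that grows in soil",
    },
}

# Small illustrative stopword list so function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "such", "as",
             "that", "where", "is", "my", "i", "by", "at", "went"}


def tokenize(text: str) -> List[str]:
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def simplified_lesk(word: str, context: str) -> Optional[str]:
    """Pick the sense whose gloss shares the most words with the context.

    Returns None for words with no entry in SENSES. Ties (including zero
    overlap) go to the first sense listed for the word.
    """
    senses = SENSES.get(word)
    if not senses:
        return None
    context_words = set(tokenize(context)) - {word}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses.items():
        overlap = len(context_words & set(tokenize(gloss)))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map step: treat each input line as one sentence (the Lesk context)
    and emit 'word|sense<TAB>1' for every ambiguous word found in it."""
    for line in lines:
        for token in sorted(set(re.findall(r"[a-z]+", line.lower()))):
            sense = simplified_lesk(token, line)
            if sense is not None:
                yield f"{token}|{sense}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Reduce step: sum counts per key. Assumes input is sorted by key,
    as Hadoop guarantees between the map and reduce phases."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python3 wsd_mr.py map | sort | python3 wsd_mr.py reduce
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Under Hadoop Streaming, the same script would be supplied as the mapper command (`python3 wsd_mr.py map`) and the reducer command (`python3 wsd_mr.py reduce`) through the `-mapper` and `-reducer` options. Hadoop sorts the mapper output by key between the two phases, which is what allows the reducer to group consecutive equal keys with `itertools.groupby`. Because each sentence is disambiguated independently, the map phase parallelizes naturally across input splits on different nodes, which is the source of the response-time reduction the abstract describes.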


Keywords: Word sense disambiguation · Lesk algorithm · Multinode Hadoop · Map-Reduce · Pywsd



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Institute of Technology, Nirma University, Ahmedabad, India
