A Quantitative Evaluation of Global Word Sense Induction

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6608)

Abstract

Word sense induction (WSI) is the task of automatically identifying the senses of words in texts, without the need for handcrafted resources or annotated data. To date, most WSI algorithms have extracted the senses of a word ‘locally’, on a per-word basis: the senses of each word are determined separately. In this paper, we compare the performance of such algorithms to that of an algorithm that takes a ‘global’ approach, in which the senses of a particular word are determined by comparing them to, and demarcating them from, the senses of other words in a full-blown word space model. We adopt the evaluation framework proposed in the SemEval-2010 Word Sense Induction & Disambiguation task. All systems that participated in this task use a local scheme for determining the different senses of a word. We compare their results to those obtained by the global approach, and discuss the advantages and weaknesses of both approaches.
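The contrast between the two schemes can be illustrated with a toy sketch. This is not the algorithm evaluated in the paper: the greedy cosine clustering, the `threshold` value, the six-sentence `corpus`, and the function names `local_wsi` and `global_wsi` are all assumptions made up for illustration. The local function clusters only the contexts of the target word, while the global function clusters every sentence in the corpus and then reads off the labels of the sentences that contain the target.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def greedy_cluster(vectors, threshold=0.3):
    """Put each vector in the first cluster whose centroid is similar enough,
    otherwise open a new cluster. Returns one cluster label per vector."""
    centroids, labels = [], []
    for vec in vectors:
        for i, centroid in enumerate(centroids):
            if cosine(vec, centroid) >= threshold:
                centroid.update(vec)  # add the counts to the running centroid
                labels.append(i)
                break
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels


def local_wsi(corpus, target):
    """Local scheme: cluster only the contexts in which `target` occurs."""
    contexts = [Counter(w for w in s.split() if w != target)
                for s in corpus if target in s.split()]
    return greedy_cluster(contexts)


def global_wsi(corpus, target):
    """Global scheme: cluster the whole corpus once, then look up the
    cluster labels of the sentences that contain `target`."""
    labels = greedy_cluster([Counter(s.split()) for s in corpus])
    return [lab for s, lab in zip(corpus, labels) if target in s.split()]


# Hypothetical toy corpus: "bank" occurs in a river sense and a finance sense.
corpus = [
    "bank river water shore",
    "bank money loan interest",
    "river water fish shore",
    "money loan credit interest",
    "bank water shore fish",
    "bank loan credit money",
]

print(local_wsi(corpus, "bank"))    # [0, 1, 0, 1]
print(global_wsi(corpus, "bank"))   # [0, 1, 0, 1]
print(global_wsi(corpus, "river"))  # [0, 0]
print(global_wsi(corpus, "loan"))   # [1, 1, 1]
```

In this sketch both schemes split the occurrences of "bank" in the same way, but only the global labels are comparable across words: label 0 is shared by "bank" and "river", and label 1 by "bank" and "loan". Labels returned by `local_wsi` are numbered independently for each target word.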

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Apidianaki, M., Van de Cruys, T. (2011). A Quantitative Evaluation of Global Word Sense Induction. In: Gelbukh, A.F. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_20

  • DOI: https://doi.org/10.1007/978-3-642-19400-9_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19399-6

  • Online ISBN: 978-3-642-19400-9

  • eBook Packages: Computer Science, Computer Science (R0)
