Unsupervised Corpus-Based Methods for WSD

Pedersen, Ted

doi:10.1007/978-1-4020-4809-8_6

Part of the book series: Text, Speech and Language Technology (TLTB, volume 33)

This chapter focuses on unsupervised corpus-based methods of word sense discrimination that are knowledge-lean: they do not rely on external knowledge sources such as machine-readable dictionaries, concept hierarchies, or sense-tagged text. These methods do not assign sense tags to words; rather, they discriminate among word meanings based on information found in unannotated corpora. The chapter reviews distributional approaches that rely on monolingual corpora, as well as methods based on translational equivalence as found in word-aligned parallel corpora. These techniques are organized into type-based and token-based approaches: the former identify sets of related words, while the latter distinguish among the senses of a word used in multiple contexts.
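
To make the token-based idea concrete, the following is a minimal sketch, not the chapter's own method. It assumes each context is reduced to a bag of co-occurring words and that contexts are grouped by average-link agglomerative clustering over cosine similarity. The names `context_vector`, `cosine`, `discriminate`, the stopword list, and the example sentences are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
import math

# Tiny illustrative stopword list (an assumption, not taken from the chapter).
STOP = {"the", "a", "an", "of", "on", "in", "to", "at", "and", "was", "we", "she"}

# Four unannotated contexts of the ambiguous word "bank" (made-up examples).
CONTEXTS = [
    "She deposited money at the bank",
    "The bank approved the loan and the money arrived",
    "We fished on the river bank",
    "The river bank was muddy after the water rose",
]

def context_vector(sentence, target):
    """Bag-of-words counts for one context, without the target word or stopwords."""
    tokens = [t.strip(".,").lower() for t in sentence.split()]
    return Counter(t for t in tokens if t and t != target and t not in STOP)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def discriminate(contexts, target, k):
    """Group contexts into k unlabeled clusters by average-link agglomeration."""
    vectors = [context_vector(c, target) for c in contexts]
    clusters = [[i] for i in range(len(contexts))]

    def link(a, b):
        # Average pairwise similarity between the members of two clusters.
        return sum(cosine(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the most similar pair; combinations() guarantees a < b.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

if __name__ == "__main__":
    print(discriminate(CONTEXTS, "bank", 2))
```

The result is a partition of context indices rather than a set of sense labels, which mirrors the point above that discrimination separates usages without assigning sense tags. The number of clusters `k` is supplied by hand here; choosing it automatically is a separate problem that this sketch does not address.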

Author information

Authors and Affiliations

Department of Computer Science, University of Minnesota, 1114 Kirby Drive, 55812, Duluth, MN, USA
Associate Professor Ted Pedersen

Editor information

Editors and Affiliations

Department of Computer Science, University of the Basque Country, Manuel de Lardizabal 1, E-20018, Donostia, Basque Country, Spain
Eneko Agirre
Sharp Laboratories of Europe Limited, Oxford Science Park, OX4 4GB, Oxford, UK
Philip Edmonds

About this chapter

Cite this chapter

Pedersen, T. (2007). Unsupervised Corpus-Based Methods for WSD. In: Agirre, E., Edmonds, P. (eds) Word Sense Disambiguation. Text, Speech and Language Technology, vol 33. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-4809-8_6

DOI: https://doi.org/10.1007/978-1-4020-4809-8_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-4808-1
Online ISBN: 978-1-4020-4809-8
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)