Abstract
A supervised, corpus-based Word Sense Disambiguation (WSD) system is trained on a previously classified set of linguistic contexts. To train the system, it is usual to define a set of feature functions that report linguistic properties of each example, and to extract the same kinds of information for every word, at least for words of the same part of speech.
This paper presents a study of feature selection for a supervised, corpus-based WSD method that relies on Maximum Entropy conditional probability models. For a set of words selected from the DSO corpus, the behaviour of several types of features is analyzed in order to identify their contribution to gains in accuracy and to determine the influence of sense frequency in that corpus. The study shows that not all words are best disambiguated with the same combination of features. In addition, an improved definition of features that increases efficiency is presented.
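The abstract describes feature functions that report linguistic properties of each classified context, which are then used to train a Maximum Entropy classifier. As a minimal illustration only (not the paper's actual feature templates, data, or implementation), the sketch below defines a few collocational indicator features around a target word and trains a tiny multinomial logistic-regression (Maximum Entropy) model in pure Python; the feature templates and the toy "bank" examples are invented.

```python
# Illustrative sketch: binary indicator features for WSD contexts, plus a
# small Maximum Entropy (multinomial logistic regression) trainer.
# Feature templates and toy data are hypothetical, not taken from the paper.

import math
from collections import defaultdict

def extract_features(tokens, target_index):
    """Binary features: words at fixed offsets around the target, plus the
    bigram formed by the target and the following word."""
    feats = set()
    for off in (-2, -1, 1, 2):
        i = target_index + off
        if 0 <= i < len(tokens):
            feats.add(f"w{off:+d}={tokens[i]}")
    if target_index + 1 < len(tokens):
        feats.add(f"bigram={tokens[target_index]}_{tokens[target_index + 1]}")
    return feats

def train_maxent(examples, senses, epochs=200, lr=0.5):
    """Gradient ascent on the conditional log-likelihood; one weight per
    (sense, feature) pair, stored as w[sense][feature]."""
    w = {s: defaultdict(float) for s in senses}
    for _ in range(epochs):
        for feats, gold in examples:
            scores = {s: sum(w[s][f] for f in feats) for s in senses}
            z = sum(math.exp(v) for v in scores.values())
            for s in senses:
                p = math.exp(scores[s]) / z          # model probability p(s | context)
                grad = (1.0 if s == gold else 0.0) - p  # empirical minus expected count
                for f in feats:
                    w[s][f] += lr * grad
    return w

def classify(w, feats):
    """Pick the sense with the highest linear score (argmax of p(s | context))."""
    return max(w, key=lambda s: sum(w[s][f] for f in feats))

# Invented training contexts for the ambiguous word "bank".
data = [
    (extract_features("he sat on the bank of the river".split(), 4), "shore"),
    (extract_features("she deposited money in the bank today".split(), 5), "institution"),
    (extract_features("the bank of the stream was muddy".split(), 1), "shore"),
    (extract_features("the bank approved the loan quickly".split(), 1), "institution"),
]
model = train_maxent(data, ["shore", "institution"])
```

In this toy setting, each context contributes only a handful of active binary features, which mirrors the kind of sparse, per-word feature definitions whose selection the paper studies.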
This paper has been partially supported by the Spanish Government (CICYT) project number TIC2000-0664-C02-02.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Suárez, A., Palomar, M. (2002). Feature Selection Analysis for Maximum Entropy-Based WSD. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science, vol 2276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45715-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43219-7
Online ISBN: 978-3-540-45715-2