Abstract
A supervised, corpus-based Word Sense Disambiguation (WSD) system is trained on a previously classified set of linguistic contexts. To train the system, it is usual to define a set of feature functions that report linguistic properties of each example, and to extract the same kinds of information for every word, at least for words of the same part of speech.
This paper presents a study of feature selection for a supervised, corpus-based WSD method that relies on Maximum Entropy conditional probability models. For a set of words selected from the DSO corpus, the behaviour of several types of features is analyzed in order to identify their contribution to gains in accuracy and to determine the influence of sense frequency in that corpus. The study shows that not all words are best disambiguated with the same combination of features. In addition, an improved definition of features that increases efficiency is presented.
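The abstract describes feature functions that report linguistic properties of each classified context, which are then used to train a Maximum Entropy classifier. As a minimal illustration only (not the paper's actual feature templates, data, or implementation), the sketch below defines a few collocational indicator features around a target word and trains a tiny multinomial logistic-regression (Maximum Entropy) model in pure Python; the feature templates and the toy "bank" examples are invented.

```python
# Illustrative sketch: binary indicator features for WSD contexts, plus a
# small Maximum Entropy (multinomial logistic regression) trainer.
# Feature templates and toy data are hypothetical, not taken from the paper.

import math
from collections import defaultdict

def extract_features(tokens, target_index):
    """Binary features: words at fixed offsets around the target, plus the
    bigram formed by the target and the following word."""
    feats = set()
    for off in (-2, -1, 1, 2):
        i = target_index + off
        if 0 <= i < len(tokens):
            feats.add(f"w{off:+d}={tokens[i]}")
    if target_index + 1 < len(tokens):
        feats.add(f"bigram={tokens[target_index]}_{tokens[target_index + 1]}")
    return feats

def train_maxent(examples, senses, epochs=200, lr=0.5):
    """Gradient ascent on the conditional log-likelihood; one weight per
    (sense, feature) pair, stored as w[sense][feature]."""
    w = {s: defaultdict(float) for s in senses}
    for _ in range(epochs):
        for feats, gold in examples:
            scores = {s: sum(w[s][f] for f in feats) for s in senses}
            z = sum(math.exp(v) for v in scores.values())
            for s in senses:
                p = math.exp(scores[s]) / z          # model probability p(s | context)
                grad = (1.0 if s == gold else 0.0) - p  # empirical minus expected count
                for f in feats:
                    w[s][f] += lr * grad
    return w

def classify(w, feats):
    """Pick the sense with the highest linear score (argmax of p(s | context))."""
    return max(w, key=lambda s: sum(w[s][f] for f in feats))

# Invented training contexts for the ambiguous word "bank".
data = [
    (extract_features("he sat on the bank of the river".split(), 4), "shore"),
    (extract_features("she deposited money in the bank today".split(), 5), "institution"),
    (extract_features("the bank of the stream was muddy".split(), 1), "shore"),
    (extract_features("the bank approved the loan quickly".split(), 1), "institution"),
]
model = train_maxent(data, ["shore", "institution"])
```

In this toy setting, each context contributes only a handful of active binary features, which mirrors the kind of sparse, per-word feature definitions whose selection the paper studies.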
This paper has been partially supported by the Spanish Government (CICYT) project number TIC2000-0664-C02-02.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Suárez, A., Palomar, M. (2002). Feature Selection Analysis for Maximum Entropy-Based WSD. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science, vol 2276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45715-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43219-7
Online ISBN: 978-3-540-45715-2