Abstract
In this paper, a supervised learning system for word sense disambiguation is presented. It is based on conditional maximum entropy models. The system acquires linguistic knowledge from an annotated corpus, and this knowledge is represented in the form of features. The system was evaluated using both WordNet senses and WordNet domains as the set of classes for each word. Domain labels are obtained by enriching WordNet with subject field codes, which reduces polysemy. Several types of features have been analyzed for a few words selected from the DSO corpus. Using the domain enrichment of WordNet, an accuracy improvement of 7% is achieved.
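A conditional maximum entropy model scores each class c for a context x as p(c|x) = exp(sum_i lambda_i f_i(x, c)) / Z(x), where every feature f_i pairs a piece of contextual evidence with a class. The sketch below is a minimal illustration of that idea, not the authors' system. It assumes bag-of-words contextual predicates (the helper `extract_features` is hypothetical) and trains with plain batch gradient ascent plus a small L2 penalty, which may differ from the estimator used in the paper. The training contexts, the sense labels `bank_s1` to `bank_s3`, and the `sense_to_domain` map are all invented for illustration. The same classifier is trained twice: once with senses as classes and once with domains as classes.

```python
import math
from collections import defaultdict


def extract_features(context):
    """Hypothetical contextual predicates: one per lower-cased context word."""
    return {"w=" + word.lower() for word in context}


class MaxEntClassifier:
    """Conditional maximum entropy model p(c|x) = exp(sum_i lambda_i f_i(x, c)) / Z(x).

    Each feature f_i pairs a contextual predicate with a class, so the
    weights are stored per (predicate, class) pair.
    """

    def __init__(self, learning_rate=0.5, iterations=200, l2=0.01):
        self.learning_rate = learning_rate
        self.iterations = iterations
        self.l2 = l2  # assumed small L2 penalty; keeps weights finite on separable data
        self.weights = defaultdict(float)
        self.classes = []

    def predict_proba(self, context):
        predicates = extract_features(context)
        scores = {c: sum(self.weights.get((p, c), 0.0) for p in predicates)
                  for c in self.classes}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exps.values())  # normalizer Z(x)
        return {c: e / z for c, e in exps.items()}

    def predict(self, context):
        probs = self.predict_proba(context)
        return max(probs, key=probs.get)

    def fit(self, examples):
        """Batch gradient ascent on the regularized conditional log-likelihood."""
        self.classes = sorted({label for _, label in examples})
        n = len(examples)
        for _ in range(self.iterations):
            grad = defaultdict(float)
            for context, label in examples:
                probs = self.predict_proba(context)
                for p in extract_features(context):
                    grad[(p, label)] += 1.0       # observed feature count
                    for c in self.classes:
                        grad[(p, c)] -= probs[c]  # expected count under the model
            for key in set(grad) | set(self.weights):
                step = grad[key] / n - self.l2 * self.weights[key]
                self.weights[key] += self.learning_rate * step
        return self


# Hypothetical toy data: three senses of "bank" and a hypothetical
# sense-to-domain map in which two senses share one domain label.
train = [
    (["deposit", "loan", "account"], "bank_s1"),
    (["money", "loan", "teller"], "bank_s2"),
    (["river", "water", "shore"], "bank_s3"),
    (["river", "fishing", "mud"], "bank_s3"),
]
sense_to_domain = {"bank_s1": "economy", "bank_s2": "economy", "bank_s3": "geography"}

sense_model = MaxEntClassifier().fit(train)
domain_model = MaxEntClassifier().fit([(ctx, sense_to_domain[s]) for ctx, s in train])

print(sense_model.classes, sense_model.predict(["river", "shore"]))
print(domain_model.classes, domain_model.predict(["loan", "money"]))
```

Mapping `bank_s1` and `bank_s2` to one domain label shrinks the class set from three to two. This is the kind of polysemy reduction the abstract attributes to domain labels. Whether fewer classes yields higher accuracy on real data is what the paper evaluates, and this toy example does not demonstrate it.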
This paper has been partially supported by the Spanish Government (CICYT) under project number TIC2000-0664-C02-02.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Suárez, A., Palomar, M. (2002). Word Sense vs. Word Domain Disambiguation: A Maximum Entropy Approach. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2002. Lecture Notes in Computer Science, vol 2448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46154-X_17
DOI: https://doi.org/10.1007/3-540-46154-X_17
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44129-8
Online ISBN: 978-3-540-46154-8
eBook Packages: Springer Book Archive