Boosting Applied to Word Sense Disambiguation

Escudero, Gerard; Màrquez, Lluís; Rigau, German

doi:10.1007/3-540-45164-1_14

Conference paper
First Online: 01 January 2003

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1810)

Abstract

In this paper Schapire and Singer’s AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are studied. The best variant, which we call LazyBoosting, is tested on the largest sense-tagged corpus available containing 192,800 examples of the 191 most frequent and ambiguous English words. Again, boosting compares favourably to the other benchmark algorithms.
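
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the kind of procedure it names, not the authors' implementation. It assumes binary context-word features, one-feature weak rules with confidence-rated predictions in the style of Schapire and Singer's AdaBoost.MH, and it assumes that "reducing the feature space" can be approximated by examining only a random subset of features in each round. The names `train_adaboost_mh`, `predict` and `feature_fraction` are hypothetical, and the toy data is invented for illustration.

```python
import math
import random


def train_adaboost_mh(X, y, n_labels, rounds=10, feature_fraction=1.0, seed=0):
    """Sketch of AdaBoost.MH with one-feature rules.

    X: list of sets of active (binary) features; y: list of sense ids.
    feature_fraction < 1.0 examines a random subset of features per round
    (an assumed reading of the feature-space reduction, not the paper's spec).
    """
    rng = random.Random(seed)
    features = sorted(set().union(*X))
    n = len(X)
    eps = 1.0 / (n * n_labels)  # smoothing term for the confidence values
    # D[i][l]: weight of the (example, label) pair, initially uniform
    D = [[1.0 / (n * n_labels)] * n_labels for _ in range(n)]
    # sign[i][l]: +1 if label l is the correct sense of example i, else -1
    sign = [[1 if y[i] == l else -1 for l in range(n_labels)] for i in range(n)]
    rules = []
    for _ in range(rounds):
        k = max(1, int(round(feature_fraction * len(features))))
        best = None
        for f in rng.sample(features, k):
            # W[j][l][s]: weight in block j (feature absent/present) for label l,
            # with s = 1 for correct pairs and s = 0 for incorrect ones
            W = [[[0.0, 0.0] for _ in range(n_labels)] for _ in range(2)]
            for i in range(n):
                j = 1 if f in X[i] else 0
                for l in range(n_labels):
                    W[j][l][int(sign[i][l] > 0)] += D[i][l]
            # Z is the normalisation factor that the chosen rule should minimise
            Z = 2.0 * sum(math.sqrt(W[j][l][0] * W[j][l][1])
                          for j in range(2) for l in range(n_labels))
            if best is None or Z < best[0]:
                c = [[0.5 * math.log((W[j][l][1] + eps) / (W[j][l][0] + eps))
                      for l in range(n_labels)] for j in range(2)]
                best = (Z, f, c)
        _, f, c = best
        rules.append((f, c))
        # Reweight: pairs the rule gets wrong gain weight, then renormalise
        total = 0.0
        for i in range(n):
            j = 1 if f in X[i] else 0
            for l in range(n_labels):
                D[i][l] *= math.exp(-sign[i][l] * c[j][l])
                total += D[i][l]
        for i in range(n):
            for l in range(n_labels):
                D[i][l] /= total
    return rules


def predict(rules, x, n_labels):
    """Sum the per-label confidences of all rules and return the best label."""
    scores = [0.0] * n_labels
    for f, c in rules:
        j = 1 if f in x else 0
        for l in range(n_labels):
            scores[l] += c[j][l]
    return max(range(n_labels), key=scores.__getitem__)


# Invented toy data for one ambiguous word: sense 0 (finance), sense 1 (river)
X = [{"money", "loan"}, {"money", "deposit"}, {"money", "interest"},
     {"river", "water"}, {"river", "fishing"}, {"river", "shore"}]
y = [0, 0, 0, 1, 1, 1]

rules = train_adaboost_mh(X, y, n_labels=2)                             # all features
lazy_rules = train_adaboost_mh(X, y, n_labels=2, feature_fraction=0.25)  # 2 of 8 per round
```

With `feature_fraction=1.0` every round scans the whole vocabulary, which is the cost the abstract says must be reduced for thousands of words. With a smaller fraction each round is cheaper, but an individual rule may be weaker, so the trade-off would need to be checked empirically on real data.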

This research has been partially funded by the Spanish Research Department (CICYT’s BASURDE project TIC98-0423-C06) and by the Catalan Research Department (CIRIT’s consolidated research group 1999SGR-150, CREL’s Catalan WordNet project and CIRIT’s grant 1999FI 00773).

References

Abney, S., Schapire, R.E. and Singer, Y.: Boosting Applied to Tagging and PP-attachment. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999.
Bauer, E. and Kohavi, R.: An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting and Variants. Machine Learning Journal. Special issue on IMLM for Improving and Scaling Machine Learning Algorithms, 1999.
Breiman, L.: Arcing Classifiers. The Annals of Statistics, 26(3), 1998.
Duda, R.O. and Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York, 1973.
Dietterich, T.G.: An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning (to appear).
Engelson, S.P. and Dagan, I.: Minimizing Manual Annotation Cost in Supervised Training from Corpora. In S. Wermter, E. Riloff and G. Scheler, editors, Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, LNAI, 1040. Springer, 1996.
Escudero, G., Màrquez, L. and Rigau, G.: Boosting Applied to Word Sense Disambiguation. Technical Report LSI-00-3-R, LSI Department, UPC, 2000.
Freund, Y. and Schapire, R.E.: Experiments with a New Boosting Algorithm. In Procs. of the 13th International Conference on Machine Learning, ICML, 1996.
Freund, Y. and Schapire, R.E.: A Decision-theoretic Generalization of On-line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1), 1997.
Fujii, A., Inui, K., Tokunaga, T. and Tanaka, H.: Selective Sampling for Example-based Word Sense Disambiguation. Computational Linguistics, 24(4), ACL, 1998.
Ide, N. and Véronis, J.: Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art. Computational Linguistics, 24(1), ACL, 1998.
Leacock, C., Chodorow, M. and Miller, G.A.: Using Corpus Statistics and WordNet Relations for Sense Identification. Computational Linguistics, 24(1), ACL, 1998.
López de Mántaras, R.: A Distance-based Attribute Selection Measure for Decision Tree Induction. Machine Learning, 6(1), 1991.
Màrquez, L.: Part-of-speech Tagging: A Machine Learning Approach based on Decision Trees. PhD Thesis, LSI Department, UPC, 1999.
Mihalcea, R. and Moldovan, I.: An Automatic Method for Generating Sense Tagged Corpora. In Proceedings of the 16th National Conference on Artificial Intelligence, AAAI, 1999.
Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D. and Miller, K.: Five Papers on WordNet. Special Issue of International Journal of Lexicography, 3(4), 1990.
Mooney, R.J.: Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning. In Proceedings of the 1st Conference on Empirical Methods in Natural Language Processing, EMNLP, 1996.
Ng, H.T. and Lee, H.B.: Integrating Multiple Knowledge Sources to Disambiguate Word Senses: An Exemplar-based Approach. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, ACL, 1996.
Ng, H.T.: Exemplar-based Word Sense Disambiguation: Some Recent Improvements. In Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, EMNLP, 1997.
Ng, H.T.: Getting Serious about Word Sense Disambiguation. In Proceedings of the SIGLEX Workshop “Tagging Text with Lexical Semantics: Why, What and How”, 1997.
Ng, H.T., Chung, Y.L. and Shou, K.F.: A Case Study on Inter-Annotation Agreement for WSD. In Proceedings of the SIGLEX Workshop “Standardizing Lexical Resources”, Maryland, USA, 1999.
Pedersen, T. and Bruce, R.: Knowledge Lean Word-Sense Disambiguation. In Proceedings of the 15th National Conference on Artificial Intelligence, 1998.
Quinlan, J.R.: Bagging, Boosting and C4.5. In Proceedings of the 13th National Conference on Artificial Intelligence, AAAI, 1996.
Samuel, K.: Lazy Transformation-Based Learning. In Proceedings of the 11th International Florida AI Research Symposium Conference, 1998.
Schapire, R.E. and Singer, Y.: Improved Boosting Algorithms Using Confidence-rated Predictions. Machine Learning (to appear).
Schapire, R.E. and Singer, Y.: BoosTexter: A Boosting-based System for Text Categorization. Machine Learning (to appear).
Towell, G. and Voorhees, E.M.: Disambiguating Highly Ambiguous Words. Computational Linguistics, 24(1), ACL, 1998.
Yarowsky, D.: Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, ACL, 1994.

Author information

Authors and Affiliations

TALP Research Center, LSI Department, Universitat Politècnica de Catalunya (UPC), Jordi Girona Salgado 1-3, E-08034, Barcelona, Catalonia
Gerard Escudero, Lluís Màrquez & German Rigau

Editor information

Editors and Affiliations

Institut d’Investigació en Intel·ligència Artificial, IIIA, Spanish Council for Scientific Research, CSIC, Campus, U.A.B., 08193, Bellaterra, Catalonia, Spain
Ramon López de Mántaras & Enric Plaza

About this paper

Cite this paper

Escudero, G., Màrquez, L., Rigau, G. (2000). Boosting Applied to Word Sense Disambiguation. In: López de Mántaras, R., Plaza, E. (eds) Machine Learning: ECML 2000. ECML 2000. Lecture Notes in Computer Science, vol 1810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45164-1_14

DOI: https://doi.org/10.1007/3-540-45164-1_14
Published: 14 January 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67602-7
Online ISBN: 978-3-540-45164-8
eBook Packages: Springer Book Archive
