Abstract
Data sparseness is a major problem in word sense disambiguation. Automatic sample acquisition and smoothing are two approaches that have been explored to alleviate it. In this paper, we consider a combination of the two. First, we propose a pattern-based method for acquiring pseudo samples; we then estimate the conditional probabilities of variables by combining the pseudo data set with the sense-tagged data set. This combined estimation strikes an appropriate balance between the two data sets, which is vital to achieving the best performance. Experiments show that our approach brings significant improvement in Chinese word sense disambiguation.
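The abstract does not state the form of the combined estimator, so the following is only a minimal sketch of one way such a combination could look. It assumes a simple linear interpolation between maximum-likelihood estimates from the two data sets. The function names, the weight `lam`, and the toy English samples are all hypothetical illustrations and are not taken from the paper.

```python
from collections import Counter


def conditional_probs(samples):
    """Maximum-likelihood P(feature | sense) from (sense, features) pairs."""
    counts_by_sense = {}
    for sense, features in samples:
        counts_by_sense.setdefault(sense, Counter()).update(features)
    probs = {}
    for sense, counts in counts_by_sense.items():
        total = sum(counts.values())
        probs[sense] = {f: c / total for f, c in counts.items()}
    return probs


def combined_prob(feature, sense, tagged, pseudo, lam=0.7):
    """Interpolate the two estimates (assumed form, not the paper's).

    lam is a hypothetical weight on the sense-tagged estimate;
    (1 - lam) goes to the pseudo-sample estimate.
    """
    p_tagged = tagged.get(sense, {}).get(feature, 0.0)
    p_pseudo = pseudo.get(sense, {}).get(feature, 0.0)
    return lam * p_tagged + (1.0 - lam) * p_pseudo


# Toy data: a small sense-tagged set and a pseudo-sample set.
tagged_samples = [("finance", ["money", "loan"]), ("river", ["water"])]
pseudo_samples = [("finance", ["money", "deposit"]), ("river", ["water", "shore"])]

tagged_probs = conditional_probs(tagged_samples)
pseudo_probs = conditional_probs(pseudo_samples)

# "deposit" never occurs in the tagged data, yet the pseudo samples
# give it non-zero probability under the "finance" sense.
p_deposit = combined_prob("deposit", "finance", tagged_probs, pseudo_probs)
p_money = combined_prob("money", "finance", tagged_probs, pseudo_probs)
```

In this sketch the weight `lam` plays the role of the balance between the two data sets: a value near 1 trusts the sense-tagged data almost exclusively, while a smaller value lets the pseudo samples fill in features that the tagged data never observed.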
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, X., Matsumoto, Y. (2005). Improving Word Sense Disambiguation by Pseudo-samples. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_41
DOI: https://doi.org/10.1007/978-3-540-30211-7_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science, Computer Science (R0)