Abstract
The main disadvantage of collocation-based word sense disambiguation is that its recall is low, even though its precision is relatively high. How can recall be improved without decreasing precision? In this paper, we investigate a word-class approach to extending the collocation list, which is constructed from a manually sense-tagged corpus. The word classes, by contrast, are obtained from a larger corpus that is not sense-tagged. Experimental results show that the F-measure improves to 71%, compared with 54% for the baseline system in which word classes are not considered, although precision decreases slightly. Further study reveals the relationship between the F-measure and the number of word classes trained from corpora of various sizes.
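The idea of extending a collocation list through word classes can be illustrated with a minimal sketch. It is not the system described in the paper: the function names, the toy collocates, the sense labels, and the class assignments are all hypothetical, and the balanced F-measure is assumed here because the abstract does not state which variant is used. The sketch only shows why recall can rise: a context word that never appeared in the sense-tagged data can still trigger a sense if it shares a class with a known collocate.

```python
from collections import defaultdict


def expand_collocations(seed, word_class):
    """Extend a collocate -> sense list using word classes.

    seed:       dict mapping a collocate to a sense label (toy stand-in
                for a list built from a sense-tagged corpus).
    word_class: dict mapping a word to a class id (toy stand-in for
                classes clustered from a larger untagged corpus).
    """
    members = defaultdict(set)
    for word, cls in word_class.items():
        members[cls].add(word)

    expanded = dict(seed)
    for collocate, sense in seed.items():
        cls = word_class.get(collocate)
        if cls is None:
            continue  # collocate was not clustered; nothing to add
        for word in members[cls]:
            # Existing entries win, so seed collocates keep their sense.
            expanded.setdefault(word, sense)
    return expanded


def disambiguate(context, collocations):
    """Return the sense of the first known collocate in context, else None."""
    for word in context:
        if word in collocations:
            return collocations[word]
    return None


def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean); the variant is an assumption."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical data for the ambiguous word "bank".
seed = {"river": "shore", "loan": "finance"}
word_class = {"river": 0, "stream": 0, "loan": 1, "mortgage": 1}
expanded = expand_collocations(seed, word_class)

context = ["the", "mortgage", "was", "approved"]
baseline_sense = disambiguate(context, seed)      # no seed collocate present
expanded_sense = disambiguate(context, expanded)  # "mortgage" shares a class with "loan"
```

With the seed list alone the context above gets no answer, which lowers recall. With the expanded list it is labelled through the class shared by "mortgage" and "loan". The same mechanism can introduce wrong labels when a class mixes words that signal different senses, which is consistent with the slight drop in precision the abstract reports.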
Supported by the National Grant Fundamental Research 973 Program of China under Grant No. 2004CB318102.
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, P., Sun, X., Wu, Y., Yu, S. (2007). Word Clustering for Collocation-Based Word Sense Disambiguation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_24
DOI: https://doi.org/10.1007/978-3-540-70939-8_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science, Computer Science (R0)