Semi-supervised Word Sense Disambiguation Using the Web as Corpus

Guzmán-Cabrera, Rafael; Rosso, Paolo; Montes-y-Gómez, Manuel; Villaseñor-Pineda, Luis; Pinto-Avendaño, David

doi:10.1007/978-3-642-00382-0_21

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5449)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Like any other classification task, Word Sense Disambiguation requires a large number of training examples. Such examples are easy to obtain for most tasks but are particularly difficult to obtain in this case. For this reason, in this paper we investigate the possibility of using a Web-based approach to determine the correct sense of an ambiguous word based only on its surrounding context. In particular, we propose a semi-supervised method that is especially suited to working with just a few training examples. The method automatically extracts unlabeled examples from the Web and iteratively integrates them into the training data set. The experimental results, obtained over a subset of ten nouns from the SemEval lexical sample task, are encouraging: they show that the baseline accuracy of classifiers such as Naïve Bayes and SVM can be improved using some unlabeled examples extracted from the Web.
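The abstract describes the method only at a high level: unlabeled examples are extracted from the Web and iteratively integrated into the training set. The following is a minimal, generic self-training sketch of that idea, not the authors' implementation. Everything concrete in it is an assumption rather than something taken from the paper: features are the words in a window of three tokens on either side of the ambiguous word (whitespace tokenization), the classifier is a from-scratch multinomial Naive Bayes (the SVM classifier the abstract also mentions is omitted), an unlabeled example is added when its top posterior reaches a fixed `threshold`, and a hard-coded list `web_snippets` stands in for the Web retrieval step, which is not shown. The names `context_words`, `NaiveBayes`, and `self_train`, the sense labels, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_words(sentence, target, window=3):
    """Words within `window` tokens on either side of each occurrence of `target`.

    Assumption: whitespace tokenization and a +/-3 word window (not from the paper).
    """
    tokens = sentence.lower().split()
    feats = []
    for i, tok in enumerate(tokens):
        if tok == target:
            feats.extend(tokens[max(0, i - window):i])
            feats.extend(tokens[i + 1:i + 1 + window])
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes over bags of context words, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, words):
        """Posterior probability of each sense given the context words."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for w in words:
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())  # subtract the max before exp for stability
        exps = {k: math.exp(s - top) for k, s in log_scores.items()}
        z = sum(exps.values())
        return {k: e / z for k, e in exps.items()}


def self_train(labeled, unlabeled, target, threshold=0.9, max_iter=5):
    """Repeatedly add confidently labeled unlabeled examples to the training set.

    labeled: list of (sentence, sense); unlabeled: list of sentences.
    Returns the final classifier and the (context, sense) pairs that were added.
    """
    docs = [context_words(s, target) for s, _ in labeled]
    labels = [sense for _, sense in labeled]
    pool = [context_words(s, target) for s in unlabeled]
    added = []
    for _ in range(max_iter):
        clf = NaiveBayes().fit(docs, labels)
        remaining, newly = [], []
        for words in pool:
            probs = clf.predict_proba(words)
            sense, p = max(probs.items(), key=lambda kv: kv[1])
            if p >= threshold:
                newly.append((words, sense))
            else:
                remaining.append(words)
        if not newly:  # nothing confident enough: stop early
            break
        for words, sense in newly:
            docs.append(words)
            labels.append(sense)
        added.extend(newly)
        pool = remaining
    return NaiveBayes().fit(docs, labels), added


# Toy labeled data for the ambiguous noun "bank" (invented for illustration).
labeled = [
    ("he deposited cash at the bank yesterday", "finance"),
    ("the bank approved the loan quickly", "finance"),
    ("they fished from the river bank at dawn", "river"),
    ("grass grew along the muddy bank of the stream", "river"),
]
# Hypothetical stand-in for snippets retrieved from a Web search engine.
web_snippets = [
    "the bank raised its loan rates",
    "we walked along the river bank",
]

clf, added = self_train(labeled, web_snippets, target="bank", threshold=0.7)
print([sense for _, sense in added])  # ['finance', 'river']
probs = clf.predict_proba(context_words("the bank raised the loan", "bank"))
print(max(probs, key=probs.get))  # finance
```

The threshold trades coverage for label noise: on this toy data, `threshold=0.8` admits neither snippet, while `threshold=0.7` admits both in the first iteration. After self-training, the word `raised`, which occurs only in a Web snippet, contributes evidence for the `finance` sense.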

Author information

Authors and Affiliations

FIMEE, Universidad de Guanajuato, Mexico
Rafael Guzmán-Cabrera
NLE Lab, DSIC, Universidad Politécnica de Valencia, Spain
Rafael Guzmán-Cabrera & Paolo Rosso
LabTL, Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico
Manuel Montes-y-Gómez & Luis Villaseñor-Pineda
FCC, Benemérita Universidad Autónoma de Puebla, Mexico
David Pinto-Avendaño

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Guzmán-Cabrera, R., Rosso, P., Montes-y-Gómez, M., Villaseñor-Pineda, L., Pinto-Avendaño, D. (2009). Semi-supervised Word Sense Disambiguation Using the Web as Corpus. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_21

DOI: https://doi.org/10.1007/978-3-642-00382-0_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00381-3
Online ISBN: 978-3-642-00382-0
eBook Packages: Computer Science, Computer Science (R0)
