The Role of Word Sense Disambiguation in Automated Text Categorization

Gómez Hidalgo, José María; de Buenaga Rodríguez, Manuel; Cortizo Pérez, José Carlos

doi:10.1007/11428817_27

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3513)

Abstract

Automated Text Categorization has reached accuracy levels comparable to those of human experts. Provided that enough training data is available, accurate automatic classifiers can be learned using Information Retrieval and Machine Learning techniques. However, the performance of this approach is degraded by problems arising from language variation (especially polysemy and synonymy). We investigate how Word Sense Disambiguation can alleviate these problems within two traditional methods for thesaurus usage in Information Retrieval, namely Query Expansion and Concept Indexing. These methods are evaluated on the problem of using the lexical database WordNet for text categorization, focusing on the Word Sense Disambiguation step involved. Our experiments demonstrate that rather simple dictionary methods and baseline statistical approaches can be used to disambiguate words and improve text representation and learning in both the Query Expansion and the Concept Indexing approaches.
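
To make the two representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tiny hand-made sense inventory (`SENSES`, hypothetical and not taken from WordNet), a simplified Lesk gloss-overlap heuristic standing in for a "simple dictionary method", and a first-listed-sense heuristic standing in for a baseline. The function names `lesk`, `first_sense`, `concept_index` and `expand_terms` are illustrative only.

```python
from collections import Counter

# Hypothetical toy sense inventory standing in for WordNet. Each word maps to
# an ordered list of senses; the first one is treated as the default sense.
SENSES = {
    "bank": [
        {"id": "bank.n.01", "synonyms": ["depository"],
         "gloss": "a financial institution that accepts deposits and lends money"},
        {"id": "bank.n.02", "synonyms": ["riverside"],
         "gloss": "sloping land beside a river or lake"},
    ],
    "loan": [
        {"id": "loan.n.01", "synonyms": ["credit"],
         "gloss": "money lent by a financial institution at interest"},
    ],
}

STOPWORDS = {"a", "an", "the", "of", "at", "by", "or", "and", "that", "to", "in", "on", "for"}


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha()]


def first_sense(word, context):
    """Baseline (assumption): always pick the first listed sense."""
    return SENSES[word][0]


def lesk(word, context):
    """Simplified Lesk: pick the sense whose gloss shares most words with the context."""
    ctx = set(context) - STOPWORDS - {word}
    best, best_overlap = SENSES[word][0], -1
    for sense in SENSES[word]:
        gloss_words = set(tokenize(sense["gloss"])) - STOPWORDS
        overlap = len(ctx & gloss_words)
        if overlap > best_overlap:  # ties keep the earlier sense
            best, best_overlap = sense, overlap
    return best


def concept_index(text, wsd):
    """Concept Indexing: replace each known word by the id of its chosen sense."""
    tokens = tokenize(text)
    return Counter(wsd(t, tokens)["id"] if t in SENSES else t for t in tokens)


def expand_terms(text, wsd):
    """Query Expansion: keep the original words and add synonyms of the chosen sense."""
    tokens = tokenize(text)
    bag = Counter(tokens)
    for t in tokens:
        if t in SENSES:
            bag.update(wsd(t, tokens)["synonyms"])
    return bag


if __name__ == "__main__":
    doc = "we walked along the river bank"
    print(concept_index(doc, lesk))
    print(expand_terms(doc, lesk))
    print(concept_index(doc, first_sense))
```

In this sketch the disambiguation function is passed in as a parameter, so the same indexing code can be run with either heuristic and the resulting term vectors compared. The vectors would then be fed to any standard text classifier; that step is left out here.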

Author information

Authors and Affiliations

Universidad Europea de Madrid, 28670, Villaviciosa de Odón, Madrid, Spain
José María Gómez Hidalgo & Manuel de Buenaga Rodríguez
AINet Solutions, 28943, Fuenlabrada, Madrid, Spain
José Carlos Cortizo Pérez

Editor information

Editors and Affiliations

Department of Software and Computing Systems, University of Alicante, Spain
Andrés Montoyo
Grupo de investigación del Procesamiento del Lenguaje y Sistemas de Información, Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
Rafael Muñoz
Lab. CEDRIC, CNAM, Paris, France
Elisabeth Métais

About this paper

Cite this paper

Gómez Hidalgo, J.M., de Buenaga Rodríguez, M., Cortizo Pérez, J.C. (2005). The Role of Word Sense Disambiguation in Automated Text Categorization. In: Montoyo, A., Muñoz, R., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428817_27

DOI: https://doi.org/10.1007/11428817_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26031-8
Online ISBN: 978-3-540-32110-1
eBook Packages: Computer Science, Computer Science (R0)
