
The Role of Word Sense Disambiguation in Automated Text Categorization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3513)

Abstract

Automated Text Categorization has reached the levels of accuracy of human experts. Provided that enough training data is available, accurate automatic classifiers can be learned using Information Retrieval and Machine Learning techniques. However, the performance of this approach suffers from problems derived from language variation (especially polysemy and synonymy). We investigate how Word Sense Disambiguation can alleviate these problems through two traditional methods for thesaurus usage in Information Retrieval, namely Query Expansion and Concept Indexing. These methods are evaluated on the problem of using the lexical database WordNet for text categorization, focusing on the Word Sense Disambiguation step involved. Our experiments demonstrate that rather simple dictionary methods and baseline statistical approaches can be used to disambiguate words and improve text representation and learning in both the Query Expansion and Concept Indexing approaches.
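
To make the two thesaurus-based methods named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hand-made sense inventory (`TOY_SENSES`, with made-up sense labels such as `bank#finance`) in place of WordNet, and a simplified Lesk-style gloss overlap as the "simple dictionary method" for disambiguation. All names in it are hypothetical. Concept indexing is illustrated as replacing a word with its chosen sense label, and query expansion as adding the synonyms of the chosen sense.

```python
from collections import Counter

# Hypothetical toy sense inventory standing in for WordNet.
# The sense labels and glosses are invented for illustration only.
TOY_SENSES = {
    "bank": [
        {"id": "bank#finance",
         "gloss": "financial institution that accepts deposits and lends money",
         "synonyms": ["depository"]},
        {"id": "bank#river",
         "gloss": "sloping land beside a river or lake",
         "synonyms": ["riverside"]},
    ],
}


def disambiguate(word, context_tokens):
    """Simplified Lesk: choose the sense whose gloss shares the most
    words with the context; ties fall back to the first listed sense."""
    senses = TOY_SENSES.get(word)
    if not senses:
        return None
    context = set(context_tokens)
    return max(senses, key=lambda s: len(context & set(s["gloss"].split())))


def concept_index(tokens):
    """Concept indexing: count sense labels instead of ambiguous words.
    Words missing from the inventory are kept unchanged."""
    counts = Counter()
    for token in tokens:
        sense = disambiguate(token, tokens)
        counts[sense["id"] if sense else token] += 1
    return counts


def expand(tokens):
    """Query expansion: keep the original tokens and append the
    synonyms of each disambiguated word's chosen sense."""
    expanded = list(tokens)
    for token in tokens:
        sense = disambiguate(token, tokens)
        if sense:
            expanded.extend(sense["synonyms"])
    return expanded


finance_doc = "the bank lends money to customers".split()
river_doc = "we walked along the river bank".split()

finance_index = concept_index(finance_doc)
river_index = concept_index(river_doc)
finance_expanded = expand(finance_doc)
river_expanded = expand(river_doc)
```

In this sketch the two occurrences of "bank" map to different sense labels, which is how disambiguation addresses polysemy, while the appended synonyms let documents that use different words for the same sense share features, which addresses synonymy.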




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gómez Hidalgo, J.M., de Buenaga Rodríguez, M., Cortizo Pérez, J.C. (2005). The Role of Word Sense Disambiguation in Automated Text Categorization. In: Montoyo, A., Muñoz, R., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428817_27


  • DOI: https://doi.org/10.1007/11428817_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26031-8

  • Online ISBN: 978-3-540-32110-1

  • eBook Packages: Computer Science, Computer Science (R0)
