A Comparison of Word- and Sense-Based Text Categorization Using Several Classification Algorithms

Journal of Intelligent Information Systems

Abstract

Most text categorization algorithms in the literature represent documents as collections of words. An alternative that has not been sufficiently explored is to represent them by word meanings, also known as senses. In this paper, using several algorithms, we compare the categorization accuracy of classifiers based on words with that of classifiers based on senses. The comparison is carried out on a subset of the annotated Brown Corpus semantic concordance. A series of experiments indicates that the use of senses does not result in any significant categorization improvement.
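The distinction between the two representations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy sense-tagged documents, the made-up sense identifiers (they are not WordNet keys), the category names, and the choice of a multinomial naive Bayes classifier with add-one smoothing, which is used here only for brevity since the abstract does not name the algorithms compared. The same classifier is trained once on bags of words and once on bags of senses.

```python
from collections import Counter
import math

# Toy sense-tagged documents: each token is a (word, sense_id) pair.
# Sense ids and categories are hypothetical, for illustration only.
TRAIN = [
    ("finance", [("bank", "bank%money"), ("loan", "loan%1"),
                 ("interest", "interest%money")]),
    ("finance", [("bank", "bank%money"), ("deposit", "deposit%1")]),
    ("nature", [("bank", "bank%river"), ("river", "river%1"),
                ("water", "water%1")]),
    ("nature", [("river", "river%1"), ("fish", "fish%1")]),
]


def features(tokens, use_senses):
    """Represent a document as a bag of words or a bag of senses."""
    return Counter(sense if use_senses else word for word, sense in tokens)


def train_nb(docs, use_senses):
    """Collect the counts needed by a multinomial naive Bayes model."""
    class_docs = Counter()   # number of training documents per category
    class_counts = {}        # feature counts per category
    vocab = set()            # all features seen in training
    for label, tokens in docs:
        class_docs[label] += 1
        bag = features(tokens, use_senses)
        class_counts.setdefault(label, Counter()).update(bag)
        vocab.update(bag)
    return class_docs, class_counts, vocab


def classify(model, tokens, use_senses):
    """Return the category with the highest smoothed log-posterior."""
    class_docs, class_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        counts = class_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(class_docs[label] / total_docs)
        for feat, n in features(tokens, use_senses).items():
            if feat in vocab:  # skip features never seen in training
                score += n * math.log((counts[feat] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_model = train_nb(TRAIN, use_senses=False)
sense_model = train_nb(TRAIN, use_senses=True)
test_doc = [("bank", "bank%river"), ("fish", "fish%1")]
print(classify(word_model, test_doc, use_senses=False))
print(classify(sense_model, test_doc, use_senses=True))
```

The only difference between the two runs is the `use_senses` flag passed to `features`; the learning algorithm is unchanged. This toy data says nothing about which representation performs better on a real collection.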




Cite this article

Kehagias, A., Petridis, V., Kaburlasos, V.G. et al. A Comparison of Word- and Sense-Based Text Categorization Using Several Classification Algorithms. Journal of Intelligent Information Systems 21, 227–247 (2003). https://doi.org/10.1023/A:1025554732352


  • DOI: https://doi.org/10.1023/A:1025554732352