A Comparison of Word- and Sense-Based Text Categorization Using Several Classification Algorithms

Kehagias, Athanasios; Petridis, Vassilios; Kaburlasos, Vassilis G.; Fragkou, Pavlina

doi:10.1023/A:1025554732352

Published: November 2003

Journal of Intelligent Information Systems, Volume 21, pages 227–247 (2003)

Abstract

Most of the text categorization algorithms in the literature represent documents as collections of words. An alternative which has not been sufficiently explored is the use of word meanings, also known as senses. In this paper, using several algorithms, we compare the categorization accuracy of classifiers based on words to that of classifiers based on senses. The document collection on which this comparison takes place is a subset of the annotated Brown Corpus semantic concordance. A series of experiments indicates that the use of senses does not result in any significant categorization improvement.
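
The difference between the two document representations can be illustrated with a minimal sketch. It assumes a document is available as a list of sense-tagged tokens; the sense labels, the sample tokens, and the helper names `word_features` and `sense_features` are hypothetical and are not taken from the paper or from the Brown Corpus semantic concordance.

```python
from collections import Counter

# Hypothetical sense-tagged tokens as (word, sense_id) pairs.
# The sense identifiers are invented for illustration only.
doc = [
    ("bank", "bank%finance"),
    ("loan", "loan%finance"),
    ("bank", "bank%river"),
    ("water", "water%liquid"),
]

def word_features(tokens):
    """Bag of words: count surface forms and ignore the senses."""
    return Counter(word for word, _ in tokens)

def sense_features(tokens):
    """Bag of senses: count sense identifiers instead of surface forms."""
    return Counter(sense for _, sense in tokens)

words = word_features(doc)
senses = sense_features(doc)

print(words)
print(senses)
```

In the word-based representation both occurrences of "bank" fall into a single feature, while the sense-based representation keeps them apart. Either feature vector could then be passed to the same classifier, which is the kind of comparison the abstract describes.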

Author information

Authors and Affiliations

Department of Math., Phys. and Comp. Sciences, Division of Mathematics, Aristotle University of Thessaloniki (AUTh), GR-54124, Thessaloniki, Greece
Athanasios Kehagias
Department of Electrical and Computer Engineering, Division of Electronics and Computer Engineering, Aristotle University of Thessaloniki (AUTh), GR-54124, Thessaloniki, Greece
Vassilios Petridis & Pavlina Fragkou
Department of Industrial Informatics, Division of Software Systems, Technological Educational Institute of Kavala, GR-65404, Kavala, Greece
Vassilis G. Kaburlasos

About this article

Cite this article

Kehagias, A., Petridis, V., Kaburlasos, V.G. et al. A Comparison of Word- and Sense-Based Text Categorization Using Several Classification Algorithms. Journal of Intelligent Information Systems 21, 227–247 (2003). https://doi.org/10.1023/A:1025554732352

Issue Date: November 2003
DOI: https://doi.org/10.1023/A:1025554732352
