Abstract
The information retrieval community is becoming increasingly interested in machine learning techniques, of which text categorization is one application. This paper describes how we applied an existing similarity-based learning algorithm, Charade, to the text categorization problem, and compares the results with those obtained using decision tree construction algorithms. From a machine learning point of view, the study was motivated by the size of the data inspected in such applications. Using the same representation of documents, Charade performs better than decision trees did in previously reported experiments on the same corpus. We also study how learning with redundancy influences categorization performance.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moulinier, I., Ganascia, J.G. (1996). Applying an existing machine learning algorithm to text categorization. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_58
DOI: https://doi.org/10.1007/3-540-60925-3_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60925-4
Online ISBN: 978-3-540-49738-7
eBook Packages: Springer Book Archive