Applying an existing machine learning algorithm to text categorization

Moulinier, Isabelle; Ganascia, Jean-Gabriel

doi:10.1007/3-540-60925-3_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1040)

Included in the following conference series:

International Joint Conference on Artificial Intelligence

Abstract

The information retrieval community is becoming increasingly interested in machine learning techniques, of which text categorization is an application. This paper describes how we have applied an existing similarity-based learning algorithm, Charade, to the text categorization problem and compares the results with those obtained using decision tree construction algorithms. From a machine learning point of view, this study was motivated by the size of the inspected data in such applications. Using the same representation of documents, Charade offers better performance than earlier reported experiments with decision trees on the same corpus. The way in which learning with redundancy influences categorization performance is also studied.

Author information

Authors and Affiliations

LAFORIA-IBP-CNRS, Université Paris VI, 4 place Jussieu, F-75252, Paris Cedex 05, France
Isabelle Moulinier & Jean-Gabriel Ganascia

Editor information

Stefan Wermter, Ellen Riloff, Gabriele Scheler

About this paper

Cite this paper

Moulinier, I., Ganascia, J.G. (1996). Applying an existing machine learning algorithm to text categorization. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_58

DOI: https://doi.org/10.1007/3-540-60925-3_58
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60925-4
Online ISBN: 978-3-540-49738-7
eBook Packages: Springer Book Archive
