Applying an existing machine learning algorithm to text categorization

  • Conference paper
Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing (IJCAI 1995)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 1040))

Abstract

The information retrieval community is becoming increasingly interested in machine learning techniques, of which text categorization is one application. This paper describes how we applied an existing similarity-based learning algorithm, Charade, to the text categorization problem, and compares the results with those obtained using decision tree construction algorithms. From a machine learning point of view, the study was motivated by the large volume of data that such applications must process. Using the same document representation, Charade performs better than decision trees did in earlier reported experiments on the same corpus. We also study how learning with redundancy influences categorization performance.

Editor information

Stefan Wermter Ellen Riloff Gabriele Scheler

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moulinier, I., Ganascia, J.G. (1996). Applying an existing machine learning algorithm to text categorization. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_58

  • DOI: https://doi.org/10.1007/3-540-60925-3_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60925-4

  • Online ISBN: 978-3-540-49738-7

  • eBook Packages: Springer Book Archive
