Multiple Sets of Rules for Text Categorization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3261)

Abstract

This paper concerns how multiple sets of rules can be generated using a rough sets-based inductive learning method and how they can be combined for text categorization using Dempster’s rule of combination. We first propose a boosting-like technique for generating multiple sets of rules based on rough set theory, and then model the outcomes inferred from those rules as pieces of evidence. Experiments were carried out on 10 of the 20 newsgroups in the 20-newsgroups benchmark collection, with the rule sets evaluated individually and in combination. The experimental results support the claim that “k experts may be better than any one if their individual judgements are appropriately combined”.
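As a rough illustration of the combination step, the sketch below applies Dempster’s rule of combination to two mass functions. It is not the authors’ implementation. The category names, the mass values, and the choice to place a rule set’s confidence on a single category and the remainder on the whole frame of discernment are all assumptions made for the example; the function name `combine` is likewise hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of categories (a focal element)
    to its mass; the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the intersection of the two sets.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal elements contribute to the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Normalise by the non-conflicting mass so the result sums to 1.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Hypothetical frame of discernment: three candidate categories.
frame = frozenset({"hockey", "baseball", "autos"})

# Assumed outputs of two rule sets for one document: mass on the predicted
# category, with the remaining mass left on the whole frame (ignorance).
m1 = {frozenset({"hockey"}): 0.6, frame: 0.4}
m2 = {frozenset({"hockey"}): 0.5, frozenset({"baseball"}): 0.3, frame: 0.2}

fused = combine(m1, m2)
best = max(fused, key=fused.get)
print(sorted(best), round(fused[best], 4))
```

In this toy case the two sources partly disagree (one assigns some mass to "baseball"), so 0.18 of the product mass is conflicting and is normalised away; the combined support for "hockey" ends up higher than either source gave it alone, which is the effect the abstract’s “k experts” claim refers to.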

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bi, Y., Anderson, T., McClean, S. (2004). Multiple Sets of Rules for Text Categorization. In: Yakhno, T. (eds) Advances in Information Systems. ADVIS 2004. Lecture Notes in Computer Science, vol 3261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30198-1_27

  • DOI: https://doi.org/10.1007/978-3-540-30198-1_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23478-4

  • Online ISBN: 978-3-540-30198-1

  • eBook Packages: Computer Science, Computer Science (R0)
