BayesTH-MCRDR Algorithm for Automatic Classification of Web Document

  • Conference paper
AI 2004: Advances in Artificial Intelligence (AI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3339)

Abstract

Automated Web document classification is an important means of managing and processing the enormous and constantly growing volume of Web documents in digital form. Document classification has recently been addressed with various techniques, such as naïve Bayesian, TFIDF (Term Frequency Inverse Document Frequency), FCA (Formal Concept Analysis) and MCRDR (Multiple Classification Ripple Down Rules). In this paper we propose the BayesTH-MCRDR algorithm for Web document classification, a composite algorithm that combines a naïve Bayesian algorithm using a threshold with the MCRDR algorithm. Its prominent feature is that it optimises the initial relationship between keywords before final assignment to a category, in order to achieve higher classification accuracy. We also present the system we have developed to demonstrate and compare a number of classification techniques.
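The abstract does not specify how the threshold is applied, so the following is only a minimal sketch of one plausible reading: a multinomial naïve Bayes classifier with Laplace smoothing that assigns a category only when the best posterior probability reaches a threshold, and otherwise leaves the document unassigned (for example, for a rule-based stage such as MCRDR to handle). The function names, the toy data and the threshold value of 0.7 are all assumptions made for illustration; none of them come from the paper.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Fit a multinomial naive Bayes model from (tokens, category) pairs."""
    cat_docs = Counter()                 # number of documents per category
    word_counts = defaultdict(Counter)   # per-category keyword counts
    vocab = set()
    for tokens, cat in docs:
        cat_docs[cat] += 1
        word_counts[cat].update(tokens)
        vocab.update(tokens)
    total = sum(cat_docs.values())
    priors = {c: n / total for c, n in cat_docs.items()}
    return priors, word_counts, vocab

def posteriors(tokens, model):
    """Return normalised posterior probabilities P(category | tokens)."""
    priors, word_counts, vocab = model
    log_scores = {}
    for cat, prior in priors.items():
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[cat].values()) + len(vocab)
        score = math.log(prior)
        for t in tokens:
            if t in vocab:  # keywords never seen in training are ignored
                score += math.log((word_counts[cat][t] + 1) / denom)
        log_scores[cat] = score
    # Normalise in log space to avoid underflow on long documents.
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {c: math.exp(s - m) / z for c, s in log_scores.items()}

def classify(tokens, model, threshold=0.7):
    """Assign the best category only if its posterior meets the threshold.

    The threshold rule is an assumption of this sketch; returning None
    marks the document as unassigned.
    """
    post = posteriors(tokens, model)
    best = max(post, key=post.get)
    return best if post[best] >= threshold else None

# Hypothetical toy corpus, for illustration only.
TOY_DOCS = [
    ("python code compiler".split(), "tech"),
    ("code bug compiler".split(), "tech"),
    ("goal match team".split(), "sport"),
    ("team goal league".split(), "sport"),
]
toy_model = train(TOY_DOCS)
print(classify(["compiler", "code"], toy_model))  # tech (posterior 0.9)
print(classify(["code", "goal"], toy_model))      # None (posteriors tie at 0.5)
```

In this sketch a higher threshold trades coverage for precision: fewer documents are assigned by the Bayesian stage, and more are left for whatever stage follows it.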

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cho, WC., Richards, D. (2004). BayesTH-MCRDR Algorithm for Automatic Classification of Web Document. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_31

  • DOI: https://doi.org/10.1007/978-3-540-30549-1_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24059-4

  • Online ISBN: 978-3-540-30549-1

  • eBook Packages: Computer Science, Computer Science (R0)
