Rule Generation Based on Rough Set Theory for Text classification

  • Conference paper
Research and Development in Intelligent Systems XVII

Abstract

In this paper we describe an approach to generating decision rules for text classification based on rough set techniques. A minimal discriminating set of attributes (a reduct) for the original data set is obtained by analyzing the degree of dependency among attributes. To speed up the search for reducts, the information gain criterion is used to rank the attributes in decreasing order and to reduce the number of attributes considered, and heuristic functions are incorporated into a range of rule generation algorithms.
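The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the two ideas it names, not the authors' method. It assumes the standard rough set definition of the degree of dependency (the fraction of objects whose equivalence class under the chosen attributes carries a single decision label) and a simple greedy search over attributes ranked by information gain. The function names, the `top_k` cut-off, and the toy term/label data are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Degree of dependency: share of rows in label-consistent classes."""
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def information_gain(rows, labels, attr):
    """Entropy of the labels minus their entropy after splitting on attr."""
    remainder = sum(
        len(block) / len(rows) * entropy([labels[i] for i in block])
        for block in partition(rows, [attr])
    )
    return entropy(labels) - remainder


def greedy_reduct(rows, labels, top_k=None):
    """Hypothetical heuristic: rank by information gain, add, then prune."""
    ranked = sorted(
        rows[0], key=lambda a: information_gain(rows, labels, a), reverse=True
    )
    if top_k is not None:
        ranked = ranked[:top_k]  # assumed cut-off on the attributes considered
    target = dependency(rows, labels, ranked)

    # Forward pass: add attributes in ranked order until the target is reached.
    chosen = []
    for attr in ranked:
        if dependency(rows, labels, chosen) >= target:
            break
        chosen.append(attr)

    # Backward pass: drop any attribute whose removal keeps the dependency.
    for attr in list(chosen):
        rest = [a for a in chosen if a != attr]
        if dependency(rows, labels, rest) >= target:
            chosen = rest
    return chosen


# Toy data (invented): documents as binary term indicators with topic labels.
rows = [
    {"wheat": 1, "price": 1, "bank": 0},
    {"wheat": 1, "price": 0, "bank": 0},
    {"wheat": 0, "price": 1, "bank": 1},
    {"wheat": 0, "price": 0, "bank": 1},
]
labels = ["grain", "grain", "money", "money"]

print(dependency(rows, labels, ["price"]))  # 0.0: "price" alone separates nothing
print(greedy_reduct(rows, labels))          # ['wheat']
```

In this toy table either "wheat" or "bank" alone determines the label, so each is a reduct; the sketch returns "wheat" only because ties in information gain keep the original attribute order. The backward pass leaves a set from which no single attribute can be removed without lowering the dependency, which is not necessarily the smallest reduct overall.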

Copyright information

© 2001 Springer-Verlag London

About this paper

Cite this paper

Bi, Y., Anderson, T., McClean, S. (2001). Rule Generation Based on Rough Set Theory for Text classification. In: Bramer, M., Preece, A., Coenen, F. (eds) Research and Development in Intelligent Systems XVII. Springer, London. https://doi.org/10.1007/978-1-4471-0269-4_12

  • DOI: https://doi.org/10.1007/978-1-4471-0269-4_12

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-403-1

  • Online ISBN: 978-1-4471-0269-4

  • eBook Packages: Springer Book Archive
