Abstract
In this paper we describe an approach to text classification that generates decision rules using rough set techniques. A reduct, that is, a minimal discriminating set of attributes for the original data set, is obtained by analyzing the degree of dependency among attributes. To speed up the search for reducts, the information gain criterion is used both to reduce the number of attributes considered and to rank them in decreasing order, and heuristic functions are incorporated into a range of rule generation algorithms.
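The abstract gives no algorithmic detail, so the following is only a minimal sketch of how information gain ranking and the rough set degree of dependency could be combined to search for a reduct. The function names (`information_gain`, `dependency`, `greedy_reduct`), the greedy forward-then-prune strategy, and the toy term-presence data are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in label entropy obtained by splitting on one attribute."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def dependency(rows, labels, attrs):
    """Degree of dependency: fraction of objects whose equivalence class
    under `attrs` contains a single decision label (the positive region)."""
    seen_labels = defaultdict(set)
    sizes = Counter()
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(label)
        sizes[key] += 1
    positive = sum(sizes[k] for k, labs in seen_labels.items() if len(labs) == 1)
    return positive / len(rows)

def greedy_reduct(rows, labels, attrs):
    """Hypothetical heuristic: add attributes in decreasing information gain
    until the full-set dependency is reached, then prune redundant ones."""
    ranked = sorted(attrs, key=lambda a: information_gain(rows, labels, a),
                    reverse=True)
    target = dependency(rows, labels, attrs)
    chosen = []
    for a in ranked:
        if dependency(rows, labels, chosen) >= target:
            break
        chosen.append(a)
    for a in list(chosen):
        rest = [b for b in chosen if b != a]
        if dependency(rows, labels, rest) >= target:
            chosen = rest
    return chosen

# Toy decision table (assumed data): term presence per document, one label each.
rows = [
    {"wheat": 1, "price": 1, "bank": 0},
    {"wheat": 1, "price": 0, "bank": 0},
    {"wheat": 0, "price": 1, "bank": 1},
    {"wheat": 0, "price": 0, "bank": 1},
    {"wheat": 0, "price": 1, "bank": 0},
]
labels = ["grain", "grain", "money", "money", "money"]

print(greedy_reduct(rows, labels, ["wheat", "price", "bank"]))  # ['wheat']
```

In this toy table the term "wheat" alone determines the label, so it preserves the dependency of the full attribute set and is returned as the reduct. A greedy search of this kind finds one attribute subset that preserves the dependency and cannot be shrunk by dropping a single attribute; it is not guaranteed to find the smallest possible reduct.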
Copyright information
© 2001 Springer-Verlag London
About this paper
Cite this paper
Bi, Y., Anderson, T., McClean, S. (2001). Rule Generation Based on Rough Set Theory for Text classification. In: Bramer, M., Preece, A., Coenen, F. (eds) Research and Development in Intelligent Systems XVII. Springer, London. https://doi.org/10.1007/978-1-4471-0269-4_12
DOI: https://doi.org/10.1007/978-1-4471-0269-4_12
Publisher Name: Springer, London
Print ISBN: 978-1-85233-403-1
Online ISBN: 978-1-4471-0269-4
eBook Packages: Springer Book Archive