Abstract
This paper examines how multiple sets of rules can be generated with a rough set-based inductive learning method and how they can be combined for text categorization using Dempster’s rule of combination. We first propose a boosting-like technique for generating multiple sets of rules based on rough set theory, and then model the outcomes inferred from the rules as pieces of evidence. Experiments were carried out on 10 of the 20 newsgroups in the 20-newsgroups benchmark collection, with the rule sets evaluated individually and in combination. The results support the claim that “k experts may be better than any one if their individual judgements are appropriately combined”.
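To make the combination step concrete, the following is a minimal sketch of Dempster's rule of combination for two pieces of evidence. It is not the authors' implementation. The representation of a mass function as a dictionary from sets of category labels to masses, the function name `dempster_combine`, the three category labels, and all numeric values are assumptions chosen purely for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function is a dict mapping a frozenset of category labels
    (a focal element) to its mass; the masses are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0  # total mass assigned to pairs with an empty intersection
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        weight = mass_b * mass_c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + weight
        else:
            conflict += weight
    if conflict >= 1.0:
        raise ValueError("total conflict: the evidence cannot be combined")
    # Renormalise by the non-conflicting mass so the result sums to 1.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Hypothetical frame of discernment with three categories.
frame = frozenset({"sci.space", "rec.autos", "comp.graphics"})

# Illustrative evidence from two rule sets; mass on `frame` expresses ignorance.
m1 = {frozenset({"sci.space"}): 0.6, frame: 0.4}
m2 = {frozenset({"sci.space"}): 0.5, frozenset({"rec.autos"}): 0.3, frame: 0.2}

combined = dempster_combine(m1, m2)
for focal, mass in sorted(combined.items(), key=lambda item: -item[1]):
    print(sorted(focal), round(mass, 4))
```

In this example both sources lend support to the same category, so its combined mass (about 0.76) exceeds the mass either source assigned on its own, which is the intuition behind combining several rule sets rather than relying on one.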
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bi, Y., Anderson, T., McClean, S. (2004). Multiple Sets of Rules for Text Categorization. In: Yakhno, T. (eds) Advances in Information Systems. ADVIS 2004. Lecture Notes in Computer Science, vol 3261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30198-1_27
DOI: https://doi.org/10.1007/978-3-540-30198-1_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23478-4
Online ISBN: 978-3-540-30198-1
eBook Packages: Computer Science (R0)