Abstract
Most classification studies use the complete data for every object. It is desirable, however, to classify objects using only a subset of the full data. A rough-set-based reduct is a minimal subset of features that has almost the same discerning power as the entire set of conditional features. Here, we propose multiple reducts with confidence, followed by k-nearest neighbor (kNN) classification, to improve the accuracy of document classification. To select better multiple reducts, we develop a greedy algorithm based on selecting attributes that are useful for document classification. The proposed methods are shown to be effective on benchmark datasets drawn from the Reuters-21578 collection.
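The idea of combining several reducts with kNN and a confidence value can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the confidence measure or the combination rule, so this sketch assumes that confidence is the share of the k neighbors agreeing with the winning class, and that per-reduct confidences are summed per class. The function names, the toy term-count data, and the three reducts are all hypothetical, and the greedy reduct selection step is not shown.

```python
from collections import Counter, defaultdict
from math import sqrt


def knn_vote(train, labels, query, features, k):
    """Classify `query` with kNN restricted to one feature subset (a reduct).

    Returns (label, confidence). ASSUMPTION: confidence is the fraction of
    the k nearest neighbors that carry the winning label.
    """
    def dist(a, b):
        # Euclidean distance computed only over the reduct's features.
        return sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], query))[:k]
    counts = Counter(labels[i] for i in nearest)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(nearest)


def multi_reduct_knn(train, labels, query, reducts, k=3):
    """Combine the per-reduct kNN decisions.

    ASSUMPTION: each reduct adds its confidence to the class it predicts,
    and the class with the largest total wins.
    """
    scores = defaultdict(float)
    for features in reducts:
        label, conf = knn_vote(train, labels, query, features, k)
        scores[label] += conf
    return max(scores, key=scores.get)


# Hypothetical toy data: rows are documents, columns are term counts.
train = [
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [3, 1, 1, 1],
    [0, 3, 0, 2],
    [1, 2, 0, 3],
    [0, 2, 1, 3],
]
labels = ["grain", "grain", "grain", "trade", "trade", "trade"]

# Hypothetical reducts, given as lists of column indices.
reducts = [[0, 1], [0, 3], [1, 3]]

query = [2, 0, 1, 0]
prediction = multi_reduct_knn(train, labels, query, reducts, k=3)
print(prediction)
```

Each reduct sees only its own columns, so a document can be classified from a fraction of the full feature set; using several reducts lets a weak or ambiguous subset be outvoted by more confident ones.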
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ishii, N., Morioka, Y., Kimura, H., Bao, Y. (2010). Classification by Multiple Reducts-kNN with Confidence. In: Fyfe, C., Tino, P., Charles, D., Garcia-Osorio, C., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2010. IDEAL 2010. Lecture Notes in Computer Science, vol 6283. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15381-5_12
DOI: https://doi.org/10.1007/978-3-642-15381-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15380-8
Online ISBN: 978-3-642-15381-5
eBook Packages: Computer Science (R0)