Abstract
Automatic categorization of documents into predefined taxonomies is a crucial step in data mining and knowledge discovery. Standard machine learning techniques such as support vector machines (SVMs) and related large-margin methods have been applied successfully to this task. However, the high dimensionality of the input feature vectors slows classification, and classification accuracy depends on both the SVM kernel parameters chosen during training and the features selected. The objective of this work is to reduce the dimensionality of the feature vectors and to optimize the kernel parameters in order to improve SVM classification accuracy and speed. To improve classification speed, we apply rough set theory to reduce the feature vector space. To improve classification accuracy, we present a genetic algorithm approach to feature selection and parameter optimization. Experimental results indicate that our method is more effective than traditional SVM methods and other traditional methods.
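The abstract does not say which rough set reduction procedure is used, so the following is only a minimal sketch of one common choice: a greedy, QuickReduct-style search that keeps adding the term that most increases the rough set dependency degree until the selected terms discriminate the categories as well as the full feature set does. The function names (`partition`, `dependency`, `quick_reduct`) and the toy binary term-presence data are assumptions made for illustration and are not taken from the paper.

```python
def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    classes = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(i)
    return list(classes.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows whose equivalence class has one label."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedily pick attributes until dependency matches the full attribute set.

    Ties are broken in favour of the lowest attribute index. The result is a
    sufficient subset, not necessarily a minimal one.
    """
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    best = dependency(rows, labels, reduct)
    while best < target:
        candidates = [a for a in all_attrs if a not in reduct]
        # Score each candidate by the dependency reached after adding it.
        scored = [(dependency(rows, labels, reduct + [a]), -a) for a in candidates]
        best, neg_attr = max(scored)
        reduct.append(-neg_attr)
    return sorted(reduct)


# Hypothetical toy corpus: each row marks presence (1) or absence (0) of 4 terms.
docs = [
    (1, 0, 1, 0),
    (1, 0, 0, 0),
    (0, 1, 1, 0),
    (0, 1, 0, 1),
    (1, 1, 1, 0),
    (1, 1, 0, 1),
]
labels = ["A", "A", "B", "B", "C", "C"]

print(quick_reduct(docs, labels))  # → [0, 1]
```

In this toy corpus the category is fully determined by terms 0 and 1, so the other two terms can be dropped before training. In a pipeline like the one the abstract describes, a genetic algorithm could then search over the remaining features and the SVM kernel parameters; that step is not shown here.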
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Bai, R., Wang, X., Liao, J. (2007). Combination of Rough Sets and Genetic Algorithms for Text Classification. In: Gorodetsky, V., Zhang, C., Skormin, V.A., Cao, L. (eds) Autonomous Intelligent Systems: Multi-Agents and Data Mining. AIS-ADM 2007. Lecture Notes in Computer Science, vol 4476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72839-9_21
DOI: https://doi.org/10.1007/978-3-540-72839-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72838-2
Online ISBN: 978-3-540-72839-9
eBook Packages: Computer Science, Computer Science (R0)