Combination of Rough Sets and Genetic Algorithms for Text Classification

  • Conference paper
Autonomous Intelligent Systems: Multi-Agents and Data Mining (AIS-ADM 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4476)

Abstract

Automatic categorization of documents into pre-defined taxonomies is a crucial step in data mining and knowledge discovery. Standard machine learning techniques such as support vector machines (SVM) and related large-margin methods have been applied successfully to this task. Unfortunately, the high dimensionality of the input feature vectors slows classification, while the kernel parameter settings chosen during SVM training and the choice of features both affect classification accuracy. The objective of this work is to reduce the dimension of the feature vectors and to optimize the parameters so as to improve both the accuracy and the speed of SVM classification. To improve classification speed, we apply rough set theory to reduce the feature vector space. To improve classification accuracy, we present a genetic algorithm approach for feature selection and parameter optimization. Experimental results indicate that our method is more effective than traditional SVM methods and other traditional methods.
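
The abstract does not say which rough-set reduction procedure the authors use, so the following is only a minimal sketch of the general idea, assuming a greedy QuickReduct-style search over a tiny term-presence table. The function names (partition, dependency, quick_reduct) and the toy documents and categories are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Rough-set dependency degree: fraction of rows in the positive region,
    i.e. rows whose equivalence class carries a single decision label."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def quick_reduct(rows, labels):
    """Greedily add the attribute that raises the dependency degree most,
    until the subset matches the dependency of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max(
            (a for a in all_attrs if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return sorted(reduct)

# Toy term-presence table (hypothetical data): columns are terms, rows are documents.
docs = [
    (1, 0, 1, 0),
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (0, 1, 0, 1),
    (1, 0, 0, 0),
    (0, 1, 1, 1),
]
categories = ["sport", "sport", "finance", "finance", "sport", "finance"]

selected = quick_reduct(docs, categories)
print(selected)
```

On this toy table, term 0 alone separates the two categories, so the four-dimensional vectors shrink to one dimension with no loss of discernibility. In a pipeline like the one the abstract describes, a reduced feature set of this kind would then be passed to the genetic algorithm stage, which searches over feature subsets and SVM kernel parameters; that stage is not sketched here.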

Editor information

Vladimir Gorodetsky, Chengqi Zhang, Victor A. Skormin, Longbing Cao

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Bai, R., Wang, X., Liao, J. (2007). Combination of Rough Sets and Genetic Algorithms for Text Classification. In: Gorodetsky, V., Zhang, C., Skormin, V.A., Cao, L. (eds) Autonomous Intelligent Systems: Multi-Agents and Data Mining. AIS-ADM 2007. Lecture Notes in Computer Science, vol 4476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72839-9_21

  • DOI: https://doi.org/10.1007/978-3-540-72839-9_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72838-2

  • Online ISBN: 978-3-540-72839-9

  • eBook Packages: Computer Science, Computer Science (R0)
