
Study on Web Text Feature Selection Based on Rough Set

  • Xianghua Lu
  • Weijing Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7389)

Abstract

This paper uses the vector space model to represent Web text, analyses the features of Web pages written in HTML, and improves the traditional TF-IDF formula so that a feature's weight is calculated according to the term's location in the document. A text classification system based on the vector space model is also studied. The paper connects feature selection with text classification: feature terms are selected according to their importance to classification, and a feature selection algorithm based on rough sets is proposed. Experiments show that this method reduces the dimension of the feature space while improving classification accuracy.
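The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm, since the abstract gives neither the improved TF-IDF formula nor the selection procedure. The location weights in `LOCATION_WEIGHT`, the function names `weighted_tfidf` and `dependency_degree`, and the toy data are all assumptions made for illustration. The first function scales each term occurrence by a weight tied to where it appears in the HTML page before applying the usual IDF factor. The second computes the standard rough-set dependency degree, which is one common way to judge how much a set of features matters for classification.

```python
import math
from collections import Counter, defaultdict

# Hypothetical location weights; the paper's actual values are not given in the abstract.
LOCATION_WEIGHT = {"title": 3.0, "heading": 2.0, "body": 1.0}


def weighted_tfidf(docs):
    """docs: list of documents, each a list of (term, location) pairs.

    Returns one {term: weight} dict per document, where each occurrence
    contributes its location weight to the term frequency.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update({term for term, _ in doc})  # count each term once per document

    weighted = []
    for doc in docs:
        tf = defaultdict(float)
        for term, location in doc:
            tf[term] += LOCATION_WEIGHT.get(location, 1.0)
        weighted.append(
            {term: freq * math.log(n_docs / doc_freq[term]) for term, freq in tf.items()}
        )
    return weighted


def dependency_degree(rows, labels, features):
    """Rough-set dependency of the class label on `features`.

    Objects with identical values on `features` form an equivalence class.
    The result is the fraction of objects lying in classes that carry a
    single label (the positive region).
    """
    labels_seen = defaultdict(set)
    sizes = Counter()
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        labels_seen[key].add(label)
        sizes[key] += 1
    positive = sum(sizes[k] for k, seen in labels_seen.items() if len(seen) == 1)
    return positive / len(rows)


# Toy data (invented for illustration only).
docs = [
    [("rough", "title"), ("set", "title"), ("text", "body")],
    [("text", "body"), ("web", "heading")],
]
weights = weighted_tfidf(docs)

rows = [
    {"rough": 1, "web": 0},
    {"rough": 1, "web": 0},
    {"rough": 0, "web": 1},
    {"rough": 0, "web": 0},
]
labels = ["A", "A", "B", "B"]
gamma_rough = dependency_degree(rows, labels, ["rough"])
gamma_web = dependency_degree(rows, labels, ["web"])
```

In this toy decision table, the feature `rough` alone separates the two classes, so its dependency degree is higher than that of `web`. A selection procedure built on this measure would prefer it, which is the sense in which features are chosen for their importance to classification.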

Keywords

feature weight, feature selection, rough set, text classification

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Xianghua Lu (1)
  • Weijing Wang (1)
  1. Department of Computer and Information Engineering, Luoyang Institute of Science and Technology, Luoyang, P.R. China
