Extract Semantic Information from WordNet to Improve Text Classification Performance

  • Rujiang Bai
  • Xiaoyue Wang
  • Junhua Liao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6059)

Abstract

Over the past decade, text categorization has become an active field of research in the machine learning community. Most approaches are based on term occurrence frequency. The performance of such surface-based methods can degrade when the texts are complex or ambiguous. One alternative is to use semantic-based approaches, which process textual documents according to their meaning. In this paper, we propose a Concept-based Vector Space Model that represents a text with a more abstract form of its semantic information than the standard Vector Space Model does. The model adjusts the weights of the vector space by importing the hypernymy-hyponymy relations between synonym sets and the concept chains in WordNet. Experimental results on several data sets show that the proposed approach, with concepts built from WordNet, achieves significant improvements over the baseline algorithm.
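The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy hypernym map `TOY_HYPERNYMS`, the function names, and the `max_depth` and `decay` parameters are all assumptions made for illustration, and a small hand-written dictionary stands in for WordNet. Each term keeps its raw frequency, and every hypernym on its concept chain receives a share of that frequency that shrinks with distance.

```python
from collections import Counter

# Hypothetical toy hypernym map (child -> parent) standing in for WordNet.
TOY_HYPERNYMS = {
    "beef": "meat",
    "pork": "meat",
    "meat": "food",
    "apple": "fruit",
    "fruit": "food",
}


def hypernym_chain(term, hypernyms, max_depth):
    """Return up to max_depth ancestors of term, nearest ancestor first."""
    chain = []
    current = term
    while current in hypernyms and len(chain) < max_depth:
        current = hypernyms[current]
        chain.append(current)
    return chain


def concept_vector(tokens, hypernyms=TOY_HYPERNYMS, max_depth=2, decay=0.5):
    """Build a term-frequency vector augmented with decayed hypernym weights.

    The decay scheme is an assumption for illustration, not the paper's formula.
    """
    weights = Counter()
    for term, tf in Counter(tokens).items():
        weights[term] += tf
        chain = hypernym_chain(term, hypernyms, max_depth)
        for depth, concept in enumerate(chain, start=1):
            # A concept further up the chain receives a smaller share.
            weights[concept] += tf * decay ** depth
    return dict(weights)


vec = concept_vector(["beef", "pork", "beef"])
```

In this sketch, two documents that mention "beef" and "pork" respectively share weight on the concept "meat" even though they have no surface term in common, which is the kind of effect a concept-based representation aims for.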

Keywords

Text classification, document representation, WordNet, concept-based VSM

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Rujiang Bai (1)
  • Xiaoyue Wang (1)
  • Junhua Liao (1)
  1. Shandong University of Technology Library, Zibo, China
