Abstract

Most current text classification methods are based on the Vector Space Model (VSM), which accounts only for term frequency in documents and ignores important semantic relationships between key terms. We propose a system that uses integrated ontologies and Natural Language Processing techniques to index texts. The traditional word-based matrix is replaced by a concept-based matrix. For this purpose, we developed fully automated methods for mapping keywords to their corresponding ontology concepts. A Support Vector Machine (SVM), a successful machine learning technique, is used for classification. Experimental results show that the proposed method significantly improves text classification performance.
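
The idea of replacing a word matrix with a concept matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the `TERM_TO_CONCEPT` dictionary, the concept names, and the helper functions are hypothetical stand-ins for the automated ontology mapping the abstract describes, and tokenization is reduced to whitespace splitting.

```python
from collections import Counter

# Hypothetical term-to-concept lookup (an assumption for illustration).
# The paper derives such mappings automatically from ontologies;
# this sketch hard-codes a tiny one instead.
TERM_TO_CONCEPT = {
    "car": "Vehicle",
    "automobile": "Vehicle",
    "truck": "Vehicle",
    "physician": "Doctor",
    "doctor": "Doctor",
}


def concept_counts(text):
    """Count concept occurrences, skipping terms with no known concept."""
    tokens = text.lower().split()
    return Counter(TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT)


def concept_matrix(docs):
    """Return (sorted concept names, one row of concept counts per document)."""
    counts = [concept_counts(d) for d in docs]
    concepts = sorted(set().union(*counts))
    rows = [[c[name] for name in concepts] for c in counts]
    return concepts, rows


docs = ["The car passed the truck", "A physician drove an automobile"]
concepts, rows = concept_matrix(docs)
print(concepts)  # ['Doctor', 'Vehicle']
print(rows)      # [[0, 2], [1, 1]]
```

Here "car", "truck", and "automobile" all fall into a single `Vehicle` column, so the two documents share a feature that a word-based matrix would treat as unrelated terms. The resulting rows could then be passed to any SVM implementation as feature vectors.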

Keywords

Text classification, ontology, RDF, SVM

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Rujiang Bai (1)
  • Xiaoyue Wang (1)
  • Junhua Liao (1)
  1. Shandong University of Technology Library, Zibo, China
