A Knowledge-Based Semantic Kernel for Text Classification

  • Jamal Abdul Nasir
  • Asim Karim
  • George Tsatsaronis
  • Iraklis Varlamis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7024)

Abstract

In text classification, documents are typically represented in a vector space using the “Bag of Words” (BOW) approach. Although easy to use, the BOW representation cannot handle word synonymy and polysemy, and it ignores semantic relatedness between words. In this paper, we overcome these shortcomings of the BOW approach by embedding a known WordNet-based semantic relatedness measure for pairs of words, namely Omiotis, into a semantic kernel. The suggested measure incorporates the TF-IDF weighting scheme, thus creating a semantic kernel that combines both semantic and statistical information from text. Empirical evaluation with real data sets demonstrates that our approach achieves improved classification accuracy over the standard BOW representation when Omiotis is embedded in four different classifiers.
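The general idea of a semantic kernel over TF-IDF vectors can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the construction used in the paper: it uses the common form K = X S Xᵀ, where X holds L2-normalised TF-IDF document vectors and S is a term-by-term relatedness matrix. The vocabulary, the documents, and the scores in `S` are made up for the example and are not real Omiotis values; the function names are hypothetical.

```python
import numpy as np

# Toy vocabulary (assumption, for illustration only).
VOCAB = ["car", "automobile", "banana"]

# Hypothetical term-relatedness scores in [0, 1]; NOT real Omiotis values.
# The matrix is symmetric and positive semi-definite, so X S X^T is a valid kernel.
S = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])


def tfidf_matrix(docs, vocab):
    """Return L2-normalised TF-IDF rows, one per document."""
    index = {term: j for j, term in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for token in doc.split():
            if token in index:
                tf[i, index[token]] += 1.0
    df = np.count_nonzero(tf, axis=0)
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0  # smoothed IDF
    x = tf * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave empty documents as zero vectors
    return x / norms


def bow_kernel(x):
    """Plain linear kernel on TF-IDF vectors (the BOW baseline)."""
    return x @ x.T


def semantic_kernel(x, s):
    """Linear kernel with a term-relatedness matrix between the vectors."""
    return x @ s @ x.T


docs = ["car car", "automobile", "banana"]
X = tfidf_matrix(docs, VOCAB)
K_bow = bow_kernel(X)
K_sem = semantic_kernel(X, S)

print("BOW similarity (doc 0, doc 1):     ", K_bow[0, 1])
print("Semantic similarity (doc 0, doc 1):", K_sem[0, 1])
```

In this toy setting the BOW kernel sees no overlap between the "car" and "automobile" documents, while the semantic kernel assigns them a positive similarity through the relatedness score. A Gram matrix of this kind could, for instance, be passed to a classifier that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.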

Keywords

Text Classification; Thesaurus; Semantic Kernels

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jamal Abdul Nasir (1)
  • Asim Karim (1)
  • George Tsatsaronis (2)
  • Iraklis Varlamis (3)
  1. School of Science and Engineering, LUMS, Pakistan
  2. Biotechnology Center (BIOTEC), Technische Universität Dresden, Germany
  3. Department of Informatics and Telematics, Harokopio University of Athens, Greece
