Hypertext Classification Using Tensor Space Model and Rough Set Based Ensemble Classifier

  • Suman Saha
  • C. A. Murthy
  • Sankar K. Pal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5909)

Abstract

As the WWW grows at an increasing rate, classifiers targeted at hypertext are in high demand. While document categorization is quite mature, the use of hypertext structure and hyperlinks has been relatively unexplored. In this paper, we introduce a tensor space model for representing hypertext documents. We exploit the local structure and neighborhood recommendation encapsulated in the proposed representation model. Instead of using the text on a page to represent features in a vector space model, we use features on the page and neighborhood features to represent a hypertext document in a tensor space model. A tensor similarity measure is defined. We demonstrate the use of a rough set based ensemble classifier on the proposed tensor space model. In our experiments, classification with this method outperforms existing hypertext classification techniques.

Keywords

Hypertext classification, tensor space model, rough ensemble classifier

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Suman Saha (1)
  • C. A. Murthy (1)
  • Sankar K. Pal (1)
  1. Center for Soft Computing Research, Indian Statistical Institute
