Hypertext Classification Using Tensor Space Model and Rough Set Based Ensemble Classifier
As the WWW grows at an increasing speed, classifiers targeted at hypertext are in high demand. While document categorization is quite mature, the issue of utilizing hypertext structure and hyperlinks has remained relatively unexplored. In this paper, we introduce a tensor space model for representing hypertext documents. We exploit the local structure and neighborhood recommendation encapsulated in the proposed representation model. Instead of using only the text on a page to represent features in a vector space model, we use features on the page together with neighborhood features to represent a hypertext document in a tensor space model. A tensor similarity measure is defined. We demonstrate the use of a rough set based ensemble classifier on the proposed tensor space model. Experimental results show that classification with our method outperforms existing hypertext classification techniques.
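The abstract does not spell out the tensor layout or the similarity measure, so the following is only a minimal sketch under stated assumptions. Each hypertext document is taken to be a second-order tensor with one term-frequency row per component: the page's own text and the text of its neighborhood. The component names `page` and `neighborhood` and the helpers `build_tensor`, `cosine` and `tensor_similarity` are hypothetical. As a stand-in for the paper's measure, tensor similarity is computed as a weighted mean of row-wise cosine similarities. The rough set based ensemble classifier is not shown.

```python
import math
from collections import Counter

# Hypothetical components: the abstract only mentions "features on the page"
# and "neighborhood features"; this two-row split is an assumption.
COMPONENTS = ("page", "neighborhood")

def build_tensor(doc, vocab):
    """Build a 2-D tensor: one term-frequency row per component, one column per vocab term."""
    tensor = []
    for comp in COMPONENTS:
        counts = Counter(doc.get(comp, "").lower().split())
        tensor.append([counts[t] for t in vocab])
    return tensor

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def tensor_similarity(x, y, weights=None):
    """Assumed measure: weighted mean of per-component (row-wise) cosine similarities."""
    if weights is None:
        weights = [1.0 / len(x)] * len(x)  # equal weight per component
    return sum(w * cosine(rx, ry) for w, rx, ry in zip(weights, x, y))

# Toy example with a tiny, made-up vocabulary.
vocab = ["rough", "set", "tensor", "web"]
a = build_tensor({"page": "tensor space model", "neighborhood": "web mining web"}, vocab)
b = build_tensor({"page": "tensor model", "neighborhood": "rough set web"}, vocab)
sim = tensor_similarity(a, b)
print(round(sim, 4))  # 0.7887
```

Keeping the page and neighborhood rows separate, instead of concatenating them into one vector, lets each source of evidence be compared on its own terms and weighted independently. This is the intuition behind moving from a vector space model to a tensor space model.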
Keywords: Hypertext classification · Tensor space model · Rough ensemble classifier