Individual Link Model for Text Classification

  • Nam Do-Hoang Le
  • Thai-Son Tran
  • Minh-Triet Tran
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 297)

Abstract

Standard supervised learning approaches apply the decision model to each new document in a “context-free” manner, ignoring its relationships to other documents. To exploit the information carried by those relationships, link-based methods model them as a graph and draw additional evidence for classification from a document's neighbourhood. These methods learn from the collective information of the whole neighbourhood. Rather than taking the whole neighbourhood into consideration, the authors propose a model that calculates a link certainty, which measures the influence of each individual link on the classification process. The link certainty is then combined with a content-only model to form a combined model called the individual-link model. This approach reduces the loss of information incurred when the neighbourhood is considered only as a whole. The authors systematically evaluate the new model on standard data sets, comparing it with traditional content-only methods and with another link-based method to measure the improvement. Besides showing promising results on graphs of documents, the individual-link model could in future be applied to general graphs of objects.
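
The abstract does not give the formulas behind link certainty or the combination step, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that every document already has class probabilities from some content-only classifier, that each link carries a certainty value in [0, 1], and that the two sources are blended with a hypothetical mixing weight `alpha`. All names (`combine_with_links`, `content_probs`, `links`, `alpha`) and the toy numbers are invented for this example.

```python
from typing import Dict, List, Tuple

Probs = Dict[str, float]  # class label -> probability


def combine_with_links(
    doc: str,
    content_probs: Dict[str, Probs],
    links: Dict[str, List[Tuple[str, float]]],
    alpha: float = 0.5,
) -> Probs:
    """Blend a document's content-only probabilities with its neighbours'.

    Each neighbour contributes in proportion to the certainty of its own
    link, instead of the neighbourhood being treated as one aggregate.
    The linear blend with `alpha` is an assumption made for this sketch.
    """
    own = content_probs[doc]
    neighbours = links.get(doc, [])
    total_certainty = sum(certainty for _, certainty in neighbours)
    if total_certainty == 0:
        # No usable links: fall back to the content-only prediction.
        return dict(own)

    combined: Probs = {}
    for label, p_content in own.items():
        # Certainty-weighted average of the neighbours' probabilities.
        vote = sum(
            certainty * content_probs[n].get(label, 0.0)
            for n, certainty in neighbours
        ) / total_certainty
        combined[label] = (1 - alpha) * p_content + alpha * vote
    return combined


def predict(probs: Probs) -> str:
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)


# Toy data (invented): d1 is ambiguous on content alone.
content_probs = {
    "d1": {"ml": 0.45, "db": 0.55},
    "d2": {"ml": 0.90, "db": 0.10},
    "d3": {"ml": 0.20, "db": 0.80},
}
# d1 links to d2 with high certainty and to d3 with low certainty.
links = {"d1": [("d2", 0.9), ("d3", 0.1)]}

combined = combine_with_links("d1", content_probs, links)
print(predict(content_probs["d1"]), predict(combined))
```

In this toy example the content-only probabilities favour "db" for `d1`, while the high-certainty link to `d2` shifts the combined prediction to "ml". The low-certainty link to `d3` has little effect, which is the behaviour a per-link weighting is meant to provide.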

Keywords

document categorization, logistic regression, graphical model, bibliography networks

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Nam Do-Hoang Le (1)
  • Thai-Son Tran (1)
  • Minh-Triet Tran (1)
  1. University of Science, Ho Chi Minh City, Vietnam