HetPathMine: A Novel Transductive Classification Algorithm on Heterogeneous Information Networks

  • Chen Luo
  • Renchu Guan
  • Zhe Wang
  • Chenghua Lin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8416)

Abstract

Transductive classification (TC) uses a small set of labeled objects to help classify all the unlabeled objects in an information network, and it is an important data mining task on such networks. Various classification methods have been proposed for this task. However, most of them are designed for homogeneous networks rather than heterogeneous ones, which contain multi-typed objects and relations and may carry more useful semantic information. In this paper, we first use the concept of meta path to represent the different relation paths in heterogeneous networks and propose a novel meta path selection model. We then extend the transductive classification problem to heterogeneous information networks and propose a novel algorithm, named HetPathMine. The experimental results show that (1) HetPathMine achieves higher accuracy than existing transductive classification methods, and (2) the weight HetPathMine obtains for each meta path is consistent with human intuition or real-world situations.
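The pipeline the abstract describes has three parts: meta paths over a heterogeneous network, a weighted combination of them, and transductive label spreading from a few labeled objects. A minimal pure-Python sketch of that pipeline follows. It is not HetPathMine. The meta path weights are fixed, hypothetical values rather than the output of the paper's selection model. The propagation step is a swapped-in technique, the "learning with local and global consistency" iteration of Zhou et al. The toy author/paper/venue network, the function names (`commuting_matrix`, `combine`, `propagate`) and the parameters `alpha` and `iters` are all illustrative assumptions. Each meta path becomes a PathSim-style commuting matrix that counts path instances between every pair of objects.

```python
from math import sqrt


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Dense matrix product; matrices are stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*a)]


def commuting_matrix(half_path: list[list[list[float]]]) -> list[list[float]]:
    """Path-count matrix of a symmetric meta path such as A-P-V-P-A.

    `half_path` holds the relation matrices for the first half of the path
    (e.g. [author_paper, paper_venue] for A-P-V). Multiplying that product by
    its own transpose counts path instances between every pair of objects.
    """
    half = half_path[0]
    for rel in half_path[1:]:
        half = matmul(half, rel)
    return matmul(half, transpose(half))


def combine(mats: list[list[list[float]]], weights: list[float]) -> list[list[float]]:
    """Weighted sum of meta-path matrices, with self-loops removed."""
    n = len(mats[0])
    w = [[sum(beta * m[i][j] for beta, m in zip(weights, mats)) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        w[i][i] = 0.0
    return w


def propagate(w: list[list[float]], labels: dict[int, int], n_classes: int,
              alpha: float = 0.9, iters: int = 100) -> list[int]:
    """Iterate F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2."""
    n = len(w)
    inv_sqrt = [1.0 / sqrt(d) if d > 0 else 0.0 for d in (sum(row) for row in w)]
    s = [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    # Y: one-hot rows for labeled objects, all zeros for unlabeled ones.
    y = [[1.0 if labels.get(i) == c else 0.0 for c in range(n_classes)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        sf = matmul(s, f)
        f = [[alpha * sf[i][c] + (1 - alpha) * y[i][c] for c in range(n_classes)]
             for i in range(n)]
    # Each object takes the class with the largest propagated score.
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Hypothetical toy network: 4 authors, 4 papers, 2 venues.
author_paper = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 1]]
paper_venue = [[1, 0], [1, 0], [0, 1], [0, 1]]

apa = commuting_matrix([author_paper])                 # A-P-A (co-authorship)
apvpa = commuting_matrix([author_paper, paper_venue])  # A-P-V-P-A (shared venue)

# Hypothetical fixed weights; HetPathMine would learn these instead.
w = combine([apa, apvpa], [0.7, 0.3])
predictions = propagate(w, labels={0: 0, 2: 1}, n_classes=2)
print(predictions)  # [0, 0, 1, 1]
```

Raising the weight on one meta path increases how much that relation drives the propagated labels. In HetPathMine those weights come from the meta path selection model, which this sketch does not reproduce.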

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Chen Luo (1)
  • Renchu Guan (1)
  • Zhe Wang (1)
  • Chenghua Lin (2)
  1. College of Computer Science and Technology, Jilin University, Changchun, China
  2. School of Natural and Computing Sciences, University of Aberdeen, UK