Semi-supervised Clustering on Heterogeneous Information Networks

  • Chen Luo
  • Wei Pang
  • Zhe Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8444)

Abstract

Semi-supervised clustering on information networks combines labeled and unlabeled data to improve clustering performance. However, existing semi-supervised clustering methods are all designed for homogeneous networks and do not handle heterogeneous ones. In this work, we propose a semi-supervised clustering approach for heterogeneous information networks, which contain multi-typed objects and links and may therefore carry more useful semantic information. The major challenge in this clustering task is handling the multiple relations and diverse semantic meanings found in heterogeneous networks. To address this challenge, we introduce the concept of a relation-path to measure the similarity between two data objects of the same type. We then use the labeled information to derive a separate weight for each relation-path. Finally, we propose SemiRPClus, a complete framework for semi-supervised learning in heterogeneous networks. Experimental results demonstrate the advantages of our framework, in both effectiveness and efficiency, over the baseline and several state-of-the-art approaches.
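
The abstract names three steps (relation-path similarity, weights learned from labels, clustering) without giving formulas, so the following Python sketch is only an illustration under stated assumptions and not the SemiRPClus algorithm itself. It assumes a toy author-paper-venue network, a normalised path-count similarity, logistic regression on labeled pairs as the weight learner, and a nearest-labeled-object rule as a stand-in for the clustering step. All function and variable names are hypothetical.

```python
import numpy as np

def relation_path_similarity(chain):
    """Similarity along one symmetric relation-path (assumed measure).

    `chain` lists the adjacency matrices traversed by the path; their
    product counts path instances, which are then normalised to [0, 1].
    """
    m = np.asarray(chain[0], dtype=float)
    for a in chain[1:]:
        m = m @ np.asarray(a, dtype=float)
    diag = np.diag(m)
    denom = diag[:, None] + diag[None, :]
    return np.divide(2.0 * m, denom, out=np.zeros_like(m), where=denom > 0)

def learn_path_weights(sims, labels, steps=500, lr=0.5):
    """Fit one weight per relation-path from labeled objects (assumed learner).

    Each pair of labeled objects becomes a training example: the features
    are its per-path similarities, the target is 1 if the labels match.
    Plain gradient descent on the logistic loss.
    """
    ids = sorted(labels)
    pairs = [(i, j) for n, i in enumerate(ids) for j in ids[n + 1:]]
    X = np.array([[s[i, j] for s in sims] for i, j in pairs])
    y = np.array([float(labels[i] == labels[j]) for i, j in pairs])
    w, b = np.zeros(len(sims)), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w

# Toy network (made up for illustration): 4 authors, 4 papers, 2 venues.
AP = np.array([[1, 1, 0, 0],   # author 0 wrote papers 0 and 1
               [0, 1, 0, 0],   # author 1 wrote paper 1
               [0, 0, 1, 1],   # author 2 wrote papers 2 and 3
               [0, 0, 0, 1]])  # author 3 wrote paper 3
PV = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # paper -> venue

# Two relation-paths between authors: A-P-A and A-P-V-P-A.
sims = [
    relation_path_similarity([AP, AP.T]),
    relation_path_similarity([AP, PV, PV.T, AP.T]),
]

labels = {0: 0, 1: 0, 2: 1}            # author 3 is unlabeled
weights = learn_path_weights(sims, labels)
combined = sum(w * s for w, s in zip(weights, sims))

# Stand-in for clustering: give author 3 the label of the most similar
# labeled author under the weighted similarity.
nearest = max(labels, key=lambda i: combined[3, i])
predicted = labels[nearest]
```

In this toy setting both relation-paths connect author 3 only to author 2, so the weighted similarity places author 3 in the same cluster as author 2. A real heterogeneous network would have many more relation-paths, and the learned weights would decide which of them matter for the user's labels.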

Keywords

  • Heterogeneous information network
  • Semi-supervised clustering

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Chen Luo (1)
  • Wei Pang (2)
  • Zhe Wang (1)

  1. College of Computer Science and Technology, Jilin University, Changchun, China
  2. School of Natural and Computing Sciences, University of Aberdeen, Aberdeen, UK
