cluTM: Content and Link Integrated Topic Model on Heterogeneous Information Networks

  • Qian Wang
  • Zhaohui Peng (email author)
  • Senzhang Wang
  • Philip S. Yu
  • Qingzhong Li
  • Xiaoguang Hong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9098)


Topic models are extensively studied as a way to automatically discover the main themes that pervade a large, unstructured collection of documents. Traditional topic models assume that documents are independent and uncorrelated. In many real scenarios, however, a document is interconnected with other documents and objects, forming a text-related heterogeneous network such as the DBLP bibliographic network. Traditional topic models struggle to capture the link information associated with the diverse types of objects in such a network. To this end, we propose cluTM, a unified topic model that incorporates both the document content and the various links in a text-related heterogeneous network. cluTM combines the textual documents and the link structures through a joint matrix factorization of the text matrix and the link matrices, which derives a common latent semantic space shared by the multi-typed objects. Representing the multi-typed objects by these common latent features allows their semantic information to be enhanced simultaneously. Experimental results on DBLP datasets, compared against state-of-the-art baselines, demonstrate the effectiveness of cluTM in both topic mining and the clustering of multiple object types in text-related heterogeneous networks.
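To make the idea of a shared latent space concrete, the following is a minimal sketch of a joint matrix factorization, not the cluTM objective itself, which this abstract does not spell out. It assumes a regularized squared-error loss solved by alternating least squares, in which a document-term matrix and each document-object link matrix are factorized with one common document factor `D`. The function name `joint_factorize`, the parameters `k`, `lam`, and `reg`, and the toy matrices are all hypothetical choices made for illustration.

```python
import numpy as np


def joint_factorize(text, links, k=2, lam=1.0, reg=0.1, iters=50, seed=0):
    """Alternating least squares for the assumed objective

        ||text - D W^T||^2 + reg ||W||^2
        + lam * sum_i (||links[i] - D U_i^T||^2 + reg ||U_i||^2)
        + reg ||D||^2

    where D (documents x k) is shared by the text and all link matrices.
    """
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((text.shape[0], k))
    eye = reg * np.eye(k)
    losses = []
    for _ in range(iters):
        # Fix D: each object-type factor has a closed-form ridge solution.
        gram = D.T @ D + eye
        W = np.linalg.solve(gram, D.T @ text).T
        Us = [np.linalg.solve(gram, D.T @ A).T for A in links]
        # Fix W and all U_i: solve for the shared document factor.
        lhs = W.T @ W + lam * sum(U.T @ U for U in Us) + eye
        rhs = text @ W + lam * sum(A @ U for A, U in zip(links, Us))
        D = np.linalg.solve(lhs, rhs.T).T  # lhs is symmetric
        # Track the full regularized objective.
        loss = np.sum((text - D @ W.T) ** 2) + reg * np.sum(W ** 2)
        loss += reg * np.sum(D ** 2)
        for A, U in zip(links, Us):
            loss += lam * (np.sum((A - D @ U.T) ** 2) + reg * np.sum(U ** 2))
        losses.append(loss)
    return D, W, Us, losses


# Toy data (made up): 4 documents, 5 terms, 3 authors.
text = np.array([[2, 1, 0, 0, 0],
                 [1, 2, 0, 0, 1],
                 [0, 0, 2, 1, 0],
                 [0, 0, 1, 2, 1]], dtype=float)
doc_author = np.array([[1, 0, 0],
                       [1, 1, 0],
                       [0, 0, 1],
                       [0, 1, 1]], dtype=float)

D, W, Us, losses = joint_factorize(text, [doc_author], k=2)
```

Because `D` appears in every term of the loss, the rows of `D`, `W`, and each `U_i` live in the same `k`-dimensional space, so documents, terms, and linked objects can be compared or clustered with the same features. The weight `lam` controls how strongly the link structure influences that space relative to the text.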


Information Network, Heterogeneous Network, Topic Model, Latent Dirichlet Allocation, Textual Document





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Qian Wang (1)
  • Zhaohui Peng (1), email author
  • Senzhang Wang (2)
  • Philip S. Yu (3, 4)
  • Qingzhong Li (1)
  • Xiaoguang Hong (1)

  1. School of Computer Science and Technology, Shandong University, Jinan, China
  2. School of Computer Science and Engineering, Beihang University, Beijing, China
  3. Department of Computer Science, University of Illinois at Chicago, Chicago, USA
  4. Institute for Data Science, Tsinghua University, Beijing, China
