Abstract
Topic models have been studied extensively as a way to automatically discover the main themes that pervade a large, unstructured collection of documents. Traditional topic models assume that documents are independent and uncorrelated. In many real scenarios, however, a document is interconnected with other documents and objects, forming a text-related heterogeneous network such as the DBLP bibliographic network. It is challenging for traditional topic models to capture the link information associated with the diverse types of objects in such a network. To this end, we propose a unified topic model, cluTM, which incorporates both the document content and the various links in a text-related heterogeneous network. cluTM combines the textual documents and the link structures through a joint matrix factorization of the text matrix and the link matrices. This joint factorization derives a common latent semantic space shared by the multi-typed objects. Representing the multi-typed objects by common latent features enhances the semantic information of all object types simultaneously. Experimental results on DBLP datasets, in comparison with state-of-the-art baselines, demonstrate the effectiveness of cluTM in both topic mining and multi-object clustering in text-related heterogeneous networks.
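To make the idea of a shared latent space concrete, the following is a minimal sketch of a generic joint non-negative matrix factorization, not the cluTM algorithm itself, whose objective and update rules are not given in the abstract. It assumes a document-term matrix and a single document-link matrix (for example, documents by authors) that share one document factor U, and it uses standard multiplicative updates on the concatenated matrix. The function name `joint_nmf`, the weight `alpha` (assumed positive), and the toy data are all hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_loss(x, u, h):
    """Squared Frobenius norm of x - u h."""
    approx = matmul(u, h)
    return sum((xi - ai) ** 2 for xr, ar in zip(x, approx) for xi, ai in zip(xr, ar))

def joint_nmf(text, link, k, alpha=1.0, iters=200, seed=0, eps=1e-9):
    """Factor text ~ U V and link ~ U W with one shared non-negative U.

    text: documents x terms; link: documents x linked objects.
    alpha (> 0) weights the link term. Minimizing
    ||text - U V||^2 + alpha * ||link - U W||^2 is equivalent to plain NMF
    on the concatenation [text, sqrt(alpha) * link], which is done here.
    """
    rng = random.Random(seed)
    s = alpha ** 0.5
    stacked = [list(tr) + [s * l for l in lr] for tr, lr in zip(text, link)]
    n, m = len(stacked), len(stacked[0])
    u = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    history = [frob_loss(stacked, u, h)]
    for _ in range(iters):
        # Multiplicative update for the stacked right factor H = [V, sqrt(alpha) W].
        ut = transpose(u)
        num, den = matmul(ut, stacked), matmul(matmul(ut, u), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # Multiplicative update for the shared document factor U.
        ht = transpose(h)
        num, den = matmul(stacked, ht), matmul(u, matmul(h, ht))
        u = [[u[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
        history.append(frob_loss(stacked, u, h))
    n_terms = len(text[0])
    v = [row[:n_terms] for row in h]                 # topic x term factor
    w = [[x / s for x in row[n_terms:]] for row in h]  # topic x linked-object factor
    return u, v, w, history

# Toy data (hypothetical): 4 documents, 4 terms, 2 authors.
text = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
link = [[1, 0], [1, 0], [0, 1], [0, 1]]
u, v, w, history = joint_nmf(text, link, k=2)
```

Each row of `u` places a document in the shared space, while the rows of `v` and `w` describe the same latent dimensions in terms of words and linked objects respectively. A network with several link types would, under the same assumption, simply stack more link matrices alongside the text matrix.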
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, Q., Peng, Z., Wang, S., Yu, P.S., Li, Q., Hong, X. (2015). cluTM: Content and Link Integrated Topic Model on Heterogeneous Information Networks. In: Dong, X., Yu, X., Li, J., Sun, Y. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9098. Springer, Cham. https://doi.org/10.1007/978-3-319-21042-1_17
DOI: https://doi.org/10.1007/978-3-319-21042-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21041-4
Online ISBN: 978-3-319-21042-1
eBook Packages: Computer Science (R0)