Abstract
Latent Dirichlet allocation is a fully generative statistical language model that has proven successful in capturing both the content and the topics of a corpus of documents. Recently, it was even shown that relations among documents, such as hyperlinks or citations, allow one to share information between documents and, in turn, to improve topic generation. Although such approaches are fully generative, in many situations we are not actually interested in predicting relations among documents. In this paper, we therefore present a Dirichlet-multinomial nonparametric regression topic model that places a Gaussian process prior on joint document and topic distributions, where the prior is a function of document relations. On networks of scientific abstracts and of Wikipedia documents, we show that this approach meets or exceeds the performance of several baseline topic models.
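To make the idea of a Gaussian process prior that depends on document relations more concrete, the following is a minimal sketch, not the implementation from the paper. It assumes a symmetric adjacency matrix of document links, uses a regularized Laplacian graph kernel (chosen here purely for illustration), draws one latent function per topic from the resulting Gaussian process, and exponentiates the draws to obtain positive Dirichlet parameters for each document. The function names and the parameters `sigma2` and `jitter` are hypothetical.

```python
import numpy as np


def regularized_laplacian_kernel(adjacency, sigma2=1.0, jitter=1e-8):
    """Graph kernel K = (I + sigma2 * L)^-1, where L = D - A is the graph Laplacian.

    This kernel is an illustrative assumption, not necessarily the one used in the paper.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    kernel = np.linalg.inv(np.eye(n) + sigma2 * laplacian)
    # Symmetrize against round-off and add a small jitter for numerical stability.
    kernel = 0.5 * (kernel + kernel.T)
    return kernel + jitter * np.eye(n)


def sample_document_topic_distributions(adjacency, num_topics, rng):
    """Sample per-document topic proportions whose prior is coupled through the links."""
    kernel = regularized_laplacian_kernel(adjacency)
    n = kernel.shape[0]
    # One Gaussian process draw per topic: column t of f is distributed as N(0, K).
    f = rng.multivariate_normal(np.zeros(n), kernel, size=num_topics).T
    # Exponentiate so that every Dirichlet parameter is strictly positive.
    alpha = np.exp(f)
    # Each document draws its topic proportions from its own Dirichlet.
    theta = np.vstack([rng.dirichlet(row) for row in alpha])
    return alpha, theta


if __name__ == "__main__":
    # Hypothetical toy network: four documents linked in a chain 0-1-2-3.
    links = np.array(
        [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]],
        dtype=float,
    )
    alpha, theta = sample_document_topic_distributions(
        links, num_topics=3, rng=np.random.default_rng(0)
    )
    print(theta)
```

Because the kernel assigns higher covariance to documents that are close in the link graph, linked documents tend to receive similar Dirichlet parameters. In this sketch, that coupling is how relational information is shared, and the links themselves are never modeled as random variables.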
Keywords
- Gaussian Process
- Topic Model
- Latent Dirichlet Allocation
- Neural Information Processing System
- Link Structure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wahabzada, M., Xu, Z., Kersting, K. (2010). Topic Models Conditioned on Relations. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol 6323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15939-8_26
DOI: https://doi.org/10.1007/978-3-642-15939-8_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15938-1
Online ISBN: 978-3-642-15939-8
eBook Packages: Computer Science (R0)