Abstract
Topic models are a discrete analogue to principal component analysis and independent component analysis that model topic at the word level within a document. They have many variants, such as NMF, PLSI and LDA, and are used in many fields, such as genetics, text and the web, image analysis and recommender systems. However, only recently have reasonable methods become available for estimating the likelihood of unseen documents, for instance to perform testing or model comparison. This paper explores a number of recent methods, and improves their theory, performance, and testing.
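To make "estimating the likelihood of unseen documents" concrete, the following is a minimal sketch of one simple baseline: importance sampling for an LDA-style model, using the Dirichlet prior as the proposal distribution. It is an illustration under stated assumptions, not necessarily one of the estimators the paper studies or improves. The function names, the topic-word matrix `phi`, and the prior parameter `alpha` are all hypothetical and chosen for this example only.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalising gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def estimate_log_likelihood(doc, phi, alpha, num_samples=2000, seed=0):
    """Estimate log p(doc | phi, alpha) for an LDA-style model.

    Assumptions (hypothetical, for illustration only):
      doc   -- list of word indices for the unseen document
      phi   -- phi[k][w] is the probability of word w under topic k
      alpha -- Dirichlet prior parameters over the topic proportions

    Topic proportions theta are sampled from the prior, and the
    per-sample document likelihoods p(doc | theta, phi) are averaged.
    """
    rng = random.Random(seed)
    num_topics = len(phi)
    log_weights = []
    for _ in range(num_samples):
        theta = sample_dirichlet(alpha, rng)
        log_p = 0.0
        for w in doc:
            # Marginalise the topic assignment of each word.
            log_p += math.log(sum(theta[k] * phi[k][w] for k in range(num_topics)))
        log_weights.append(log_p)
    # log of the sample mean of the likelihoods
    return log_sum_exp(log_weights) - math.log(num_samples)


if __name__ == "__main__":
    # Toy model: 2 topics over a 3-word vocabulary.
    phi = [[0.7, 0.2, 0.1],
           [0.1, 0.2, 0.7]]
    alpha = [0.5, 0.5]
    doc = [0, 0, 1, 2, 0]
    print(estimate_log_likelihood(doc, phi, alpha))
```

Sampling from the prior is easy to implement but can have high variance for long documents, because few prior samples land near the posterior over topic proportions. That weakness is one reason better proposal distributions matter for this problem.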
Keywords
- Independent Component Analysis
- Recommender System
- Topic Model
- Proposal Distribution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Buntine, W. (2009). Estimating Likelihoods for Topic Models. In: Zhou, ZH., Washio, T. (eds) Advances in Machine Learning. ACML 2009. Lecture Notes in Computer Science, vol 5828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05224-8_6
DOI: https://doi.org/10.1007/978-3-642-05224-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05223-1
Online ISBN: 978-3-642-05224-8
eBook Packages: Computer Science (R0)