A Left-to-Right Algorithm for Likelihood Estimation in Gamma-Poisson Factor Analysis
Computing the probability of unseen documents is a natural evaluation task in topic modeling. Previous work has addressed this problem for the well-known Latent Dirichlet Allocation (LDA) model. However, the same problem for a more general class of topic models, referred to here as Gamma-Poisson Factor Analysis (GaP-FA), remains unexplored, which hampers a fair comparison between models. Recent findings on the exact marginal likelihood of GaP-FA enable the derivation of a closed-form expression. In this paper, we show that the cost of its exact computation grows exponentially with the number of topics and non-zero words in a document, making it tractable only for relatively small models and short documents. Experiments on various corpora also indicate that existing methods in the literature are unlikely to estimate this probability accurately. With that in mind, we propose L2R, a left-to-right sequential sampler that decomposes the document probability into a product of conditionals and estimates each of them separately. We then confirm that our estimator converges and is unbiased for both small and large collections. Code related to this paper is available at: https://github.com/jcapde/L2R, https://doi.org/10.7910/DVN/GDTAAC.
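The two technical claims above (exponential cost of the exact likelihood, and a left-to-right decomposition into conditionals) can be illustrated on a toy document. The sketch below assumes the standard GaP-FA generative process, theta_k ~ Gamma(a_k, rate b_k) and y_v ~ Poisson(sum_k phi[v][k] * theta_k), with strictly positive phi; every function name and parameter value is hypothetical. `log_marginal_exact` sums over every way of splitting each non-zero count among the K topics, so it enumerates prod_v C(y_v + K - 1, K - 1) terms. `log_marginal_l2r` uses the same chain rule, log p(y) = sum_v log p(y_v | y_0, ..., y_{v-1}), but estimates each conditional with a plain Gibbs sampler; it is a simplified illustration, not the authors' L2R algorithm (see the linked repository for that), and it makes no unbiasedness claim.

```python
import itertools
import math
import random


def compositions(n, k):
    """Yield every way to split the count n into k ordered non-negative parts."""
    for bars in itertools.combinations(range(n + k - 1), k - 1):  # stars and bars
        parts, prev = [], -1
        for pos in bars:
            parts.append(pos - prev - 1)
            prev = pos
        parts.append(n + k - 2 - prev)
        yield parts


def n_exact_terms(y, K):
    """Number of topic allocations the exact sum enumerates: prod_v C(y_v + K - 1, K - 1)."""
    return math.prod(math.comb(c + K - 1, K - 1) for c in y)


def log_topic_marginal(counts, phi_col, col_sum, a, b):
    """log of the integral over theta ~ Gamma(a, rate b) of prod_v Poisson(counts[v]; phi_col[v] * theta).
    col_sum sums phi over the WHOLE vocabulary, so zero-count words still enter the rate."""
    n = sum(counts)
    lp = sum(c * math.log(p) - math.lgamma(c + 1) for c, p in zip(counts, phi_col) if c > 0)
    return lp + a * math.log(b) - math.lgamma(a) + math.lgamma(a + n) - (a + n) * math.log(b + col_sum)


def log_marginal_exact(y, phi, a, b):
    """Exact log p(y): log-sum-exp over all topic allocations of every non-zero count."""
    K = len(a)
    col_sum = [sum(row[k] for row in phi) for k in range(K)]
    nz = [v for v, c in enumerate(y) if c > 0]
    logs = []
    for alloc in itertools.product(*(list(compositions(y[v], K)) for v in nz)):
        logs.append(sum(
            log_topic_marginal([alloc[i][k] for i in range(len(nz))],
                               [phi[v][k] for v in nz], col_sum[k], a[k], b[k])
            for k in range(K)))
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def log_marginal_l2r(y, phi, a, b, n_samples=5000, burn_in=500, seed=0):
    """Left-to-right sketch: each conditional p(y_v | y_0..y_{v-1}) is the average of
    Poisson(y_v; sum_k phi[v][k] * theta_k) over theta drawn by Gibbs sampling
    from the posterior given only the words already seen."""
    rng = random.Random(seed)
    K = len(a)
    total = 0.0
    for v in range(len(y)):
        # Unseen words marginalize out, so the posterior rate only uses words 0..v-1.
        rate = [b[k] + sum(phi[u][k] for u in range(v)) for k in range(K)]
        theta = [rng.gammavariate(a[k], 1.0 / b[k]) for k in range(K)]  # start from the prior
        acc = 0.0
        for it in range(burn_in + n_samples):
            # Step 1: assign each seen token to topic k with prob. proportional to phi[u][k] * theta_k.
            n = [0] * K
            for u in range(v):
                weights = [phi[u][k] * theta[k] for k in range(K)]
                for t in rng.choices(range(K), weights=weights, k=y[u]):
                    n[t] += 1
            # Step 2: conjugate Gamma update (random.gammavariate takes shape and *scale*).
            theta = [rng.gammavariate(a[k] + n[k], 1.0 / rate[k]) for k in range(K)]
            if it >= burn_in:
                lam = sum(phi[v][k] * theta[k] for k in range(K))
                acc += math.exp(y[v] * math.log(lam) - lam - math.lgamma(y[v] + 1))
        total += math.log(acc / n_samples)
    return total


# Toy usage with hypothetical values: V = 3 words, K = 2 topics.
phi = [[0.5, 0.1], [0.3, 0.2], [0.2, 0.7]]
a, b = [1.0, 1.0], [1.0, 1.0]
y = [2, 1, 0]
print(n_exact_terms(y, 2))            # 6
print(n_exact_terms([5] * 10, 5))     # 126**10: ten words of count 5 with K = 5
print(log_marginal_exact(y, phi, a, b), log_marginal_l2r(y, phi, a, b))
```

The exact sum is only feasible here because the document is tiny; the second `n_exact_terms` call shows how quickly the enumeration becomes intractable, while the sequential estimator's cost grows only linearly with the vocabulary size and the number of tokens.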
Keywords: Topic models; Gamma-Poisson Factor Analysis; Left-to-right; Importance Sampling; Estimation methods
This work was supported in part by Obra Social “LaCaixa”, by the Australian Research Council under award DE170100037, by the SGR programs of the Catalan Government (2014-SGR-1051, 2014-SGR-118), by the Severo Ochoa Program SEV2015-0493 and by the Spanish Ministry of Economy and Competitiveness (MINECO) and the European Regional Development Fund (ERDF) under contracts TIN2015-65316 and Collectiveware TIN2015-66863-C2-1-R (MINECO/FEDER).