
Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 436)

Abstract

Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach called Additive Regularization of Topic Models (ARTM). ARTM is free of redundant probabilistic assumptions and provides simple inference for many combined and multi-objective topic models.
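
As a rough illustration of how additive regularization can be organized computationally, the sketch below runs a regularized EM-style iteration over a term–document count matrix. It is a minimal sketch under stated assumptions: the function name artm_em, the uniform smoothing/sparsing offsets tau_phi and tau_theta, and all implementation details are illustrative choices, not the tutorial's reference code.

```python
import numpy as np

def artm_em(ndw, num_topics, num_iters=50, tau_phi=0.0, tau_theta=0.0, seed=0):
    """Illustrative regularized EM iteration in the spirit of ARTM.

    ndw       : (D, W) array of term counts n_dw
    tau_phi   : uniform smoothing (>0) / sparsing (<0) offset for Phi
    tau_theta : same for Theta
    Returns column-stochastic matrices Phi (W x T) and Theta (T x D).
    """
    rng = np.random.default_rng(seed)
    D, W = ndw.shape
    T = num_topics
    phi = rng.random((W, T)); phi /= phi.sum(axis=0, keepdims=True)
    theta = rng.random((T, D)); theta /= theta.sum(axis=0, keepdims=True)

    for _ in range(num_iters):
        n_wt = np.zeros((W, T))
        n_td = np.zeros((T, D))
        for d in range(D):
            # E-step: p(t | d, w) proportional to phi_wt * theta_td
            p_tdw = phi * theta[:, d]                      # (W, T)
            p_tdw /= np.maximum(p_tdw.sum(axis=1, keepdims=True), 1e-12)
            # accumulate expected topic counts weighted by n_dw
            n_wt += ndw[d][:, None] * p_tdw
            n_td[:, d] = (ndw[d][:, None] * p_tdw).sum(axis=0)
        # M-step with an additive offset (for a uniform smoothing/sparsing
        # regularizer the term phi * dR/dphi reduces to the constant tau),
        # truncated at zero and renormalized
        phi = np.maximum(n_wt + tau_phi, 0.0)
        phi /= np.maximum(phi.sum(axis=0, keepdims=True), 1e-12)
        theta = np.maximum(n_td + tau_theta, 0.0)
        theta /= np.maximum(theta.sum(axis=0, keepdims=True), 1e-12)
    return phi, theta
```

In this sketch, tau_phi > 0 smooths the topic–word distributions while tau_phi < 0 sparses them, which is exactly the kind of trade-off that additive regularization is meant to control.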



Acknowledgements

The work was supported by the Russian Foundation for Basic Research grants 14-07-00847, 14-07-00908. We thank Alexander Frey for his help and valuable discussion, and Vitaly Glushachenkov for his experimental work on model data.

Author information


Correspondence to Konstantin Vorontsov.


Appendices

A The Karush–Kuhn–Tucker (KKT) Conditions

Consider the following nonlinear optimization problem:

$$ f(x) \rightarrow \max _x; \qquad g_i(x) \ge 0, i=1,\dots , m; \qquad h_j(x) = 0, j=1,\dots , k. $$

Suppose that the objective function \({f:\mathbb {R}^n \rightarrow \mathbb {R}}\) and the constraint functions \({g_i:\mathbb {R}^n \rightarrow \mathbb {R}}\) and \({h_j:\mathbb {R}^n \rightarrow \mathbb {R}}\) are continuously differentiable at a point \(x^{*}\). If \(x^{*}\) is a local maximum that satisfies some regularity conditions (which are always true if \(g_i\) and \(h_j\) are linear functions), then there exist constants \(\mu _i\), \({i = 1,\ldots ,m}\) and \(\lambda _j\), \({j = 1,\ldots ,k}\), called KKT multipliers, such that

$$\begin{aligned}&\frac{\partial }{\partial x} \biggl (f(x) + \sum _{i=1}^m \mu _i g_i(x) + \sum _{j=1}^k \lambda _j h_j(x) \biggr )=0;&\text { (stationarity)}\\&g_i(x) \ge 0; \quad h_j(x) = 0;&\text { (primal feasibility)}\\&\mu _i \ge 0;&\text { (dual feasibility)}\\&\mu _i g_i(x) = 0.&\text { (complementary slackness)} \end{aligned}$$
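
For example, consider maximizing \(\sum _{i=1}^n n_i \ln q_i\) over a probability vector \(q\) with given counts \({n_i > 0}\); a problem of this form appears whenever expected counts are normalized in the M-step of a topic model. Here \({g_i(q) = q_i}\) and \({h(q) = \sum _i q_i - 1}\). Stationarity gives \({n_i/q_i + \mu _i + \lambda = 0}\); since the objective forces \({q_i > 0}\), complementary slackness yields \({\mu _i = 0}\), so \({q_i = -n_i/\lambda }\), and the equality constraint determines \(\lambda \):

$$ q_i = \frac{n_i}{\sum _{j=1}^{n} n_j}, \qquad i = 1,\dots ,n. $$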

B The Kullback–Leibler Divergence

The Kullback–Leibler divergence or relative entropy is a non-symmetric measure of the difference between probability distributions \({P = (p_i)_{i=1}^n}\) and \({Q = (q_i)_{i=1}^n}\):

$$ \mathop {\text {KL}}\nolimits (P \Vert Q) \equiv \mathop {\text {KL}}\nolimits _i(p_i \Vert q_i) = \sum _{i=1}^n p_i \ln \frac{p_i}{q_i}. $$

From an information-theoretic point of view, \(\mathop {\text {KL}}\nolimits (P \Vert Q)\) is a measure of the information lost when \(Q\) is used to approximate \(P\): it gives the expected number of extra bits required to code samples from \(P\) using a code based on \(Q\) rather than a code based on \(P\). Typically \(P\) represents the empirical distribution of the data, while \(Q\) represents a model or an approximation of \(P\).

The KL-divergence is always non-negative, and \({\mathop {\text {KL}}\nolimits (P \Vert Q) = 0}\) if and only if \({P=Q}\).

Minimizing the KL-divergence is equivalent to maximizing the likelihood of a model distribution \(Q(\alpha )\) over the parameter vector \(\alpha \):

$$ \mathop {\text {KL}}\nolimits (P \Vert Q(\alpha )) = \sum _{i=1}^n p_i \ln \frac{p_i}{q_i(\alpha )} \rightarrow \min _\alpha \quad \Longleftrightarrow \quad \sum _{i=1}^n p_i \ln q_i(\alpha ) \rightarrow \max _\alpha . $$
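
The short sketch below checks the definition and the non-negativity property numerically; it is a minimal illustration (NumPy-based, with the helper name kl_divergence chosen for clarity):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * ln(p_i / q_i), dropping the 0 * ln 0 terms."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Empirical distribution P and two candidate models Q1, Q2
p  = np.array([0.5, 0.3, 0.2])
q1 = np.array([0.4, 0.4, 0.2])
q2 = np.array([0.2, 0.2, 0.6])

# The better approximation of P has the smaller divergence,
# and KL(P || P) = 0 exactly.
print(kl_divergence(p, q1))   # ~0.025
print(kl_divergence(p, q2))   # ~0.36
print(kl_divergence(p, p))    # 0.0
```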


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Vorontsov, K., Potapenko, A. (2014). Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization. In: Ignatov, D., Khachay, M., Panchenko, A., Konstantinova, N., Yavorsky, R. (eds) Analysis of Images, Social Networks and Texts. AIST 2014. Communications in Computer and Information Science, vol 436. Springer, Cham. https://doi.org/10.1007/978-3-319-12580-0_3

  • DOI: https://doi.org/10.1007/978-3-319-12580-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12579-4

  • Online ISBN: 978-3-319-12580-0
