A Novel Document Generation Process for Topic Detection Based on Hierarchical Latent Tree Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11726)

Abstract

We propose a novel document generation process based on hierarchical latent tree models (HLTMs) learned from data. An HLTM has a layer of observed word variables at the bottom and multiple layers of latent variables on top. For each document, the generative process first samples values for the latent variables layer by layer via logic sampling, then draws relative frequencies for the words conditioned on the values of the latent variables, and finally generates the words of the document from those relative frequencies. The motivation for this work is to take word counts into consideration with HLTMs. In comparison with LDA-based hierarchical document generation processes, the new process achieves drastically better model fit with far fewer parameters. It also yields more meaningful topics and topic hierarchies, and it establishes a new state of the art for hierarchical topic detection.
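The three-step process sketched in the abstract can be illustrated with a toy simulation. The tree structure, conditional probability tables, word groups, and Dirichlet pseudo-counts below are all hypothetical placeholders, not values from the paper; the sketch only shows the shape of the pipeline: ancestral ("logic") sampling of the latent layers, drawing relative word frequencies conditioned on the sampled latent states, then generating word counts from those frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy HLTM: binary latent variables in a tree given as
# (node, parent) pairs, parent=None for the root.  CPTs map the parent's
# sampled value to a Bernoulli parameter for the child.
TREE = [("z_root", None), ("z1", "z_root"), ("z2", "z_root")]
CPT = {
    "z_root": {None: 0.5},
    "z1": {0: 0.2, 1: 0.8},
    "z2": {0: 0.7, 1: 0.3},
}
# Each bottom-layer latent governs a group of observed words; per-state
# Dirichlet pseudo-counts control the relative frequencies drawn for them.
WORD_GROUPS = {"z1": ["learning", "model"], "z2": ["market", "price"]}
DIRICHLET = {("z1", 0): [1.0, 1.0], ("z1", 1): [5.0, 1.0],
             ("z2", 0): [1.0, 4.0], ("z2", 1): [3.0, 3.0]}

def generate_document(n_words=20):
    # Step 1: logic (ancestral) sampling of latents, top layer down.
    z = {}
    for node, parent in TREE:
        p = CPT[node][None if parent is None else z[parent]]
        z[node] = int(rng.random() < p)
    # Step 2: draw relative word frequencies conditioned on latent states.
    words, probs = [], []
    for leaf, group in WORD_GROUPS.items():
        theta = rng.dirichlet(DIRICHLET[(leaf, z[leaf])])
        words += group
        probs += list(theta / len(WORD_GROUPS))  # mix the groups uniformly
    # Step 3: generate the document's word counts from those frequencies.
    counts = rng.multinomial(n_words, probs)
    return z, dict(zip(words, counts))

latents, doc = generate_document()
print(latents, doc)
```

Unlike plain latent class or LDA sampling, the per-document draw of relative frequencies is what lets the model account for word counts rather than only word occurrence.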


Notes

  1. NIPS: http://www.cs.nyu.edu/~roweis/data.html, News: http://qwone.com/~jason/20Newsgroups/, NYT: http://archive.ics.uci.edu/ml/datasets/Bag+of+Words.

  2. github.com/kmpoon/hlta; github.com/blei-lab/hlda; www.columbia.edu/~jwp2128/code/nHDP.zip; www.arbylon.net/projects/knowceans-lda-cgen/Hpam2pGibbsSampler.java.


Acknowledgements

Research reported in this article was supported by the Hong Kong Research Grants Council under grant 16202515.

Author information

Correspondence to Nevin L. Zhang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, P., Chen, Z., Zhang, N.L. (2019). A Novel Document Generation Process for Topic Detection Based on Hierarchical Latent Tree Models. In: Kern-Isberner, G., Ognjanović, Z. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2019. Lecture Notes in Computer Science, vol 11726. Springer, Cham. https://doi.org/10.1007/978-3-030-29765-7_22

  • DOI: https://doi.org/10.1007/978-3-030-29765-7_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29764-0

  • Online ISBN: 978-3-030-29765-7

  • eBook Packages: Computer Science; Computer Science (R0)
