A Novel Document Generation Process for Topic Detection Based on Hierarchical Latent Tree Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11726)

Abstract

We propose a novel document generation process based on hierarchical latent tree models (HLTMs) learned from data. An HLTM has a layer of observed word variables at the bottom and multiple layers of latent variables on top. For each document, the generative process first samples values for the latent variables layer by layer via logic sampling, then draws relative frequencies for the words conditioned on the values of the latent variables, and finally generates the words of the document from those relative frequencies. The motivation for this work is to take word counts into consideration with HLTMs. In comparison with LDA-based hierarchical document generation processes, the new process achieves drastically better model fit with far fewer parameters. It also yields more meaningful topics and topic hierarchies, and it establishes a new state of the art for hierarchical topic detection.
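The three-step process sketched in the abstract can be illustrated with a toy simulation. The tree structure, conditional probability tables, word groups, and Dirichlet pseudo-counts below are all hypothetical placeholders, not values from the paper; the sketch only shows the shape of the pipeline: ancestral ("logic") sampling of the latent layers, drawing relative word frequencies conditioned on the sampled latent states, then generating word counts from those frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy HLTM: binary latent variables in a tree given as
# (node, parent) pairs, parent=None for the root.  CPTs map the parent's
# sampled value to a Bernoulli parameter for the child.
TREE = [("z_root", None), ("z1", "z_root"), ("z2", "z_root")]
CPT = {
    "z_root": {None: 0.5},
    "z1": {0: 0.2, 1: 0.8},
    "z2": {0: 0.7, 1: 0.3},
}
# Each bottom-layer latent governs a group of observed words; per-state
# Dirichlet pseudo-counts control the relative frequencies drawn for them.
WORD_GROUPS = {"z1": ["learning", "model"], "z2": ["market", "price"]}
DIRICHLET = {("z1", 0): [1.0, 1.0], ("z1", 1): [5.0, 1.0],
             ("z2", 0): [1.0, 4.0], ("z2", 1): [3.0, 3.0]}

def generate_document(n_words=20):
    # Step 1: logic (ancestral) sampling of latents, top layer down.
    z = {}
    for node, parent in TREE:
        p = CPT[node][None if parent is None else z[parent]]
        z[node] = int(rng.random() < p)
    # Step 2: draw relative word frequencies conditioned on latent states.
    words, probs = [], []
    for leaf, group in WORD_GROUPS.items():
        theta = rng.dirichlet(DIRICHLET[(leaf, z[leaf])])
        words += group
        probs += list(theta / len(WORD_GROUPS))  # mix the groups uniformly
    # Step 3: generate the document's word counts from those frequencies.
    counts = rng.multinomial(n_words, probs)
    return z, dict(zip(words, counts))

latents, doc = generate_document()
print(latents, doc)
```

Unlike plain latent class or LDA sampling, the per-document draw of relative frequencies is what lets the model account for word counts rather than only word occurrence.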


Notes

  1. NIPS: http://www.cs.nyu.edu/~roweis/data.html, News: http://qwone.com/~jason/20Newsgroups/, NYT: http://archive.ics.uci.edu/ml/datasets/Bag+of+Words.

  2. github.com/kmpoon/hlta; github.com/blei-lab/hlda; www.columbia.edu/~jwp2128/code/nHDP.zip; www.arbylon.net/projects/knowceans-lda-cgen/Hpam2pGibbsSampler.java.


Acknowledgements

Research reported in this article was supported by the Hong Kong Research Grants Council under grant 16202515.

Author information

Correspondence to Nevin L. Zhang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, P., Chen, Z., Zhang, N.L. (2019). A Novel Document Generation Process for Topic Detection Based on Hierarchical Latent Tree Models. In: Kern-Isberner, G., Ognjanović, Z. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2019. Lecture Notes in Computer Science, vol 11726. Springer, Cham. https://doi.org/10.1007/978-3-030-29765-7_22

  • DOI: https://doi.org/10.1007/978-3-030-29765-7_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29764-0

  • Online ISBN: 978-3-030-29765-7

  • eBook Packages: Computer Science; Computer Science (R0)
