
Sentence level topic models for associated topics extraction

Published in: World Wide Web

Abstract

In the LDA model, the independence assumptions implied by the Dirichlet prior on the topic proportions make it impossible to model connections between topics. Several researchers have relaxed these assumptions to obtain more expressive topic models. Following this strategy, we develop an associated topic model (ATM) that uses an association matrix to measure the association between latent topics. In the ATM, consecutive sentences are treated as important, and the topic assignments for words are determined jointly by the association matrix and sentence-level topic distributions, rather than by the document-specific topic distributions alone. This yields a more realistic model of latent topic connections, in which the presence of one topic may be linked to the presence of another. We derive a collapsed Gibbs sampling algorithm for inference and parameter estimation in the ATM. Experimental results demonstrate that the ATM provides a more practical interpretation and is capable of learning more associated topics.
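The abstract's core idea can be sketched in code: a collapsed Gibbs sampler in which each word's topic is resampled from sentence-level topic counts, topic-word counts, and a topic association matrix. The sketch below is a hypothetical illustration of that idea, not the authors' algorithm; the exact form of the association weighting, and all function and variable names, are assumptions made for the example.

```python
import numpy as np

def gibbs_atm(docs, K, V, A, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Toy sentence-level Gibbs sampler in the spirit of the abstract.

    docs : list of documents; each document is a list of sentences;
           each sentence is a list of word ids in [0, V).
    A    : (K, K) topic association matrix; A = identity reduces the
           sampler to plain sentence-level LDA.
    Returns the topic-word distributions phi and the assignments z.
    """
    rng = np.random.default_rng(seed)
    n_kw = np.zeros((K, V))            # topic-word counts
    n_k = np.zeros(K)                  # per-topic totals
    z, n_sk = [], []                   # assignments, sentence-topic counts
    # Random initialization of topic assignments.
    for doc in docs:
        z_doc, n_doc = [], []
        for sent in doc:
            counts = np.zeros(K)
            z_sent = []
            for w in sent:
                k = rng.integers(K)
                z_sent.append(k)
                counts[k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
            z_doc.append(z_sent)
            n_doc.append(counts)
        z.append(z_doc)
        n_sk.append(n_doc)
    # Collapsed Gibbs sweeps.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for s, sent in enumerate(doc):
                for i, w in enumerate(sent):
                    k_old = z[d][s][i]
                    n_sk[d][s][k_old] -= 1
                    n_kw[k_old, w] -= 1
                    n_k[k_old] -= 1
                    # Association weight: compatibility of each candidate
                    # topic with the topics already in this sentence.
                    assoc = A @ (n_sk[d][s] + alpha)
                    p = assoc * (n_kw[:, w] + beta) / (n_k + V * beta)
                    p /= p.sum()
                    k_new = rng.choice(K, p=p)
                    z[d][s][i] = k_new
                    n_sk[d][s][k_new] += 1
                    n_kw[k_new, w] += 1
                    n_k[k_new] += 1
    phi = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + V * beta)
    return phi, z
```

With `A` set to the identity the association term falls away and the update is an ordinary sentence-level LDA step; an off-diagonal `A` lets mass assigned to one topic in a sentence raise the probability of associated topics, which is the coupling the abstract describes.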





Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61672161 and 61272480), and in part by the Australian Research Council (Grant No. DP170104747).

Author information

Correspondence to Haixin Jiang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Web and Big Data

Guest Editors: Junjie Yao, Bin Cui, Christian S. Jensen, and Zhe Zhao


About this article


Cite this article

Jiang, H., Zhou, R., Zhang, L. et al. Sentence level topic models for associated topics extraction. World Wide Web 22, 2545–2560 (2019). https://doi.org/10.1007/s11280-018-0639-1

