Evaluating Thesaurus-Based Topic Models

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10859)

Abstract

In this paper, we study thesaurus-based topic models and evaluate them in terms of topic coherence. Thesaurus-based topic models boost the scores of related terms that occur in the same text, which encourages these terms to fall into the same topics. We evaluate several variants of such models. First, we carry out a manual evaluation of the obtained topics. Second, we study whether the collected manual data can be used to evaluate new variants of thesaurus-based models; we propose a method and select its best parameters by cross-validation. Third, we apply the resulting evaluation method to estimate how word frequencies influence the addition of thesaurus relations when generating coherent topic models.
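The idea of boosting related terms that co-occur in a text can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `boost_related_counts`, the `weight` parameter, the additive boosting rule, and the toy thesaurus are all assumptions made for illustration only.

```python
from collections import Counter


def boost_related_counts(tokens, related, weight=1.0):
    """Return per-term scores for a single document.

    Hypothetical rule (an assumption, not the paper's exact formula):
    each term's raw count is increased by `weight` times the counts of
    its thesaurus-related terms that occur in the same document.
    """
    counts = Counter(tokens)
    boosted = {}
    for term, n in counts.items():
        # Sum the counts of related terms that are present in this document.
        bonus = sum(
            counts[other]
            for other in related.get(term, ())
            if other != term and other in counts
        )
        boosted[term] = n + weight * bonus
    return boosted


# Toy document and toy thesaurus (both invented for this example).
doc = ["bank", "loan", "credit", "loan", "river"]
thesaurus = {"loan": {"credit"}, "credit": {"loan"}}

scores = boost_related_counts(doc, thesaurus)
print(scores)
```

In this toy run, "loan" and "credit" end up with higher scores than "bank" and "river" because each is related to a term found in the same document. A topic model trained on such boosted scores would be nudged toward placing the related terms in the same topic.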

The work is supported by the Russian Foundation for Basic Research (project 16-29-09606).

Notes

  1. http://bigartm.org/

  2. https://golosislama.com/

  3. https://github.com/shen139/openwebspider/releases

Author information

Correspondence to Natalia Loukachevitch.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this paper

Loukachevitch, N., Ivanov, K. (2018). Evaluating Thesaurus-Based Topic Models. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_38

  • DOI: https://doi.org/10.1007/978-3-319-91947-8_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91946-1

  • Online ISBN: 978-3-319-91947-8

  • eBook Packages: Computer Science, Computer Science (R0)
