Abstract
Topic modeling algorithms, such as LDA, find topics, i.e. hidden structures, in document corpora in an unsupervised manner. Traditionally, applications of topic modeling to textual data use the bag-of-words model, i.e. they consider only the words in the documents. In our previous work we developed a framework for mining enriched topic models. We proposed a bag-of-features approach, where a document consists not only of words but also of linked named entities and their related information, such as types or categories.
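The bag-of-features idea can be illustrated with a minimal Python sketch. It is not the authors' framework: the function name `bag_of_features`, the dictionary keys `uri`, `types` and `categories`, the feature prefixes, and the DBpedia-style identifiers are all assumptions made for illustration, and entity linking is assumed to have happened beforehand.

```python
from collections import Counter


def bag_of_features(words, entities):
    """Count word tokens together with features of linked named entities.

    `entities` is a list of dicts with the hypothetical keys
    'uri', 'types' and 'categories' (output of some entity linker).
    """
    # Plain bag-of-words part: one feature per lower-cased token.
    features = Counter(f"word:{w.lower()}" for w in words)
    for ent in entities:
        # The linked entity itself becomes a feature ...
        features[f"entity:{ent['uri']}"] += 1
        # ... and so does its related information.
        for t in ent.get("types", []):
            features[f"type:{t}"] += 1
        for c in ent.get("categories", []):
            features[f"category:{c}"] += 1
    return features


# Illustrative input; the identifiers are made up for this example.
doc = bag_of_features(
    ["Berlin", "is", "a", "city"],
    [{"uri": "dbr:Berlin",
      "types": ["dbo:City"],
      "categories": ["dbc:Capitals_in_Europe"]}],
)
```

Dropping the `word:` features from such a counter would give a pure resource-based document representation, while keeping all prefixes would give a hybrid one.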
In this work we focused on the feature engineering and selection aspects of enriched topic modeling and evaluated the results using two measures of how understandable the estimated topics are for humans: model precision and topic log odds. In our 10-model experimental setup with 7 pure resource-based, 2 hybrid word/resource-based and 1 word-based model, the traditional bag-of-words models were outperformed by 5 pure resource-based models on both measures. These results show that incorporating background knowledge into topic models makes them more understandable for humans.
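The abstract does not spell out how the two measures are computed. The sketch below follows the definitions commonly used for the word-intrusion and topic-intrusion tasks: model precision is the fraction of subjects who identify the intruder word, and topic log odds is the mean log ratio between the document's probability for the true intruder topic and for the topic each subject selected. The function names and the toy data are assumptions for illustration only.

```python
import math


def model_precision(intruder, choices):
    """Fraction of subjects whose chosen word is the true intruder."""
    return sum(1 for c in choices if c == intruder) / len(choices)


def topic_log_odds(theta_d, intruder_topic, chosen_topics):
    """Mean over subjects of log(theta_d[intruder]) - log(theta_d[chosen]).

    `theta_d` maps topic ids to the document's topic probabilities.
    The value is 0 when every subject picks the true intruder topic and
    becomes negative when subjects pick topics the model rates as more
    probable for the document.
    """
    diffs = [
        math.log(theta_d[intruder_topic]) - math.log(theta_d[j])
        for j in chosen_topics
    ]
    return sum(diffs) / len(diffs)


# Toy data: four subjects in a word-intrusion task.
mp = model_precision("apple", ["apple", "dog", "apple", "apple"])

# Toy data: a document's topic distribution; topic 2 is the intruder.
theta = {0: 0.70, 1: 0.25, 2: 0.05}
tlo_all_correct = topic_log_odds(theta, 2, [2, 2])
tlo_one_wrong = topic_log_odds(theta, 2, [2, 0])
```

For both measures a higher value indicates topics that agree better with human judgment.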
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Lukasiewicz, W., Todor, A., Paschke, A. (2018). Human Perception of Enriched Topic Models. In: Abramowicz, W., Paschke, A. (eds) Business Information Systems. BIS 2018. Lecture Notes in Business Information Processing, vol 320. Springer, Cham. https://doi.org/10.1007/978-3-319-93931-5_2
DOI: https://doi.org/10.1007/978-3-319-93931-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93930-8
Online ISBN: 978-3-319-93931-5
eBook Packages: Computer Science (R0)