Human Perception of Enriched Topic Models

Lukasiewicz, Wojciech; Todor, Alexandru; Paschke, Adrian

doi:10.1007/978-3-319-93931-5_2

Conference paper
First Online: 16 June 2018

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 320)

Abstract

Topic modeling algorithms such as LDA discover topics, i.e. hidden structures, in document corpora in an unsupervised manner. Traditionally, applications of topic modeling to textual data use the bag-of-words model, i.e. they consider only the words in the documents. In our previous work, we developed a framework for mining enriched topic models. We proposed a bag-of-features approach, in which a document consists not only of words but also of linked named entities and their related information, such as types or categories.
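
As a rough illustration of the bag-of-features idea, the following sketch turns one document into a multiset of prefixed features. The function name, the prefixes, the dictionary layout of an entity and the sample data are all assumptions made for this example and are not taken from the authors' framework.

```python
from collections import Counter

def bag_of_features(words, entities, use_words=True):
    """Build a multiset of features for one document.

    `entities` is a list of dicts with the hypothetical keys
    'uri', 'types' and 'categories' (an assumed layout, for illustration).
    """
    features = Counter()
    if use_words:
        # Plain words, as in a traditional bag-of-words model.
        features.update(f"word:{w.lower()}" for w in words)
    for ent in entities:
        # The linked entity itself plus its related information.
        features[f"entity:{ent['uri']}"] += 1
        features.update(f"type:{t}" for t in ent.get("types", []))
        features.update(f"category:{c}" for c in ent.get("categories", []))
    return features

# Hypothetical sample document and entity annotations.
doc_words = ["Berlin", "is", "the", "capital", "of", "Germany"]
doc_entities = [
    {"uri": "Berlin", "types": ["City"], "categories": ["Capitals_in_Europe"]},
    {"uri": "Germany", "types": ["Country"], "categories": []},
]

hybrid = bag_of_features(doc_words, doc_entities)                    # words + resources
resource_only = bag_of_features(doc_words, doc_entities, use_words=False)
```

Switching `use_words` off yields a pure resource-based representation, while leaving it on gives a hybrid one; either multiset could then be passed to a topic model in place of word counts.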

In this work, we focused on the feature engineering and selection aspects of enriched topic modeling and evaluated the results using two measures of how understandable the estimated topics are to humans: model precision and topic log odds. In our experimental setup of 10 models (7 pure resource-based, 2 hybrid word/resource-based and 1 word-based), the traditional bag-of-words model was outperformed by 5 pure resource-based models on both measures. These results show that incorporating background knowledge into topic models makes them more understandable for humans.
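
The two measures can be sketched as follows, assuming they are computed as in the word-intrusion and topic-intrusion tasks of Chang et al., "Reading tea leaves: how humans interpret topic models" (2009). Model precision is the fraction of subjects who pick the true intruder word of a topic, and topic log odds is the mean difference between the log probability of the true intruder topic and that of the topic each subject picked. The function names and sample values are illustrative assumptions, not the authors' code or data.

```python
import math

def model_precision(chosen_words, intruder_word):
    """Fraction of subjects whose chosen word is the true intruder."""
    if not chosen_words:
        raise ValueError("at least one subject response is required")
    hits = sum(1 for w in chosen_words if w == intruder_word)
    return hits / len(chosen_words)

def topic_log_odds(theta_d, intruder_topic, chosen_topics):
    """Mean of log theta[intruder] - log theta[chosen] over all subjects.

    `theta_d` maps topic ids to the document's estimated topic
    proportions. The value is 0 when every subject finds the intruder
    and becomes more negative the more probable the wrongly chosen
    topics are.
    """
    if not chosen_topics:
        raise ValueError("at least one subject response is required")
    diffs = [
        math.log(theta_d[intruder_topic]) - math.log(theta_d[j])
        for j in chosen_topics
    ]
    return sum(diffs) / len(diffs)

# Hypothetical responses: three of four subjects spot the intruder word.
mp = model_precision(["apple", "dog", "dog", "dog"], intruder_word="dog")

# Hypothetical topic proportions; topic 3 is the injected intruder.
theta = {0: 0.60, 1: 0.30, 2: 0.09, 3: 0.01}
tlo = topic_log_odds(theta, intruder_topic=3, chosen_topics=[3, 3, 2])
```

Under these definitions, higher model precision and topic log odds closer to zero both indicate that the subjects' judgments agree more closely with the model.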

Author information

Authors and Affiliations

AG Corporate Semantic Web, Institute for Computer Science, Freie Universität Berlin, 14195, Berlin, Germany
Wojciech Lukasiewicz, Alexandru Todor & Adrian Paschke

Corresponding author

Correspondence to Alexandru Todor.

Editor information

Editors and Affiliations

Poznan University of Economics and Business, Poznan, Poland
Witold Abramowicz
Fraunhofer FOKUS, Berlin, Germany
Adrian Paschke

Cite this paper

Lukasiewicz, W., Todor, A., Paschke, A. (2018). Human Perception of Enriched Topic Models. In: Abramowicz, W., Paschke, A. (eds) Business Information Systems. BIS 2018. Lecture Notes in Business Information Processing, vol 320. Springer, Cham. https://doi.org/10.1007/978-3-319-93931-5_2

DOI: https://doi.org/10.1007/978-3-319-93931-5_2
Published: 16 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93930-8
Online ISBN: 978-3-319-93931-5
eBook Packages: Computer Science, Computer Science (R0)
