Abstract
Topic modeling is an important tool in the analysis of corpora and in the classification and clustering of documents. Various extensions of the underlying graphical models have been proposed to address hierarchical or dynamic topics. However, despite their popularity, topic models face problems in the exploration and correlation of the (often unknown number of) topics extracted from a document collection, and they rely on compute-intensive graphical models. In this paper, we present a novel framework for exploring evolving corpora of news articles in terms of the topics covered over time. Our approach is based on implicit networks that represent the co-occurrences of entities and terms in the documents as weighted edges. Edges with high weight between entities are indicative of topics, allowing the context of a topic to be explored incrementally by growing network substructures. Since the exploration of topics corresponds to local operations in the network, it is efficient and interactive. Adding new articles to the collection simply updates the network, thus avoiding expensive recomputation of term and topic distributions.
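The idea outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `CooccurrenceNetwork`, its method names, the use of raw document-level co-occurrence counts as edge weights, and the toy documents are all assumptions made for illustration. The paper's actual weighting scheme and entity extraction pipeline are not described in this section.

```python
from collections import defaultdict
from itertools import combinations


class CooccurrenceNetwork:
    """Toy implicit network (hypothetical): nodes are entities or terms,
    edge weights are raw document-level co-occurrence counts (an assumption)."""

    def __init__(self):
        # Adjacency map: node -> neighbour -> weight
        self.adj = defaultdict(lambda: defaultdict(int))
        self.entities = set()

    def add_document(self, entities, terms):
        """Add one article; only the edges it touches are updated."""
        self.entities.update(entities)
        nodes = sorted(set(entities) | set(terms))
        for u, v in combinations(nodes, 2):
            self.adj[u][v] += 1
            self.adj[v][u] += 1

    def top_entity_edges(self, k=1):
        """Return the k heaviest entity-entity edges as (weight, u, v) topic seeds."""
        edges = [
            (w, u, v)
            for u in self.entities
            for v, w in self.adj[u].items()
            if v in self.entities and u < v
        ]
        return sorted(edges, key=lambda e: (-e[0], e[1], e[2]))[:k]

    def grow(self, seed, k=3):
        """Local expansion: rank outside neighbours by summed weight to the seed nodes."""
        scores = defaultdict(int)
        for u in seed:
            for v, w in self.adj[u].items():
                if v not in seed:
                    scores[v] += w
        return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Invented toy articles, each given as (entities, terms)
net = CooccurrenceNetwork()
net.add_document({"Merkel", "Berlin"}, {"election"})
net.add_document({"Merkel", "Berlin"}, {"coalition"})
net.add_document({"Obama"}, {"election"})

seeds = net.top_entity_edges(k=1)
context = net.grow({"Berlin", "Merkel"}, k=2)
```

Both `top_entity_edges` and `grow` only read the neighbourhoods of the nodes involved, and `add_document` only increments the edges of the new article, which mirrors the local, incremental character the abstract describes.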
Notes
- 1. The URLs of articles in our data, the extracted implicit network, and our program code are available at https://dbs.ifi.uni-heidelberg.de/resources/nwtopics/.
Acknowledgements
We would like to thank the Ambiverse Ambinauts for kindly providing access to their named entity linking and disambiguation API.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Spitz, A., Gertz, M. (2018). Entity-Centric Topic Extraction and Exploration: A Network-Based Approach. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_1
DOI: https://doi.org/10.1007/978-3-319-76941-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76940-0
Online ISBN: 978-3-319-76941-7
eBook Packages: Computer Science, Computer Science (R0)