Entity-Centric Topic Extraction and Exploration: A Network-Based Approach

  • Conference paper
Advances in Information Retrieval (ECIR 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10772)

Abstract

Topic modeling is an important tool in the analysis of corpora and the classification and clustering of documents. Various extensions of the underlying graphical models have been proposed to address hierarchical or dynamical topics. However, despite their popularity, topic models face problems in the exploration and correlation of the (often unknown number of) topics extracted from a document collection, and rely on compute-intensive graphical models. In this paper, we present a novel framework for exploring evolving corpora of news articles in terms of topics covered over time. Our approach is based on implicit networks representing the cooccurrences of entities and terms in the documents as weighted edges. Edges with high weight between entities are indicative of topics, allowing the context of a topic to be explored incrementally by growing network sub-structures. Since the exploration of topics corresponds to local operations in the network, it is efficient and interactive. Adding new news articles to the collection simply updates the network, thus avoiding expensive recomputations of term and topic distributions.
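
The network idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how edge weights are computed, so this example assumes the simplest possible choice: an edge's weight is the number of documents in which both of its endpoints occur. The class `ImplicitNetwork`, its method names, and the toy documents are hypothetical and are not taken from the authors' published code.

```python
from collections import Counter
from itertools import combinations


class ImplicitNetwork:
    """Toy co-occurrence network (hypothetical names, not the authors' code).

    Assumption: edge weight = number of documents containing both endpoints.
    """

    def __init__(self):
        self.weights = Counter()  # (node_a, node_b) with node_a < node_b -> weight
        self.entities = set()     # nodes known to be entities rather than plain terms

    def add_document(self, entities, terms):
        """Update the network with one document; only the touched edges change."""
        self.entities.update(entities)
        nodes = sorted(set(entities) | set(terms))
        for a, b in combinations(nodes, 2):
            self.weights[(a, b)] += 1

    def topic_seeds(self, k=3):
        """Return the k heaviest entity-entity edges as candidate topic seeds."""
        entity_edges = [
            (edge, w) for edge, w in self.weights.items()
            if edge[0] in self.entities and edge[1] in self.entities
        ]
        entity_edges.sort(key=lambda item: (-item[1], item[0]))
        return entity_edges[:k]

    def context(self, seed_edge, k=3):
        """Grow a seed edge by the k nodes most strongly attached to its endpoints."""
        scores = Counter()
        for (u, v), w in self.weights.items():
            for seed, other in ((u, v), (v, u)):
                if seed in seed_edge and other not in seed_edge:
                    scores[other] += w
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


net = ImplicitNetwork()
net.add_document(["Merkel", "Berlin"], ["election", "coalition"])
net.add_document(["Merkel", "Berlin"], ["election"])
net.add_document(["Trump", "Washington"], ["tariff"])

seeds = net.topic_seeds(k=1)
neighbours = net.context(seeds[0][0], k=2)
print(seeds)       # heaviest entity-entity edge
print(neighbours)  # terms most strongly attached to that edge

# Incremental update: a new article only increments the edges it touches.
net.add_document(["Merkel", "Berlin"], ["coalition"])
```

Both `topic_seeds` and `context` scan the full edge list for brevity. A per-node adjacency index would make them local operations in the sense the abstract describes, which is the property that keeps exploration interactive.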

Notes

  1. The URLs of articles in our data, the extracted implicit network, and our program code are available at https://dbs.ifi.uni-heidelberg.de/resources/nwtopics/.

  2. https://www.ambiverse.com/.

  3. http://snowballstem.org/.

  4. https://cran.r-project.org/web/packages/tidytext/.

  5. https://cran.r-project.org/web/packages/topicmodels/.

Acknowledgements

We would like to thank the Ambiverse Ambinauts for kindly providing access to their named entity linking and disambiguation API.

Author information

Corresponding author

Correspondence to Andreas Spitz.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Spitz, A., Gertz, M. (2018). Entity-Centric Topic Extraction and Exploration: A Network-Based Approach. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_1

  • DOI: https://doi.org/10.1007/978-3-319-76941-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76940-0

  • Online ISBN: 978-3-319-76941-7

  • eBook Packages: Computer Science, Computer Science (R0)
