Abstract
Entity linking labels phrases in text with the entities they refer to, such as Wikipedia or Freebase entries. The task is challenging because the number of possible entities is large, in the millions, and mention ambiguity is heavy-tailed. We formulate the problem as probabilistic inference in a topic model in which each topic is associated with a Wikipedia article. To handle the large number of topics, we propose a novel, efficient Gibbs sampling scheme that can also incorporate side information, such as the Wikipedia graph. This conceptually simple probabilistic approach achieves state-of-the-art performance in entity linking on the Aida-CoNLL dataset.
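To make the idea of Gibbs sampling over entity "topics" concrete, the following is a minimal illustrative sketch and not the sampler proposed in the paper. It assumes a standard LDA-style collapsed Gibbs update in which each mention may only be assigned to a short list of candidate entities, which is one plausible way to keep a sweep cheap when there are millions of topics. The function name `gibbs_link`, the `candidates` dictionary, and the hyperparameters `alpha` and `beta` are all hypothetical. Context words and side information such as the Wikipedia graph are left out.

```python
import random
from collections import defaultdict


def gibbs_link(docs, candidates, alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """Toy collapsed Gibbs sampler that assigns one entity to each mention.

    docs:       list of documents, each a list of mention strings
    candidates: dict mapping a mention string to a list of candidate entity ids
    Returns a list of lists of entity ids, aligned with `docs`.
    (Hypothetical interface, for illustration only.)
    """
    rng = random.Random(seed)
    vocab_size = len({m for doc in docs for m in doc})
    n_de = [defaultdict(int) for _ in docs]  # entity counts per document
    n_em = defaultdict(int)                  # (entity, mention) counts
    n_e = defaultdict(int)                   # total count per entity

    def bump(d, e, m, delta):
        # Add or remove one (document, entity, mention) assignment.
        n_de[d][e] += delta
        n_em[(e, m)] += delta
        n_e[e] += delta

    # Random initialisation, restricted to each mention's candidate list.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.choice(candidates[m]) for m in doc])
        for m, e in zip(doc, z[d]):
            bump(d, e, m, +1)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, m in enumerate(doc):
                # Remove the current assignment from the counts.
                bump(d, z[d][i], m, -1)
                cands = candidates[m]
                # LDA-style conditional, evaluated only on the candidates.
                weights = [
                    (n_de[d][c] + alpha)
                    * (n_em[(c, m)] + beta)
                    / (n_e[c] + vocab_size * beta)
                    for c in cands
                ]
                z[d][i] = rng.choices(cands, weights=weights, k=1)[0]
                bump(d, z[d][i], m, +1)
    return z


if __name__ == "__main__":
    docs = [["Jaguar", "Ford"], ["Jaguar", "lion"]]
    candidates = {
        "Jaguar": ["Jaguar_Cars", "Jaguar_(animal)"],
        "Ford": ["Ford_Motor_Company"],
        "lion": ["Lion"],
    }
    print(gibbs_link(docs, candidates, sweeps=20))
```

Because the conditional is evaluated only over each mention's candidates, the cost of one sweep in this sketch scales with the candidate-list sizes rather than with the total number of entities. Without context words or graph information, this toy version cannot use surrounding text to disambiguate; it shows only the mechanics of the sampling step.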
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Houlsby, N., Ciaramita, M. (2014). A Scalable Gibbs Sampler for Probabilistic Entity Linking. In: de Rijke, M., et al. Advances in Information Retrieval. ECIR 2014. Lecture Notes in Computer Science, vol 8416. Springer, Cham. https://doi.org/10.1007/978-3-319-06028-6_28
DOI: https://doi.org/10.1007/978-3-319-06028-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06027-9
Online ISBN: 978-3-319-06028-6
eBook Packages: Computer Science (R0)