Combining Textual and Graph-Based Features for Named Entity Disambiguation Using Undirected Probabilistic Graphical Models

Hakimov, Sherzod; Horst, Hendrik ter; Jebbara, Soufian; Hartung, Matthias; Cimiano, Philipp

doi:10.1007/978-3-319-49004-5_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10024)

Included in the following conference series:

European Knowledge Acquisition Workshop

Abstract

Named Entity Disambiguation (NED) is the task of disambiguating already recognized named entities in a natural language text by linking them to their corresponding entities in a knowledge base such as DBpedia. It is an important step in transforming unstructured text into structured knowledge. Previous work on this task has shown that graph-based methods such as PageRank have a strong impact on entity disambiguation. Other approaches rely on distributional similarity between an article and the textual description of a candidate entity. However, the combined impact of these different feature groups has not been sufficiently explored. In this paper, we present a novel approach that uses an undirected probabilistic graphical model to combine different types of features for named entity disambiguation. Using Markov Chain Monte Carlo sampling, our model is capable of exploiting the complementary strengths of graph-based and textual features. We analyze the impact of these features and their combination on named entity disambiguation. In an evaluation on the GERBIL benchmark, our model compares favourably to the current state of the art on 8 out of 14 data sets.

Notes

1. https://github.com/ag-sc/NED
2. http://wiki.dbpedia.org/Downloads2015-04
3. After re-normalizing the probabilities such that \(\sum_{s'_{ij}} \pi(s'_{ij}) = 1\).
4. Our experiments show that a greedy approach that always prefers the state with the highest probability works best.
5. http://wiki.dbpedia.org/Downloads2015-04
6. Our results, GERBIL v1.2.2: http://gerbil.aksw.org/gerbil/experiment?id=201604290045
7. State-of-the-art annotation systems’ results, GERBIL v1.2.2: http://gerbil.aksw.org/gerbil/experiment?id=201604270003
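
Notes 3 and 4 describe re-normalizing the probabilities of candidate states so that they sum to 1, and then greedily preferring the most probable state. The following is a minimal Python sketch of that idea only, not the authors' implementation: the function names, the dictionary representation of states, and the example scores are all hypothetical.

```python
from typing import Dict, Tuple


def renormalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative scores so that they sum to 1 (cf. Note 3)."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {state: value / total for state, value in scores.items()}


def greedy_choice(probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Pick the most probable state instead of sampling one (cf. Note 4)."""
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best]


# Hypothetical unnormalized scores for three candidate successor states;
# the labels are illustrative and not taken from the paper.
candidate_scores = {
    "dbpedia:Paris": 6.0,
    "dbpedia:Paris_Hilton": 3.0,
    "dbpedia:Paris,_Texas": 1.0,
}

pi = renormalize(candidate_scores)
state, prob = greedy_choice(pi)
```

A sampler in the usual MCMC sense would draw the next state at random in proportion to `pi`; the greedy variant shown here always takes the maximum, which is the behaviour Note 4 reports as working best in the authors' experiments.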

Acknowledgements

This work was supported by the Cluster of Excellence Cognitive Interaction Technology ‘CITEC’ (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).

Author information

Authors and Affiliations

Semantic Computing Group, Cognitive Interaction Technology – Center of Excellence (CITEC), Bielefeld University, 33615 Bielefeld, Germany
Sherzod Hakimov, Hendrik ter Horst, Soufian Jebbara, Matthias Hartung & Philipp Cimiano

Corresponding author

Correspondence to Sherzod Hakimov.

Editor information

Editors and Affiliations

Linköping University, Linköping, Sweden
Eva Blomqvist
University of Bologna, Bologna, Italy
Paolo Ciancarini
University of Bologna, Bologna, Italy
Francesco Poggi
University of Bologna, Bologna, Italy
Fabio Vitali

About this paper

Cite this paper

Hakimov, S., Horst, H.t., Jebbara, S., Hartung, M., Cimiano, P. (2016). Combining Textual and Graph-Based Features for Named Entity Disambiguation Using Undirected Probabilistic Graphical Models. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds) Knowledge Engineering and Knowledge Management. EKAW 2016. Lecture Notes in Computer Science, vol 10024. Springer, Cham. https://doi.org/10.1007/978-3-319-49004-5_19

DOI: https://doi.org/10.1007/978-3-319-49004-5_19
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49003-8
Online ISBN: 978-3-319-49004-5
eBook Packages: Computer Science, Computer Science (R0)