Combining Textual and Graph-Based Features for Named Entity Disambiguation Using Undirected Probabilistic Graphical Models

  • Conference paper
Knowledge Engineering and Knowledge Management (EKAW 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10024)

Abstract

Named Entity Disambiguation (NED) is the task of disambiguating named entities that have already been recognized in a natural language text by linking them to their corresponding entities in a knowledge base such as DBpedia. It is an important step in transforming unstructured text into structured knowledge. Previous work on this task has shown that graph-based methods such as PageRank have a strong impact on entity disambiguation. Other approaches rely on distributional similarity between an article and the textual description of a candidate entity. However, the combined impact of these different feature groups has not been sufficiently explored. In this paper, we present a novel approach that exploits an undirected probabilistic model to combine different types of features for named entity disambiguation. Capitalizing on Markov Chain Monte Carlo sampling, our model is capable of exploiting the complementary strengths of graph-based and textual features. We analyze the impact of these features and their combination on named entity disambiguation. In an evaluation on the GERBIL benchmark, our model compares favourably to the current state of the art on 8 out of 14 data sets.
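
To illustrate what combining feature groups in an undirected model can look like, here is a minimal sketch; it is not the implementation described in the paper. It assumes a log-linear score over a joint assignment of mentions to candidate entities, with one hypothetical graph-based feature (`pagerank`) and one hypothetical textual feature (`text_sim`). The feature names, weights, candidate identifiers and toy values are all assumptions made for illustration.

```python
import math
from typing import Dict, Tuple

# Hypothetical feature values per (mention, candidate) pair; not taken from the paper.
Features = Dict[str, float]


def score(assignment: Dict[str, str],
          features: Dict[Tuple[str, str], Features],
          weights: Dict[str, float]) -> float:
    """Unnormalized log-linear score exp(sum_k w_k * f_k) of a joint assignment."""
    total = 0.0
    for mention, candidate in assignment.items():
        for name, value in features[(mention, candidate)].items():
            # Features without a weight contribute nothing to the score.
            total += weights.get(name, 0.0) * value
    return math.exp(total)


# Toy data (assumed): one mention with two candidate entities.
features = {
    ("Paris", "dbr:Paris"): {"pagerank": 0.9, "text_sim": 0.2},
    ("Paris", "dbr:Paris_Hilton"): {"pagerank": 0.3, "text_sim": 0.7},
}
weights = {"pagerank": 1.0, "text_sim": 2.0}

score_city = score({"Paris": "dbr:Paris"}, features, weights)
score_person = score({"Paris": "dbr:Paris_Hilton"}, features, weights)
```

With these assumed weights the textual feature outweighs the graph-based prior, so the second candidate receives the higher score. With a larger weight on `pagerank` the ranking would flip, which is the kind of trade-off a learned combination has to settle.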

Notes

  1. https://github.com/ag-sc/NED.

  2. http://wiki.dbpedia.org/Downloads2015-04.

  3. After re-normalizing the probabilities such that \(\sum _{s'_{ij}}\pi (s'_{ij}) = 1\).

  4. Our experiments show that a greedy approach that always prefers the state with the highest probability works best.

  5. http://wiki.dbpedia.org/Downloads2015-04.

  6. Our results, GERBIL v1.2.2: http://gerbil.aksw.org/gerbil/experiment?id=201604290045.

  7. State-of-the-art annotation systems’ results, GERBIL v1.2.2: http://gerbil.aksw.org/gerbil/experiment?id=201604270003.
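
Notes 3 and 4 can be read together as a single selection step: the scores of the proposed successor states are re-normalized so that they sum to 1, and the state with the highest probability is then preferred. The sketch below shows that step under stated assumptions. The function names, the dictionary representation of states and the example scores are hypothetical, and the non-greedy branch is included only for contrast with the greedy choice described in note 4.

```python
import random
from typing import Dict, Optional


def renormalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative scores so that they sum to 1 (cf. note 3)."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {state: value / total for state, value in scores.items()}


def select_state(scores: Dict[str, float],
                 greedy: bool = True,
                 rng: Optional[random.Random] = None) -> str:
    """Pick a successor state from the re-normalized distribution."""
    probs = renormalize(scores)
    if greedy:
        # Greedy variant (cf. note 4): always take the most probable state.
        return max(probs, key=probs.get)
    # Sampling variant, for contrast: draw a state proportionally to its probability.
    rng = rng or random.Random()
    states = list(probs)
    return rng.choices(states, weights=[probs[s] for s in states], k=1)[0]


# Assumed unnormalized scores for three proposed states.
proposals = {"s1": 2.0, "s2": 6.0, "s3": 2.0}
probs = renormalize(proposals)
best = select_state(proposals)
```

In this toy case `s2` holds 60% of the probability mass and is returned by the greedy variant. Passing `greedy=False` would instead draw states in proportion to their probabilities.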

Acknowledgements

This work was supported by the Cluster of Excellence Cognitive Interaction Technology ‘CITEC’ (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).

Author information

Corresponding author

Correspondence to Sherzod Hakimov.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Hakimov, S., Horst, H.t., Jebbara, S., Hartung, M., Cimiano, P. (2016). Combining Textual and Graph-Based Features for Named Entity Disambiguation Using Undirected Probabilistic Graphical Models. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds) Knowledge Engineering and Knowledge Management. EKAW 2016. Lecture Notes in Computer Science, vol 10024. Springer, Cham. https://doi.org/10.1007/978-3-319-49004-5_19

  • DOI: https://doi.org/10.1007/978-3-319-49004-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49003-8

  • Online ISBN: 978-3-319-49004-5

  • eBook Packages: Computer Science, Computer Science (R0)
