WeDGeM: A Domain-Specific Evaluation Dataset Generator for Multilingual Entity Linking Systems

  • Emrah Inan
  • Oguz Dikenelli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10570)


Entity Linking is the task of annotating ambiguous mentions in unstructured text with their referent entities in a given knowledge base. A large number of general-purpose benchmark datasets exist for evaluating Entity Linking approaches. However, domain-specific Entity Linking approaches are difficult to evaluate because evaluation datasets for specific domains are lacking. This study presents WeDGeM, a tool that generates multilingual, domain-specific evaluation sets from Wikipedia and DBpedia. Wikipedia category pages and the DBpedia taxonomy are used to tailor the generation of annotated texts to a specific domain, and Wikipedia disambiguation pages are used to determine the ambiguity level of the generated texts. As a use case, well-known Entity Linking systems that support English and Turkish are evaluated on these texts in the movie domain.
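
The abstract does not state how domain filtering or the ambiguity level are computed. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each gold annotation pairs a mention with an entity URI, that entity types have already been fetched from the DBpedia ontology, and that candidate entities per mention have already been extracted from Wikipedia disambiguation pages. It covers only the DBpedia-type side of domain filtering; Wikipedia category pages are not modelled. The names `Annotation`, `domain_annotations` and `ambiguity_level` are hypothetical, and the score (mean number of candidates per mention) is an assumed metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    mention: str  # surface form in the generated text
    entity: str   # referent entity identifier (DBpedia-style, hypothetical)


def domain_annotations(annotations, entity_types, domain_types):
    """Keep annotations whose entity has at least one type in the target domain.

    entity_types: dict mapping entity -> set of type identifiers
    (assumed to be taken from DBpedia rdf:type values).
    """
    return [a for a in annotations
            if entity_types.get(a.entity, set()) & domain_types]


def ambiguity_level(annotations, disambiguation_candidates):
    """Assumed metric: average number of candidate entities per mention.

    disambiguation_candidates: dict mapping mention -> list of candidates,
    assumed to be harvested from Wikipedia disambiguation pages.
    A mention without a disambiguation page counts as one candidate.
    """
    if not annotations:
        return 0.0
    counts = [max(1, len(disambiguation_candidates.get(a.mention, [])))
              for a in annotations]
    return sum(counts) / len(counts)


# Toy data: the identifiers only mimic DBpedia naming and are illustrative.
anns = [
    Annotation("Titanic", "dbr:Titanic_(1997_film)"),
    Annotation("Avatar", "dbr:Avatar_(2009_film)"),
    Annotation("Southampton", "dbr:Southampton"),
]
types = {
    "dbr:Titanic_(1997_film)": {"dbo:Film", "dbo:Work"},
    "dbr:Avatar_(2009_film)": {"dbo:Film", "dbo:Work"},
    "dbr:Southampton": {"dbo:City", "dbo:Place"},
}
candidates = {
    "Titanic": ["dbr:RMS_Titanic", "dbr:Titanic_(1997_film)", "dbr:Titanic_(1953_film)"],
    "Avatar": ["dbr:Avatar_(2009_film)", "dbr:Avatar_(computing)"],
}

kept = domain_annotations(anns, types, {"dbo:Film"})
print([a.mention for a in kept])           # ['Titanic', 'Avatar']
print(ambiguity_level(kept, candidates))   # (3 + 2) / 2 = 2.5
```

Averaging candidate counts is only one option. A per-mention maximum, or the share of mentions that have a disambiguation page at all, would rank generated texts differently.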


Keywords: Entity linking, Evaluation dataset, DBpedia, Wikipedia


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Engineering, Ege University, Bornova, Turkey
