Canonicalizing Knowledge Base Literals

  • Jiaoyan ChenEmail author
  • Ernesto Jiménez-Ruiz
  • Ian Horrocks
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11778)


Ontology-based knowledge bases (KBs) like DBpedia are very valuable resources, but their usefulness and usability are limited by various quality issues. One such issue is the use of string literals instead of semantically typed entities. In this paper we study the automated canonicalization of such literals, i.e., replacing the literal with an existing entity from the KB or with a new entity that is typed using classes from the KB. We propose a framework that combines both reasoning and machine learning in order to predict the relevant entities and types, and we evaluate this framework against state-of-the-art baselines for both semantic typing and entity matching.


Knowledge base correction Literal canonicalization Knowledge-based learning Recurrent Neural Network 



The work is supported by the AIDA project, The Alan Turing Institute under the EPSRC grant EP/N510129/1, the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889), the Royal Society, EPSRC projects DBOnto, \(\text {MaSI}^{\text {3}}\) and \(\text {ED}^{\text {3}}\).


  1. 1.
    Abedjan, Z., Naumann, F.: Synonym analysis for predicate expansion. In: Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S. (eds.) ESWC 2013. LNCS, vol. 7882, pp. 140–154. Springer, Heidelberg (2013). Scholar
  2. 2.
    Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, A., et al. (eds.) ASWC/ISWC -2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). Scholar
  3. 3.
    Auer, S., Lehmann, J., Hellmann, S.: LinkedGeoData: adding a spatial dimension to the web of data. In: Bernstein, A., et al. (eds.) ISWC 2009. LNCS, vol. 5823, pp. 731–746. Springer, Heidelberg (2009). Scholar
  4. 4.
    Chen, J., Jimenez-Ruiz, E., Horrocks, I., Sutton, C.: ColNet: embedding the semantics of web tables for column type prediction. In: AAAI (2019)Google Scholar
  5. 5.
    Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1724–1734 (2014)Google Scholar
  6. 6.
    Debattista, J., Londoño, S., Lange, C., Auer, S.: Quality assessment of linked datasets using probabilistic approximation. In: Gandon, F., Sabou, M., Sack, H., d’Amato, C., Cudré-Mauroux, P., Zimmermann, A. (eds.) ESWC 2015. LNCS, vol. 9088, pp. 221–236. Springer, Cham (2015). Scholar
  7. 7.
    Dimou, A., et al.: Assessing and refining mappingsto RDF to improve dataset quality. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9367, pp. 133–149. Springer, Cham (2015). Scholar
  8. 8.
    Dongo, I., Cardinale, Y., Al-Khalil, F., Chbeir, R.: Semantic web datatype inference: towards better RDF matching. In: Bouguettaya, A., et al. (eds.) WISE 2017. LNCS, vol. 10570, pp. 57–74. Springer, Cham (2017). Scholar
  9. 9.
    Efthymiou, V., Hassanzadeh, O., Rodriguez-Muro, M., Christophides, V.: Matching web tables with knowledge base entities: from entity lookups to entity embeddings. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 260–277. Springer, Cham (2017). Scholar
  10. 10.
    Färber, M., Bartscherer, F., Menne, C., Rettinger, A.: Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semant. Web 9(1), 77–129 (2018)CrossRefGoogle Scholar
  11. 11.
    Fleischhacker, D., Paulheim, H., Bryl, V., Völker, J., Bizer, C.: Detecting errors in numerical linked data using cross-checked outlier detection. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 357–372. Springer, Cham (2014). Scholar
  12. 12.
    Galárraga, L., Heitz, G., Murphy, K., Suchanek, F.M.: Canonicalizing open knowledge bases. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 1679–1688 (2014)Google Scholar
  13. 13.
    Gangemi, A., Nuzzolese, A.G., Presutti, V., Draicchio, F., Musetti, A., Ciancarini, P.: Automatic typing of DBpedia entities. In: Cudré-Mauroux, P., et al. (eds.) ISWC 2012. LNCS, vol. 7649, pp. 65–81. Springer, Heidelberg (2012). Scholar
  14. 14.
    Gunaratna, K., Thirunarayan, K., Sheth, A., Cheng, G.: Gleaning types for literals in RDF triples with application to entity summarization. In: Sack, H., Blomqvist, E., d’Aquin, M., Ghidini, C., Ponzetto, S.P., Lange, C. (eds.) ESWC 2016. LNCS, vol. 9678, pp. 85–100. Springer, Cham (2016). Scholar
  15. 15.
    Kartsaklis, D., Pilehvar, M.T., Collier, N.: Mapping text to knowledge graph entities using multi-sense LSTMS. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1959–1970 (2018)Google Scholar
  16. 16.
    Kontokostas, D., et al.: Test-driven evaluation of linked data quality. In: Proceedings of the 23rd International Conference on World Wide Web, pp. 747–758. ACM (2014)Google Scholar
  17. 17.
    Krompaß, D., Baier, S., Tresp, V.: Type-constrained representation learning in knowledge graphs. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 640–655. Springer, Cham (2015). Scholar
  18. 18.
    Luo, X., Luo, K., Chen, X., Zhu, K.Q.: Cross-lingual entity linking for web tables. In: AAAI, pp. 362–369 (2018)Google Scholar
  19. 19.
    Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia spotlight: shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems, pp. 1–8. ACM (2011)Google Scholar
  20. 20.
    Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
  21. 21.
    Niu, X., Sun, X., Wang, H., Rong, S., Qi, G., Yu, Y.: - weaving chinese linking open data. In: Aroyo, L., et al. (eds.) ISWC 2011. LNCS, vol. 7032, pp. 205–220. Springer, Heidelberg (2011). Scholar
  22. 22.
    Paulheim, H.: Knowledge graph refinement: a survey of approaches and evaluation methods. Seman. Web 8(3), 489–508 (2017)CrossRefGoogle Scholar
  23. 23.
    Paulheim, H., Bizer, C.: Type inference on noisy RDF data. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8218, pp. 510–525. Springer, Heidelberg (2013). Scholar
  24. 24.
    Paulheim, H., Gangemi, A.: Serving DBpedia with DOLCE – more than just adding a cherry on top. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 180–196. Springer, Cham (2015). Scholar
  25. 25.
    Pujara, J., Miao, H., Getoor, L., Cohen, W.: Knowledge graph identification. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8218, pp. 542–557. Springer, Heidelberg (2013). Scholar
  26. 26.
    Raad, J., Beek, W., van Harmelen, F., Pernelle, N., Saïs, F.: Detecting erroneous identity links on the web using network metrics. In: Vrandečić, D., et al. (eds.) ISWC 2018. LNCS, vol. 11136, pp. 391–407. Springer, Cham (2018). Scholar
  27. 27.
    Silla, C.N., Freitas, A.A.: A survey of hierarchical classification across different application domains. Data Min. Knowl. Disc. 22(1–2), 31–72 (2011)MathSciNetCrossRefGoogle Scholar
  28. 28.
    Sleeman, J., Finin, T., Joshi, A.: Entity type recognition for heterogeneous semantic graphs. AI Mag. 36(1), 75–86 (2015)CrossRefGoogle Scholar
  29. 29.
    Vashishth, S., Jain, P., Talukdar, P.: CESI: canonicalizing open knowledge bases using embeddings and side information. In: Proceedings of the 2018 World Wide Web Conference on World Wide Web, pp. 1317–1327 (2018)Google Scholar
  30. 30.
    Wu, T.H., Wu, Z., Kao, B., Yin, P.: Towards practical open knowledge base canonicalization. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 883–892 (2018)Google Scholar
  31. 31.
    Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: a survey. Semant. Web 7(1), 63–93 (2016)CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jiaoyan Chen
    • 1
    Email author
  • Ernesto Jiménez-Ruiz
    • 2
    • 3
  • Ian Horrocks
    • 1
    • 2
  1. 1.Department of Computer ScienceUniversity of OxfordOxfordUK
  2. 2.The Alan Turing InstituteLondonUK
  3. 3.Department of InformaticsUniversity of OsloOsloNorway

Personalised recommendations