Canonicalizing Knowledge Base Literals

  • Jiaoyan Chen
  • Ernesto Jiménez-Ruiz
  • Ian Horrocks
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11778)

Abstract

Ontology-based knowledge bases (KBs) like DBpedia are very valuable resources, but their usefulness and usability are limited by various quality issues. One such issue is the use of string literals instead of semantically typed entities. In this paper we study the automated canonicalization of such literals, i.e., replacing the literal with an existing entity from the KB or with a new entity that is typed using classes from the KB. We propose a framework that combines both reasoning and machine learning in order to predict the relevant entities and types, and we evaluate this framework against state-of-the-art baselines for both semantic typing and entity matching.
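To make the two outcomes of canonicalization concrete at the level of RDF triples, here is a minimal Python sketch. It is not the authors' framework. Exact matching on normalized labels stands in for their learned entity matching. `predict_types` is a hypothetical placeholder for their type prediction. The `ex:` names and the IRI-minting scheme are illustrative assumptions.

```python
import re
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object) as plain strings


def normalize(label: str) -> str:
    # Lowercase and collapse whitespace so "New  York" and "new york" compare equal.
    return re.sub(r"\s+", " ", label.strip().lower())


def mint_iri(literal: str, namespace: str = "ex:") -> str:
    # Hypothetical naming scheme for new entities: namespace + underscored label.
    return namespace + re.sub(r"\s+", "_", literal.strip())


def canonicalize(
    triple: Triple,
    label_index: Dict[str, str],
    predict_types: Callable[[Triple], List[str]],
) -> List[Triple]:
    """Replace a literal object with an existing entity, or with a new typed entity."""
    subj, prop, literal = triple
    match = label_index.get(normalize(literal))
    if match is not None:
        # Case 1: an existing KB entity carries this label, so reuse it.
        return [(subj, prop, match)]
    # Case 2: no match, so mint a new entity and type it with predicted KB classes.
    new_entity = mint_iri(literal)
    out = [(subj, prop, new_entity)]
    out += [(new_entity, "rdf:type", cls) for cls in predict_types(triple)]
    return out


if __name__ == "__main__":
    index = {"london": "ex:London"}  # toy label index: normalized label -> entity
    types_by_property = {"ex:almaMater": ["ex:EducationalInstitution"]}  # toy predictor

    def predict(t: Triple) -> List[str]:
        return types_by_property.get(t[1], [])

    print(canonicalize(("ex:Alice", "ex:birthPlace", "London"), index, predict))
    print(canonicalize(("ex:Alice", "ex:almaMater", "Oxford Poly"), index, predict))
```

Exact matching on normalized labels misses abbreviations and spelling variants and cannot resolve ambiguous labels. A learned matcher and type predictor are meant to handle those cases.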

Keywords

Knowledge base correction, Literal canonicalization, Knowledge-based learning, Recurrent Neural Network

Notes

Acknowledgments

The work is supported by the AIDA project, The Alan Turing Institute under the EPSRC grant EP/N510129/1, the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889), the Royal Society, and the EPSRC projects DBOnto, MaSI³ and ED³.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jiaoyan Chen (1)
  • Ernesto Jiménez-Ruiz (2, 3)
  • Ian Horrocks (1, 2)

  1. Department of Computer Science, University of Oxford, Oxford, UK
  2. The Alan Turing Institute, London, UK
  3. Department of Informatics, University of Oslo, Oslo, Norway