
Data-Driven Joint Debugging of the DBpedia Mappings and Ontology

Towards Addressing the Causes Instead of the Symptoms of Data Quality in DBpedia
  • Heiko Paulheim
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10249)

Abstract

DBpedia is a large-scale, cross-domain knowledge graph extracted from Wikipedia. The extraction relies on crowd-sourced mappings from Wikipedia infoboxes to the DBpedia ontology. Several problems may arise in this process: users may create wrong or inconsistent mappings, use the ontology in unforeseen ways, or change the ontology without considering all possible consequences. In this paper, we present a joint, data-driven approach to discovering problems in the mappings as well as in the ontology and its usage. We report both quantitative and qualitative results on the problems identified, and derive proposals for altering mappings and refactoring the DBpedia ontology.

Keywords

Knowledge graph construction, Knowledge graph debugging, Ontology debugging, Data quality, Data-driven approaches, DBpedia

Notes

Acknowledgements

The author would like to thank the numerous people involved in the DBpedia project for their past, ongoing, and future efforts, as well as the authors of [4] for providing the DBpedia mappings in RML.

References

  1. Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Auer, S., Lehmann, J.: Crowdsourcing linked data quality assessment. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8219, pp. 260–276. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-41338-4_17
  2. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, vol. 1215, pp. 487–499 (1994)
  3. Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semant. Web Inf. Syst. 5(3), 1–22 (2009)
  4. Dimou, A., Kontokostas, D., Freudenberg, M., Verborgh, R., Lehmann, J., Mannens, E., Hellmann, S.: DBpedia mappings quality assessment. In: International Semantic Web Conference - Posters and Demonstrations (2016)
  5. Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: RML: a generic language for integrated RDF mappings of heterogeneous data. In: LDOW (2014)
  6. Fleischhacker, D., Paulheim, H., Bryl, V., Völker, J., Bizer, C.: Detecting errors in numerical linked data using cross-checked outlier detection. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 357–372. Springer, Cham (2014). doi: 10.1007/978-3-319-11964-9_23
  7. Gangemi, A., Guarino, N., Masolo, C., Oltramari, A.: Sweetening WordNet with DOLCE. AI Mag. 24(3), 13–24 (2003)
  8. Gangemi, A., Mika, P.: Understanding the semantic web through descriptions and situations. In: Meersman, R., Tari, Z., Schmidt, D.C. (eds.) OTM 2003. LNCS, vol. 2888, pp. 689–706. Springer, Heidelberg (2003). doi: 10.1007/978-3-540-39964-3_44
  9. Glimm, B., Horrocks, I., Motik, B., Stoilos, G., Wang, Z.: HermiT: an OWL 2 reasoner. J. Autom. Reasoning 53(3), 245–269 (2014)
  10. Hellmann, S., Stadler, C., Lehmann, J., Auer, S.: DBpedia live extraction. In: Meersman, R., Dillon, T., Herrero, P. (eds.) OTM 2009. LNCS, vol. 5871, pp. 1209–1223. Springer, Heidelberg (2009). doi: 10.1007/978-3-642-05151-7_33
  11. Jain, P., Hitzler, P., Yeh, P.Z., Verma, K., Sheth, A.P.: Linked data is merely more data. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, vol. 11 (2010)
  12. Lehmann, J., Bühmann, L.: ORE - a tool for repairing and enriching knowledge bases. In: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z., Horrocks, I., Glimm, B. (eds.) ISWC 2010. LNCS, vol. 6497, pp. 177–193. Springer, Heidelberg (2010). doi: 10.1007/978-3-642-17749-1_12
  13. Lehmann, J., Gerber, D., Morsey, M., Ngonga Ngomo, A.-C.: DeFacto - deep fact validation. In: Cudré-Mauroux, P., et al. (eds.) ISWC 2012. LNCS, vol. 7649, pp. 312–327. Springer, Heidelberg (2012). doi: 10.1007/978-3-642-35176-1_20
  14. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web J. 6(2), 167–195 (2015)
  15. Ma, Y., Gao, H., Wu, T., Qi, G.: Learning disjointness axioms with association rule mining and its application to inconsistency detection of linked data. In: Zhao, D., Du, J., Wang, H., Wang, P., Ji, D., Pan, J.Z. (eds.) CSWS 2014. CCIS, vol. 480, pp. 29–41. Springer, Heidelberg (2014). doi: 10.1007/978-3-662-45495-4_3
  16. Paulheim, H.: Identifying wrong links between datasets by multi-dimensional outlier detection. In: International Workshop on Debugging Ontologies and Ontology Mappings, vol. 1162, pp. 27–38. CEUR Workshop Proceedings (2014)
  17. Paulheim, H.: Knowledge graph refinement: a survey of approaches and evaluation methods. Semant. Web 8(3), 489–508 (2017)
  18. Paulheim, H., Bizer, C.: Improving the quality of linked data using statistical distributions. Int. J. Semant. Web Inf. Syst. (IJSWIS) 10(2), 63–86 (2014)
  19. Paulheim, H., Gangemi, A.: Serving DBpedia with DOLCE – more than just adding a cherry on top. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 180–196. Springer, Cham (2015). doi: 10.1007/978-3-319-25007-6_11
  20. Paulheim, H., Stuckenschmidt, H.: Fast approximate A-box consistency checking using machine learning. In: Sack, H., Blomqvist, E., d’Aquin, M., Ghidini, C., Ponzetto, S.P., Lange, C. (eds.) ESWC 2016. LNCS, vol. 9678, pp. 135–150. Springer, Cham (2016). doi: 10.1007/978-3-319-34129-3_9
  21. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Pearson, London (1995)
  22. Schmachtenberg, M., Bizer, C., Paulheim, H.: Adoption of the linked data best practices in different topical domains. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 245–260. Springer, Cham (2014). doi: 10.1007/978-3-319-11964-9_16
  23. Sheng, Z., Wang, X., Shi, H., Feng, Z.: Checking and handling inconsistency of DBpedia. In: Wang, F.L., Lei, J., Gong, Z., Luo, X. (eds.) WISM 2012. LNCS, vol. 7529, pp. 480–488. Springer, Heidelberg (2012). doi: 10.1007/978-3-642-33469-6_60
  24. Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge unifying WordNet and Wikipedia. In: 16th International Conference on World Wide Web, pp. 697–706. ACM, New York (2007)
  25. Töpper, G., Knuth, M., Sack, H.: DBpedia ontology enrichment for inconsistency detection. In: Proceedings of the 8th International Conference on Semantic Systems, pp. 33–40. ACM, New York (2012)
  26. Waitelonis, J., Ludwig, N., Knuth, M., Sack, H.: WhoKnows? - evaluating linked data heuristics with a quiz that cleans up DBpedia. Int. J. Interact. Technol. Smart Educ. 8(4), 236–248 (2011)
  27. Wienand, D., Paulheim, H.: Detecting incorrect numerical data in DBpedia. In: Presutti, V., d’Amato, C., Gandon, F., d’Aquin, M., Staab, S., Tordai, A. (eds.) ESWC 2014. LNCS, vol. 8465, pp. 504–518. Springer, Cham (2014). doi: 10.1007/978-3-319-07443-6_34

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Data and Web Science Group, University of Mannheim, Mannheim, Germany
