An Analysis of Data Quality in DBpedia and

  • Yanfang Ma
  • Guilin Qi
Part of the Communications in Computer and Information Science book series (CCIS, volume 406)


Linked Data has grown at an accelerating pace since it was launched in 2006. As an increasing amount of RDF data becomes available on the web, errors also proliferate, and the quality of Linked Data has drawn growing public attention. Because data quality affects the reliability and efficiency of web applications that consume Linked Data, the demand for quality analysis of Linked Data is becoming increasingly pressing. In this paper, we present several problems concerning the quality of Linked Data. These problems were discovered through our analysis of two cross-domain RDF datasets: DBpedia and, both of which are based on automatic extraction of resources from existing encyclopedias. Some of the problems can be detected simply with SPARQL queries, while others cannot. For every problem listed in this paper, we present a detection method, and we conduct experiments to demonstrate the validity of our methods.
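The abstract does not spell out the detection methods, so the following is only an illustrative sketch of one way a problem that SPARQL alone cannot catch might be approached. It assumes, purely for illustration, that near-duplicate property names are one such problem and that Levenshtein distance (which appears among the keywords) is used to flag them. The function names, the threshold, and the sample property names are hypothetical and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def near_duplicates(names, max_distance=1):
    """Return pairs of distinct names whose case-folded edit distance is small.

    Assumption: a small distance hints at a spelling variant of one property.
    The threshold is a hypothetical parameter, not a value from the paper.
    """
    unique = sorted(set(names))
    pairs = []
    for index, first in enumerate(unique):
        for second in unique[index + 1:]:
            if levenshtein(first.lower(), second.lower()) <= max_distance:
                pairs.append((first, second))
    return pairs


# Hypothetical property names, used only to exercise the functions.
sample = ["birthPlace", "birthplace", "deathPlace"]
flagged = near_duplicates(sample)
```

A pairwise comparison like this is quadratic in the number of names, so on a large dataset the candidate pairs would likely need to be narrowed first.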


Object Property, SPARQL Query, Meaningful Unit, Levenshtein Distance, Parent Category
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Yanfang Ma (1)
  • Guilin Qi (1)
  1. School of Computer Science and Engineering, Southeast University, Nanjing, China
