An Analysis of Data Quality in DBpedia and Zhishi.me

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 406)

Abstract

Linked Data has grown at an accelerating pace since it was launched in 2006. While an increasing amount of RDF data is available on the web, errors have also proliferated, and the quality of Linked Data has drawn growing public attention. Because data quality affects the reliability and efficiency of web applications that consume Linked Data, the need for quality analysis of Linked Data is becoming increasingly pressing. In this paper, we present several problems concerning the quality of Linked Data. We discovered these problems by analyzing two cross-domain RDF datasets, DBpedia and Zhishi.me, both of which are built by automatically extracting resources from existing encyclopedias. Some of the problems can be detected with simple SPARQL queries, while others cannot. For every problem listed in this paper, we present a detection method, and we report experiments that demonstrate the validity of our methods.
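To make the idea of SPARQL-detectable problems concrete, the following is a minimal sketch of one such check. It is not taken from the paper: the abstract does not name the specific problems or queries, so the choice of a range-violation check, the `ex:` resource names, and the helper `find_range_violations` are all assumptions made for illustration. The query string shows how the check might be phrased in SPARQL, and the function applies the same logic to a small in-memory list of triples so the example runs without an endpoint.

```python
# Hypothetical illustration only: the check, the ex: names, and the helper
# below are assumptions, not the method described in the paper.

RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"

# One way the check might be phrased in SPARQL (shown for reference, not executed here).
RANGE_VIOLATION_QUERY = """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?s ?p ?o WHERE {
  ?p rdfs:range ?cls .
  ?s ?p ?o .
  FILTER NOT EXISTS { ?o rdf:type ?cls }
}
"""


def find_range_violations(triples):
    """Return, sorted, the triples whose object lacks a declared range type.

    `triples` is an iterable of (subject, predicate, object) string tuples.
    """
    triples = list(triples)
    ranges = {}  # property -> set of declared range classes
    types = {}   # resource -> set of asserted classes
    for s, p, o in triples:
        if p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)
        elif p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    violations = []
    for s, p, o in triples:
        declared = ranges.get(p, set())
        # Flag the triple once if any declared range class is missing on the object.
        if any(cls not in types.get(o, set()) for cls in declared):
            violations.append((s, p, o))
    return sorted(violations)


# Made-up sample data: ex:Bob's birth place is typed as a person, not a place.
SAMPLE = [
    ("ex:birthPlace", RDFS_RANGE, "ex:Place"),
    ("ex:Berlin", RDF_TYPE, "ex:Place"),
    ("ex:Einstein", RDF_TYPE, "ex:Person"),
    ("ex:Alice", "ex:birthPlace", "ex:Berlin"),
    ("ex:Bob", "ex:birthPlace", "ex:Einstein"),
]

if __name__ == "__main__":
    for triple in find_range_violations(SAMPLE):
        print(triple)
```

Under RDF's open-world semantics a missing type assertion is not strictly an error, so results of a check like this are better read as candidates for review than as confirmed defects.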


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ma, Y., Qi, G. (2013). An Analysis of Data Quality in DBpedia and Zhishi.me. In: Qi, G., Tang, J., Du, J., Pan, J.Z., Yu, Y. (eds) Linked Data and Knowledge Graph. CSWS 2013. Communications in Computer and Information Science, vol 406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54025-7_10

  • DOI: https://doi.org/10.1007/978-3-642-54025-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54024-0

  • Online ISBN: 978-3-642-54025-7

  • eBook Packages: Computer Science, Computer Science (R0)
