Mining Cardinalities from Knowledge Bases

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10438)

Abstract

Cardinality is an important structural aspect of data that has not received enough attention in the context of RDF knowledge bases (KBs). Information about cardinalities can be useful for data users and knowledge engineers when writing queries and when reusing or engineering KBs. Such cardinalities can be declared using OWL and RDF constraint languages as constraints on the usage of properties over instance data. However, their declaration is optional and consistency with the instance data is not ensured. In this paper, we address the problem of mining cardinality bounds for properties to discover structural characteristics of KBs, and use these bounds to assess completeness. Because KBs are incomplete and error-prone, we apply statistical methods for filtering property usage and for finding accurate and robust patterns. Accuracy of the cardinality patterns is ensured by properly handling equality axioms (owl:sameAs), and robustness by filtering outliers. We report an implementation of our algorithm with two variants using SPARQL 1.1 and Apache Spark, and their evaluation on real-world and synthetic data.
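
To make the idea of mining cardinality bounds concrete, the following is a minimal Python sketch and not the authors' SPARQL 1.1 or Apache Spark implementation. It assumes triples are plain `(subject, property, object)` string tuples, merges `owl:sameAs` cliques with a small union-find, counts distinct values per subject and property, and reports the smallest and largest count per property. The abstract does not say which statistical filter is used, so the `drop_rare` step and its `min_support` parameter are hypothetical stand-ins that discard cardinality values observed for only a small share of subjects. All function names are illustrative. Because only subjects that use a property are counted, the lower bound here is never below 1.

```python
from collections import Counter, defaultdict

SAME_AS = "owl:sameAs"


def canonicalizer(triples):
    """Return a function mapping each term to one representative of its owl:sameAs clique."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for s, p, o in triples:
        if p == SAME_AS:
            rs, ro = find(s), find(o)
            if rs != ro:
                parent[max(rs, ro)] = min(rs, ro)  # deterministic representative
    return find


def property_counts(triples):
    """For each property, list how many distinct values each subject has."""
    triples = list(triples)
    find = canonicalizer(triples)
    values = defaultdict(set)
    for s, p, o in triples:
        if p != SAME_AS:
            values[(find(s), p)].add(find(o))
    counts = defaultdict(list)
    for (_, p), objs in values.items():
        counts[p].append(len(objs))
    return counts


def drop_rare(counts, min_support):
    """Hypothetical outlier filter: keep only cardinalities seen often enough."""
    freq = Counter(counts)
    total = len(counts)
    return [c for c in counts if freq[c] / total >= min_support]


def mine_bounds(triples, min_support=0.05):
    """Return {property: (lower bound, upper bound)} after filtering."""
    bounds = {}
    for prop, counts in property_counts(triples).items():
        kept = drop_rare(counts, min_support) or counts  # never filter everything away
        bounds[prop] = (min(kept), max(kept))
    return bounds


example = [
    ("ex:a", "ex:author", "ex:p1"),
    ("ex:a", "ex:author", "ex:p1b"),
    ("ex:p1", "owl:sameAs", "ex:p1b"),  # ex:p1 and ex:p1b denote the same person
    ("ex:b", "ex:author", "ex:p2"),
]
print(mine_bounds(example))  # -> {'ex:author': (1, 1)}
```

Without the `owl:sameAs` merge, `ex:a` would appear to have two authors and the upper bound would be inflated to 2, which is the accuracy issue the abstract refers to.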

Notes

  1. Henceforth, we use prefixes for namespaces according to http://prefix.cc/.

  2. https://www.w3.org/TR/shacl/ (accessed on February 13, 2017).

  3. http://dublincore.org/documents/dc-dsp/.

  4. OWL allows the expression of cardinalities through the minCardinality, maxCardinality, and cardinality restrictions.

  5. http://docs.stardog.com/icv/icv-specification.html.

  6. http://spinrdf.org/.

  7. http://spark.apache.org/ (version 2.1.0).

  8. http://www.cyc.com/platform/opencyc.

  9. https://www.cs.ox.ac.uk/isg/tools/UOBMGenerator/.

  10. http://www.bl.uk/bibliographic/download.html.

  11. http://www.dbis.informatik.uni-goettingen.de/Mondial/.

  12. https://datahub.io/dataset/nytimes-linked-open-data.

Acknowledgements

This work has been supported by the TOMOE project funded by Fujitsu Laboratories Ltd., Japan and the Insight Centre for Data Analytics at the National University of Ireland Galway, Ireland.

Author information

Corresponding author

Correspondence to Emir Muñoz.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Muñoz, E., Nickles, M. (2017). Mining Cardinalities from Knowledge Bases. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_34

  • DOI: https://doi.org/10.1007/978-3-319-64468-4_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64467-7

  • Online ISBN: 978-3-319-64468-4

  • eBook Packages: Computer Science, Computer Science (R0)