Mining Cardinalities from Knowledge Bases

Muñoz, Emir; Nickles, Matthias

doi:10.1007/978-3-319-64468-4_34

Conference paper
First Online: 01 August 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10438)

Abstract

Cardinality is an important structural aspect of data that has not received enough attention in the context of RDF knowledge bases (KBs). Information about cardinalities can be useful for data users and knowledge engineers when writing queries and when reusing or engineering KBs. Such cardinalities can be declared using OWL and RDF constraint languages as constraints on the usage of properties over instance data. However, their declaration is optional and consistency with the instance data is not ensured. In this paper, we address the problem of mining cardinality bounds for properties to discover structural characteristics of KBs, and use these bounds to assess completeness. Because KBs are incomplete and error-prone, we apply statistical methods for filtering property usage and for finding accurate and robust patterns. We ensure the accuracy of the cardinality patterns by properly handling equality axioms (owl:sameAs), and their robustness by filtering outliers. We report two implementations of our algorithm, one using SPARQL 1.1 and one using Apache Spark, and evaluate them on real-world and synthetic data.
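
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the `ex:` identifiers, the toy data, and the median-absolute-deviation filter are all assumptions chosen for illustration, and the paper may use a different outlier test. The sketch merges entities linked by owl:sameAs, counts distinct values of a property per subject, drops outlying counts, and reports the remaining minimum and maximum as cardinality bounds.

```python
from collections import defaultdict
from statistics import median

SAME_AS = "owl:sameAs"


def make_canonicalizer(triples):
    """Return a function mapping each term to one representative of its
    owl:sameAs equivalence class (union-find; hypothetical helper)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            a, b = find(s), find(o)
            if a != b:
                parent[max(a, b)] = min(a, b)
    return find


def mine_bounds(triples, prop, k=3.0):
    """Return (lower, upper) bounds on how many distinct values `prop`
    takes per subject, or None if the property is unused.
    Only subjects that use the property at least once are counted."""
    find = make_canonicalizer(triples)
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[find(s)].add(find(o))
    counts = [len(v) for v in values.values()]
    if not counts:
        return None
    # Assumed outlier filter: keep counts within k MADs of the median.
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    # If the MAD is 0 the filter is uninformative, so keep every count.
    kept = [c for c in counts if abs(c - med) <= k * mad] if mad > 0 else counts
    return min(kept), max(kept)


# Toy data (hypothetical): six books, one with an implausible 20 authors.
triples = [
    ("ex:b1", "ex:author", "ex:a1"),
    ("ex:b1", "ex:author", "ex:a1x"),  # same person under a second IRI
    ("ex:a1", SAME_AS, "ex:a1x"),
    ("ex:b2", "ex:author", "ex:a2"),
    ("ex:b2", "ex:author", "ex:a3"),
    ("ex:b3", "ex:author", "ex:a2"),
    ("ex:b3", "ex:author", "ex:a4"),
    ("ex:b4", "ex:author", "ex:a2"),
    ("ex:b4", "ex:author", "ex:a3"),
    ("ex:b4", "ex:author", "ex:a4"),
    ("ex:b5", "ex:author", "ex:a5"),
    ("ex:b5x", "ex:author", "ex:a6"),  # same book under a second IRI
    ("ex:b5", SAME_AS, "ex:b5x"),
]
triples += [("ex:b6", "ex:author", f"ex:z{i}") for i in range(20)]

print(mine_bounds(triples, "ex:author"))  # (1, 3)
```

In this toy data the per-book author counts after merging are 1, 2, 2, 3, 2 and 20. The filter discards the count of 20, so the mined bounds are 1 and 3. Without the owl:sameAs merge, `ex:b1` would be counted with two authors and `ex:b5` and `ex:b5x` would be counted as separate books.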

Notes

1. Henceforth, we use prefixes for namespaces according to http://prefix.cc/.
2. https://www.w3.org/TR/shacl/ (accessed on February 13, 2017).
3. http://dublincore.org/documents/dc-dsp/.
4. OWL allows the expression of cardinalities through the minCardinality, maxCardinality, and cardinality restrictions.
5. http://docs.stardog.com/icv/icv-specification.html.
6. http://spinrdf.org/.
7. http://spark.apache.org/ (version 2.1.0).
8. http://www.cyc.com/platform/opencyc.
9. https://www.cs.ox.ac.uk/isg/tools/UOBMGenerator/.
10. http://www.bl.uk/bibliographic/download.html.
11. http://www.dbis.informatik.uni-goettingen.de/Mondial/.
12. https://datahub.io/dataset/nytimes-linked-open-data.

Acknowledgements

This work has been supported by the TOMOE project, funded by Fujitsu Laboratories Ltd., Japan, and the Insight Centre for Data Analytics at the National University of Ireland Galway, Ireland.

Author information

Authors and Affiliations

Fujitsu Ireland Limited, Dublin, Ireland
Emir Muñoz
Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
Emir Muñoz & Matthias Nickles

Corresponding author

Correspondence to Emir Muñoz.

Editor information

Editors and Affiliations

University of Lyon, Villeurbanne, France
Djamal Benslimane
University of Milan, Milan, Italy
Ernesto Damiani
University of Michigan, Dearborn, Michigan, USA
William I. Grosky
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
Wright State University, Dayton, Ohio, USA
Amit Sheth
Johannes Kepler University, Linz, Austria
Roland R. Wagner

Cite this paper

Muñoz, E., Nickles, M. (2017). Mining Cardinalities from Knowledge Bases. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_34

DOI: https://doi.org/10.1007/978-3-319-64468-4_34
Published: 01 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64467-7
Online ISBN: 978-3-319-64468-4
eBook Packages: Computer Science, Computer Science (R0)
