Statistical Relation Cardinality Bounds in Knowledge Bases

Muñoz, Emir; Nickles, Matthias

doi:10.1007/978-3-662-58415-6_3

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 11310)

Abstract

There is an increasing number of Semantic Web knowledge bases (KBs) available on the Web, created in academia and industry alike. In this paper, we address the lack of structure in these KBs, which stems from the schema-free nature required by open environments such as the Web. Relation cardinality is an important structural aspect of data that has not received enough attention in the context of KBs. We propose a definition of relation cardinality bounds that can be used to unveil the structure that KB data naturally exhibit. Information about relation cardinalities, such as that a person can have two parents and zero or more children, that a book should have at least one author, or that a country should have more than two cities, can be useful to data users and knowledge engineers when writing queries and when reusing or engineering KB systems. Such cardinalities can be declared as constraints on the usage of properties in a domain of knowledge using OWL and RDF constraint languages; however, declaring them is optional, and their consistency with the instance data is not guaranteed. We first address the problem of mining relation cardinality bounds by proposing an algorithm that normalises and filters the data to ensure the accuracy and robustness of the mined bounds. Then we show how these bounds can be used to assess two relevant data quality dimensions: consistency and completeness. Finally, we show that relation cardinality bounds can also expose structural characteristics of a KB when they are mapped into a constraint language that declares the actual shape of the data.
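The abstract names three steps: mine cardinality bounds from normalised, filtered data; use the bounds to flag possibly inconsistent or incomplete entities; and map the bounds into a constraint language. The Python sketch below illustrates that pipeline on a toy graph of (subject, predicate, object) strings. It is not the chapter's algorithm, and every choice in it is an assumption: bounds are mined per (class, property) pair as the minimum and maximum of per-instance counts; "normalisation" means merging owl:sameAs aliases with a union-find and dropping duplicate triples; the outlier filter is a Tukey IQR fence, swapped in for simplicity (the reference list also includes Rosner's generalized ESD procedure, which may be closer to what the chapter uses); and the target language is SHACL Core via sh:minCount and sh:maxCount. All function names, the toy data, and the `<Class>Shape` naming are hypothetical.

```python
from statistics import quantiles

# Prefixed names stand in for full IRIs (prefix declarations omitted).
SAME_AS, TYPE = "owl:sameAs", "rdf:type"

class UnionFind:
    """Groups terms linked by owl:sameAs; each group gets one representative."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            lo, hi = sorted((ra, rb))  # deterministic: smaller IRI represents the group
            self.parent[hi] = lo

def normalise(triples):
    """Assumed normalisation: rewrite terms to their sameAs representative, dedupe triples."""
    uf = UnionFind()
    for s, p, o in triples:
        if p == SAME_AS:
            uf.union(s, o)
    return {(uf.find(s), p, uf.find(o)) for s, p, o in triples if p != SAME_AS}

def cardinalities(graph, cls, prop):
    """Number of (s, prop, o) triples per instance s of cls; 0 if s has none."""
    counts = {s: 0 for s, p, o in graph if p == TYPE and o == cls}
    for s, p, o in graph:
        if p == prop and s in counts:
            counts[s] += 1
    return counts

def iqr_filter(values, k=1.5):
    """Swapped-in outlier filter: keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 4:  # too few points to estimate quartiles sensibly
        return list(values)
    q1, _, q3 = quantiles(values, n=4)
    fence = k * (q3 - q1)
    return [v for v in values if q1 - fence <= v <= q3 + fence]

def mine_bounds(graph, cls, prop):
    """(min, max) of the filtered per-instance counts, or None if cls has no instances."""
    kept = iqr_filter(list(cardinalities(graph, cls, prop).values()))
    return (min(kept), max(kept)) if kept else None

def assess(graph, cls, prop, bounds):
    """Below the lower bound: possibly incomplete; above the upper bound: possibly inconsistent."""
    lo, hi = bounds
    report = {"possibly_incomplete": [], "possibly_inconsistent": []}
    for s, n in sorted(cardinalities(graph, cls, prop).items()):
        if n < lo:
            report["possibly_incomplete"].append(s)
        elif n > hi:
            report["possibly_inconsistent"].append(s)
    return report

def to_shacl(cls, prop, bounds):
    """Render bounds as a SHACL Core node shape (Turtle, prefixes omitted)."""
    lo, hi = bounds
    lines = [f"{cls}Shape a sh:NodeShape ;",
             f"    sh:targetClass {cls} ;",
             f"    sh:property [ sh:path {prop} ;"]
    if lo > 0:  # a lower bound of 0 constrains nothing, so sh:minCount is omitted
        lines.append(f"        sh:minCount {lo} ;")
    lines.append(f"        sh:maxCount {hi} ] .")
    return "\n".join(lines)

def toy_graph():
    """Hypothetical data: people and their parents."""
    g = [("ex:frank", "ex:hasParent", "ex:frank_mum"),  # alias of ex:frank_m
         ("ex:frank_mum", SAME_AS, "ex:frank_m"),
         ("ex:erin", TYPE, "ex:Person"), ("ex:erin", "ex:hasParent", "ex:erin_m"),
         ("ex:zed", TYPE, "ex:Person")]
    g += [("ex:zed", "ex:hasParent", f"ex:zed_{i}") for i in range(9)]  # likely an error
    for name in ["alice", "bob", "carol", "dave", "frank", "gina"]:
        g += [(f"ex:{name}", TYPE, "ex:Person"),
              (f"ex:{name}", "ex:hasParent", f"ex:{name}_m"),
              (f"ex:{name}", "ex:hasParent", f"ex:{name}_f")]
    return g

if __name__ == "__main__":
    g = normalise(toy_graph())
    bounds = mine_bounds(g, "ex:Person", "ex:hasParent")
    print(bounds)
    print(assess(g, "ex:Person", "ex:hasParent", bounds))
    print(to_shacl("ex:Person", "ex:hasParent", bounds))
```

On the toy graph, the ex:frank_mum alias collapses into ex:frank_m, so ex:frank keeps two parents instead of three; the mined bounds are (2, 2); ex:erin (one parent) is flagged as possibly incomplete and ex:zed (nine parents) as possibly inconsistent. Because most counts agree here, the IQR collapses to zero and any deviating count is excluded from the bounds. That is aggressive, so a formal outlier test, or treating bound violations only as candidates for manual review, would be a more cautious design.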

Notes

1. Henceforth, we use the prefix mappings of http://prefix.cc/ in place of full namespaces to shorten URLs.
2. https://www.w3.org/TR/shacl/ (accessed on February 13, 2017).
3. http://dublincore.org/documents/dc-dsp/.
4. OWL allows the expression of cardinalities through the minCardinality, maxCardinality, and cardinality restrictions.
5. http://docs.stardog.com/icv/icv-specification.html.
6. http://spinrdf.org/.
7. This work extends our previous work in [22].
8. Shape Expressions (ShEx) Primer: http://shex.io/shex-primer/.
9. http://spark.apache.org/ (version 2.1.0).
10. Any complete graph is its own maximal clique.
11. http://data.linkedmdb.org/.
12. http://www.cyc.com/platform/opencyc.
13. https://www.cs.ox.ac.uk/isg/tools/UOBMGenerator/.
14. http://www.bl.uk/bibliographic/download.html.
15. http://www.dbis.informatik.uni-goettingen.de/Mondial/.
16. https://datahub.io/dataset/nytimes-linked-open-data.
17. http://data.semanticweb.org/.

References

1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995)
2. Arenas, M., Conca, S., Pérez, J.: Counting beyond a yottabyte, or how SPARQL 1.1 property paths will prevent adoption of the standard. In: WWW, pp. 629–638. ACM (2012)
3. Arenas, M., Gutierrez, C., Pérez, J.: Foundations of RDF databases. In: Tessaris, S., et al. (eds.) Reasoning Web 2009. LNCS, vol. 5689, pp. 158–204. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03754-2_4
4. Boneva, I., Labra Gayo, J.E., Prud’hommeaux, E.G.: Semantics and validation of shapes schemas for RDF. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 104–120. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68288-4_7
5. Bosch, T., Eckert, K.: Guidance, please! towards a framework for RDF-based constraint languages. In: Proceedings of the International Conference on Dublin Core and Metadata Applications (2015)
6. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 15:1–15:58 (2009)
7. Christodoulou, K., Paton, N.W., Fernandes, A.A.A.: Structure inference for linked data sources using clustering. Trans. Large-Scale Data Knowl.-Centered Syst. 19, 1–25 (2015)
8. Ferrarotti, F., Hartmann, S., Link, S.: Efficiency frontiers of XML cardinality constraints. Data Knowl. Eng. 87, 297–319 (2013)
9. Fleischhacker, D., Paulheim, H., Bryl, V., Völker, J., Bizer, C.: Detecting errors in numerical linked data using cross-checked outlier detection. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 357–372. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11964-9_23
10. Galárraga, L., Razniewski, S., Amarilli, A., Suchanek, F.M.: Predicting completeness in knowledge bases. In: WSDM, pp. 375–383. ACM (2017)
11. Glimm, B., Hogan, A., Krötzsch, M., Polleres, A.: OWL: yet to arrive on the web of data? In: LDOW. CEUR Workshop Proceedings, vol. 937. CEUR-WS.org (2012)
12. Hogan, A., Harth, A., Passant, A., Decker, S., Polleres, A.: Weaving the pedantic web. In: LDOW. CEUR Workshop Proceedings, vol. 628. CEUR-WS.org (2010)
13. Horrocks, I., Tessaris, S.: Querying the semantic web: a formal approach. In: Horrocks, I., Hendler, J. (eds.) ISWC 2002. LNCS, vol. 2342, pp. 177–191. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-48005-6_15
14. Kellou-Menouer, K., Kedad, Z.: Evaluating the gap between an RDF dataset and its schema. In: Jeusfeld, M.A., Karlapalem, K. (eds.) ER 2015. LNCS, vol. 9382, pp. 283–292. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25747-1_28
15. Kostylev, E.V., Reutter, J.L., Romero, M., Vrgoč, D.: SPARQL with property paths. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 3–18. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25007-6_1
16. Lausen, G., Meier, M., Schmidt, M.: SPARQLing constraints for RDF. In: EDBT, ACM International Conference Proceeding Series, vol. 261, pp. 499–509. ACM (2008)
17. Liddle, S.W., Embley, D.W., Woodfield, S.N.: Cardinality constraints in semantic data models. Data Knowl. Eng. 11(3), 235–270 (1993)
18. Motik, B., Horrocks, I., Sattler, U.: Bridging the gap between OWL and relational databases. J. Web Sem. 7(2), 74–89 (2009)
19. Motik, B., Nenov, Y., Piro, R.E.F., Horrocks, I.: Handling owl:sameAs via rewriting. In: AAAI, pp. 231–237. AAAI Press (2015)
20. Motik, B., Patel-Schneider, P.F., Parsia, B.: OWL 2 Web Ontology Language Structural Specification and Functional-Style Syntax, 2nd edn (2012). http://www.w3.org/TR/2012/REC-owl2-syntax-20121211/
21. Muñoz, E.: On learnability of constraints from RDF data. In: Sack, H., Blomqvist, E., d’Aquin, M., Ghidini, C., Ponzetto, S.P., Lange, C. (eds.) ESWC 2016. LNCS, vol. 9678, pp. 834–844. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34129-3_52
22. Muñoz, E., Nickles, M.: Mining cardinalities from knowledge bases. In: Benslimane, D., Damiani, E., Grosky, W.I., Hameurlain, A., Sheth, A., Wagner, R.R. (eds.) DEXA 2017. LNCS, vol. 10438, pp. 447–462. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64468-4_34
23. Neumann, T., Moerkotte, G.: Characteristic sets: accurate cardinality estimation for RDF queries with multiple joins. In: ICDE, pp. 984–994. IEEE Computer Society (2011)
24. Olivé, A.: Conceptual Modeling of Information Systems. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-39390-0
25. Papakonstantinou, V., Flouris, G., Fundulaki, I., Gubichev, A.: Some thoughts on OWL-empowered SPARQL query optimization. In: Sack, H., Rizzo, G., Steinmetz, N., Mladenić, D., Auer, S., Lange, C. (eds.) ESWC 2016. LNCS, vol. 9989, pp. 12–16. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47602-5_3
26. Paulheim, H.: Knowledge graph refinement: a survey of approaches and evaluation methods. Semant. Web 8(3), 489–508 (2017)
27. Paulheim, H., Bizer, C.: Improving the quality of Linked Data using statistical distributions. Int. J. Semant. Web Inf. Syst. 10(2), 63–86 (2014)
28. Pearson, R.K.: Mining imperfect data - dealing with contamination and incomplete records. SIAM (2005)
29. Polleres, A., Reutter, J.L., Kostylev, E.V.: Nested constructs vs. sub-selects in SPARQL. In: AMW. CEUR Workshop Proceedings, vol. 1644. CEUR-WS.org (2016)
30. Polleres, A., Scharffe, F., Schindlauer, R.: SPARQL++ for mapping between RDF vocabularies. In: Meersman, R., Tari, Z. (eds.) OTM 2007. LNCS, vol. 4803, pp. 878–896. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76848-7_59
31. Prud’hommeaux, E., Gayo, J.E.L., Solbrig, H.R.: Shape expressions: an RDF validation and transformation language. In: SEMANTICS, pp. 32–40. ACM (2014)
32. Rivero, C.R., Hernández, I., Ruiz, D., Corchuelo, R.: Towards discovering ontological models from big RDF data. In: Castano, S., Vassiliadis, P., Lakshmanan, L.V., Lee, M.L. (eds.) ER 2012. LNCS, vol. 7518, pp. 131–140. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33999-8_16
33. Rosner, B.: Percentage points for a generalized ESD many-outlier procedure. Technometrics 25(2), 165–172 (1983)
34. Russell, S.J., Norvig, P.: Artificial Intelligence - A Modern Approach, 3rd internat. edn. Pearson Education (2010)
35. Ryman, A.G., Hors, A.L., Speicher, S.: OSLC resource shape: a language for defining constraints on linked data. In: LDOW. CEUR Workshop Proceedings, vol. 996. CEUR-WS.org (2013)
36. Schenner, G., Bischof, S., Polleres, A., Steyskal, S.: Integrating distributed configurations with RDFS and SPARQL. In: Configuration Workshop. CEUR Workshop Proceedings, vol. 1220, pp. 9–15. CEUR-WS.org (2014)
37. Schmidt, M., Lausen, G.: Pleasantly consuming linked data with RDF data descriptions. In: COLD. CEUR Workshop Proceedings, vol. 1034. CEUR-WS.org (2013)
38. Schmidt, M., Meier, M., Lausen, G.: Foundations of SPARQL query optimization. In: ICDT, pp. 4–33. ACM International Conference Proceeding Series. ACM (2010)
39. Tanon, T.P., Stepanova, D., Razniewski, S., Mirza, P., Weikum, G.: Completeness-aware rule learning from knowledge graphs. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 507–525. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68288-4_30
40. Thalheim, B.: Fundamentals of cardinality constraints. In: Pernul, G., Tjoa, A.M. (eds.) ER 1992. LNCS, vol. 645, pp. 7–23. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-56023-8_3
41. Töpper, G., Knuth, M., Sack, H.: DBpedia ontology enrichment for inconsistency detection. In: I-SEMANTICS, pp. 33–40. ACM (2012)
42. Vandenbussche, P., Atemezing, G., Poveda-Villalón, M., Vatant, B.: Linked open vocabularies (LOV): a gateway to reusable semantic vocabularies on the web. Semant. Web 8(3), 437–452 (2017)
43. Völker, J., Niepert, M.: Statistical schema induction. In: Antoniou, G., et al. (eds.) ESWC 2011. LNCS, vol. 6643, pp. 124–138. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21034-1_9
44. Wienand, D., Paulheim, H.: Detecting incorrect numerical data in DBpedia. In: Presutti, V., d’Amato, C., Gandon, F., d’Aquin, M., Staab, S., Tordai, A. (eds.) ESWC 2014. LNCS, vol. 8465, pp. 504–518. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07443-6_34

Acknowledgements

This work has been supported by the TOMOE project, funded by Fujitsu Laboratories Ltd., Japan, and by the Insight Centre for Data Analytics at the National University of Ireland Galway, Ireland.

Author information

Authors and Affiliations

Fujitsu Ireland Limited, Dublin, Ireland
Emir Muñoz
Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
Emir Muñoz & Matthias Nickles

Authors

Emir Muñoz
Matthias Nickles

Corresponding author

Correspondence to Emir Muñoz.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Roland Wagner
IUT, University Lyon 1, Villeurbanne Cedex, France
Djamal Benslimane
University of Milan, Crema, Italy
Ernesto Damiani
University of Michigan-Dearborn, Dearborn, MI, USA
William I. Grosky

About this chapter

Cite this chapter

Muñoz, E., Nickles, M. (2018). Statistical Relation Cardinality Bounds in Knowledge Bases. In: Hameurlain, A., Wagner, R., Benslimane, D., Damiani, E., Grosky, W. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXIX. Lecture Notes in Computer Science, vol 11310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58415-6_3

DOI: https://doi.org/10.1007/978-3-662-58415-6_3
Published: 23 November 2018
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58414-9
Online ISBN: 978-3-662-58415-6
eBook Packages: Computer Science, Computer Science (R0)
