Using Queries as Schema-Templates for Graph Databases
Abstract
In contrast to heavy-handed ER-style data models in relational databases, knowledge graphs (or graph databases) capture entity semantics in terms of entity relationships and properties, following a simple collect-as-you-go model. While this allows for a more flexible and dynamically adaptable knowledge representation, it comes at the price of more complex querying: as information sparsity varies from entity to entity, it becomes increasingly difficult to figure out what an entity actually represents. Thus, matching the intended schema as specified by a query against the entity patterns that actually occur in the graph database needs careful attention on a conceptual level. In this article, we analyze graph patterns as schema information from a graph pattern matching perspective. We argue that every query consists of a mixture of conceptual information (how entities are structured) and evaluation information (further dependencies and constraints on data), and that this mixture is not always easy to separate. To arrive at truly schema-aware graph query processing, we propose several matching mechanisms, each mandating a specific semantic meaning of the graph pattern, and discuss their practical applicability.
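To make the matching problem concrete, the following is a minimal sketch of plain homomorphism-style pattern matching over labeled edges. It is an illustration only and not one of the matching mechanisms the article proposes. The triple encoding, the `?`-prefixed variable convention, the function name `match_pattern`, and the toy data are all assumptions introduced for this example.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical encoding: an edge is a (subject, predicate, object) triple.
Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Pattern terms starting with '?' are variables (assumed convention)."""
    return term.startswith("?")


def match_pattern(graph: Set[Triple], pattern: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield every variable binding under which all pattern edges occur in the graph."""

    def extend(i: int, binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(pattern):
            yield dict(binding)
            return
        s, p, o = pattern[i]
        for gs, gp, go in sorted(graph):
            if gp != p:
                continue  # predicates are constants in this sketch
            new = dict(binding)
            ok = True
            for term, value in ((s, gs), (o, go)):
                if is_var(term):
                    # Bind the variable, or check consistency with an earlier binding.
                    if new.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok:
                yield from extend(i + 1, new)

    yield from extend(0, {})


# Toy data: "bob" has no bornIn edge, i.e. his description is sparser.
graph = {
    ("alice", "worksAt", "acme"),
    ("alice", "bornIn", "paris"),
    ("bob", "worksAt", "acme"),
}

# The query doubles as a schema template: an entity with an employer and a birthplace.
pattern = [("?x", "worksAt", "?c"), ("?x", "bornIn", "?p")]

results = list(match_pattern(graph, pattern))
print(results)
```

Under this strict reading of the pattern, only the entity that carries every requested edge is returned; the sparser entity is dropped even though it may represent the same kind of thing. Which edges of a pattern should be treated as structural requirements and which as optional constraints is the kind of question the abstract raises.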
Keywords
Graph databases, Graph queries, Conceptual modeling, Pattern matching