A More General Theory of Static Approximations for Conjunctive Queries

Barceló, Pablo; Romero, Miguel; Zeume, Thomas

doi:10.1007/s00224-019-09924-0


Published: 10 May 2019

Volume 64, pages 916–964, (2020)
Theory of Computing Systems

Abstract

Conjunctive query (CQ) evaluation is NP-complete, but becomes tractable for fragments of bounded hypertreewidth. Approximating a hard CQ by a query from such a fragment can thus allow for an efficient approximate evaluation. While underapproximations (i.e., approximations that return correct answers only) are well-understood, the dual notion of overapproximations (i.e., approximations that return complete – but not necessarily sound – answers), and also a more general notion of approximation based on the symmetric difference of query results, are almost unexplored. In fact, the decidability of the basic problems of evaluation, identification, and existence of those approximations has been open. This article establishes a connection between overapproximations and existential pebble games that allows for studying such problems systematically. Building on this connection, it is shown that the evaluation and identification problem for overapproximations can be solved in polynomial time. While the general existence problem remains open, the problem is shown to be decidable in 2EXPTIME over the class of acyclic CQs and in PTIME for Boolean CQs over binary schemata. Additionally, we propose a more liberal notion of overapproximations to remedy the known shortcoming that queries might not have an overapproximation, and study how queries can be overapproximated in the presence of tuple generating and equality generating dependencies. The techniques are then extended to symmetric difference approximations and used to provide several complexity results for the identification, existence, and evaluation problem for this type of approximations.


Notes

Recall that the symmetric difference between sets A and B is (A ∖ B) ∪ (B ∖ A).

References

Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995)
Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, Cambridge (2003)
Bárány, V., Gottlob, G., Otto, M.: Querying the guarded fragment. Logical Methods in Computer Science 10(2) (2014)
Barceló, P.: Querying graph databases. In: PODS, pp. 175–188 (2013)
Barceló, P., Gottlob, G., Pieris, A.: Semantic acyclicity under constraints. In: PODS, pp. 343–354 (2016)
Barceló, P., Libkin, L., Romero, M.: Efficient approximations of conjunctive queries. In: PODS, pp. 249–260 (2012)
Barceló, P., Libkin, L., Romero, M.: Efficient approximations of conjunctive queries. SIAM J. Comput. 43(3), 1085–1130 (2014)
Barceló, P., Romero, M., Vardi, M.Y.: Semantic acyclicity on graph databases. SIAM J. Comput. 45(4), 1339–1376 (2016)
Blumensath, A., Otto, M., Weyer, M.: Decidability results for the boundedness problem. Logical Methods in Computer Science 10(3) (2014)
Calì, A., Gottlob, G., Kifer, M.: Taming the infinite chase: Query answering under expressive relational constraints. In: KR, pp. 70–80 (2008)
Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational data bases. In: STOC, pp. 77–90 (1977)
Chekuri, C., Rajaraman, A.: Conjunctive query containment revisited. Theor. Comput. Sci. 239(2), 211–229 (2000)
Chen, H., Dalmau, V.: Beyond hypertree width: decomposition methods without decompositions. In: CP, pp. 167–181 (2005)
Cosmadakis, S.S., Gaifman, H., Kanellakis, P.C., Vardi, M.Y.: Decidable optimization problems for database logic programs (Preliminary Report). In: STOC, pp. 477–490 (1988)
Dalmau, V., Kolaitis, P.G., Vardi, M.Y.: Constraint satisfaction, bounded treewidth, and finite-variable logics. In: CP, pp. 310–326 (2002)
Deutsch, A., Nash, A., Remmel, J.B.: The chase revisited. In: PODS, pp. 149–158 (2008)
Fagin, R.: A normal form for relational databases that is based on domains and keys. ACM Trans. Database Syst. 6(3), 387–415 (1981)
Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: Semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)
Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: From intractable to polynomial time. PVLDB 3(1), 264–275 (2010)
Fink, R., Olteanu, D.: On the optimal approximation of queries using tractable propositional languages. In: ICDT, pp. 174–185 (2011)
Fischl, W., Gottlob, G., Pichler, R.: General and fractional hypertree decompositions: hard and easy cases. In: PODS, pp. 17–32 (2018)
Gaifman, H., Mairson, H.G., Sagiv, Y., Vardi, M.Y.: Undecidable optimization problems for database logic programs. J. ACM 40(3), 683–713 (1993)
Garofalakis, M., Gibbons, P.: Approximate query processing: taming the terabytes. In: VLDB, p. 725 (2001)
Gottlob, G., Greco, G., Leone, N., Scarcello, F.: Hypertree decompositions: questions and answers. In: PODS, pp. 57–74 (2016)
Gottlob, G., Leone, N., Scarcello, F.: Hypertree decompositions and tractable queries. J. Comput. Syst. Sci. 64(3), 579–627 (2002)
Gottlob, G., Miklós, Z., Schwentick, T.: Generalized hypertree decompositions: NP-hardness and tractable variants. J. ACM 56(6), 30:1–30:32 (2009)
Greco, G., Scarcello, F.: The power of local consistency in conjunctive queries and constraint satisfaction problems. SIAM J. Comput. 46(3), 1111–1145 (2017)
Grohe, M., Marx, D.: Constraint solving via fractional edge covers. In: SODA, pp. 289–298 (2006)
Hell, P., Nešetřil, J.: The core of a graph. Discret. Math. 109(1-3), 117–126 (1992)
Hell, P., Nešetřil, J., Zhu, X.: Complexity of tree homomorphisms. Discret. Appl. Math. 70(1), 23–36 (1996)
Hell, P., Nešetřil, J.: Graphs and Homomorphisms. Oxford University Press, Oxford (2004)
Ioannidis, Y.: Approximations in database systems. In: ICDT, pp. 16–30 (2003)
Kolaitis, P.G., Panttaja, J.: On the complexity of existential pebble games. In: CSL, pp. 314–329 (2003)
Kolaitis, P.G., Vardi, M.Y.: On the expressive power of datalog: Tools and a case study. J. Comput. Syst. Sci. 51(1), 110–134 (1995)
Kolaitis, P.G., Vardi, M.Y.: Conjunctive-query containment and constraint satisfaction. J. Comput. Syst. Sci. 61(2), 302–332 (2000)
Liu, Q.: Approximate query processing. In: Encyclopedia of Database Systems, pp. 113–119 (2009)
Maier, D., Mendelzon, A.O., Sagiv, Y.: Testing implications of data dependencies. ACM Trans. Database Syst. 4(4), 455–469 (1979)
Otto, M.: The boundedness problem for monadic universal first-order logic. In: LICS, pp. 37–48 (2006)
Papadimitriou, C.H., Yannakakis, M.: On the complexity of database queries. J. Comput. Syst. Sci. 58(3), 407–427 (1999)
Yannakakis, M.: Algorithms for acyclic database schemes. In: VLDB, pp. 82–94 (1981)


Author information

Authors and Affiliations

DCC, University of Chile, IMFD, Santiago, Chile
Pablo Barceló
Department of Computer Science, University of Oxford, Oxford, UK
Miguel Romero
TU Dortmund, Dortmund, Germany
Thomas Zeume


Corresponding author

Correspondence to Pablo Barceló.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Special Issue on Database Theory (2018)

Barceló is funded by Millennium Institute for Foundational Research on Data and Fondecyt Grant 1170109. Zeume acknowledges the financial support by the European Research Council (ERC), grant agreement No 683080. Romero and Zeume thank the Simons Institute for the Theory of Computing for hosting them. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 714532). The paper reflects only the authors’ views and not the views of the ERC or the European Commission. The European Union is not liable for any use that may be made of the information contained therein.

Appendix

Proof

(Theorem 1) Fix k > 1. The CQ q is defined over graphs, i.e., over a schema with a single binary relation symbol E, and consists of k + 1 variables v₁,…,v_{k+1}. For every 1 ≤ i < j ≤ k + 1 we add either the atom (i.e., edge) E(v_i,v_j) or E(v_j,v_i) to q in such a way that the subgraph of G induced by {v₁,v₂,v₃} is a directed cycle and a certain condition (‡), defined below, holds. We start by introducing some terminology.

Let G be a directed graph on nodes v₁,…,v_{k+1} that contains, for each 1 ≤ i < j ≤ k + 1, either the edge E(v_i,v_j) or E(v_j,v_i). For a set B ⊆ {v₁,…,v_{k+1}} of size 1 ≤ ℓ ≤ k − 1 and a node v ∈ {v₁,…,v_{k+1}} ∖ B, we define conn(v,B) as the tuple (e₁,…,e_{k+1}) ∈ {−1, 1, #}^{k+1} such that for each 1 ≤ p ≤ k + 1:

$$ e_{p} \ = \ \left\{\begin{array}{lll} \#, & \text{if } v_p \not\in B, \\ 1, & \text{if } v_p \in B \text{ and the edge } E(v,v_p) \text{ is in } G, \\ -1, & \text{otherwise, i.e., } v_p \in B \text{ and } E(v_p,v) \text{ is in } G. \end{array}\right. $$

In simple terms, conn(v,B) specifies how v connects with the nodes in B.
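As a concrete illustration of this definition, the following Python sketch computes conn(v,B) on a small graph. The edge-set encoding, the 0-based node indices (node p stands for v_{p+1}), and the example graph itself are assumptions made for this illustration; they are not taken from the article.

```python
def conn(edges, n, v, B):
    """Return conn(v, B) as a tuple of length n over {-1, 1, '#'}.

    edges: set of ordered pairs (a, b) meaning E(a, b); nodes are 0..n-1.
    Assumes v is not in B and that, for every two distinct nodes,
    exactly one of the two possible edges is present.
    """
    result = []
    for p in range(n):
        if p not in B:
            result.append('#')      # v_p is not in B
        elif (v, p) in edges:
            result.append(1)        # edge E(v, v_p) is in G
        else:
            result.append(-1)       # edge E(v_p, v) is in G
    return tuple(result)


# Hypothetical example: directed cycle 0 -> 1 -> 2 -> 0,
# plus a fourth node 3 with an edge to each of 0, 1, 2.
example_edges = {(0, 1), (1, 2), (2, 0), (3, 0), (3, 1), (3, 2)}
print(conn(example_edges, 4, 2, {0, 1}))
print(conn(example_edges, 4, 3, {0, 1}))
```

In this example the two nodes outside B = {0, 1} relate differently to node 1, which is exactly the kind of difference that condition (‡) asks for.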

Our condition (‡) then establishes the following:

(‡)
For each B ⊆ {v₁,…,v_{k+1}} of size 2 ≤ ℓ ≤ k − 1 and each node v in {v₁,…,v_{k+1}} ∖ B, there is a node v^′ ∈ {v₁,…,v_{k+1}} ∖ B such that
$$ \textsf{conn}(v,B) \quad \neq \quad \textsf{conn}(v^{\prime},B). $$
That is, for each such B and v we will always be able to find another v^′ outside B that connects to the nodes in B in a different way than v.

Example 6

The graphs in Fig. 6 satisfy this condition for k = 2, 3, 4, respectively. Notice that the directed cycle on nodes {v₁,v₂,v₃}, shown on the left-hand side, satisfies condition (‡) trivially.

The next lemma establishes that for each k > 1 there is always a graph that satisfies this condition.

Lemma 9

For each k > 1, there is a directed graph G on nodes v₁,…,v_{k+1} such that the following hold:

1. For each 1 ≤ i < j ≤ k + 1, either the edge E(v_i,v_j) or E(v_j,v_i) is in G;
2. the subgraph of G induced by {v₁,v₂,v₃} is a directed cycle; and
3. G satisfies condition (‡).

Proof

(Lemma 9) For k = 2 this is given by the graph in Example 6. For k ≥ 3 we prove by induction a stronger claim: There is a directed graph G on nodes v₁,…,v_{k+1} such that:

1. G contains either the edge E(v_i,v_j) or E(v_j,v_i) for each 1 ≤ i < j ≤ k + 1.
2. The subgraph of G induced by {v₁,v₂,v₃} is a directed cycle.
3. G contains the edges E(v₁,v₂) and E(v₄,v₃).
4. G satisfies condition (‡).

The base case k = 3 is given again by the graph in Example 6. For the inductive case, assume by induction hypothesis that there is a directed graph G on nodes v₁,…,v_{k+1} that satisfies the claim above. A new graph G^′ is then created from G by adding a new node v_{k+2} and connecting it to the nodes in {v₁,…,v_{k+1}} as follows: For each 1 ≤ i ≤ k, if E(v_i,v_{i+1}) is in G then we add the edge E(v_{k+2},v_i) to G^′, otherwise we add the edge E(v_i,v_{k+2}). Moreover, if E(v_{k+1},v₁) is in G then we add the edge E(v_{k+2},v_{k+1}) to G^′, otherwise we add the edge E(v_{k+1},v_{k+2}). Notice that G coincides with the subgraph of G^′ that is induced by nodes v₁,…,v_{k+1}. Moreover, by construction G^′ satisfies the first three conditions of the claim. We prove next that it also satisfies condition (‡).

Take an arbitrary B ⊆ {v₁,…,v_{k+2}} of size 2 ≤ ℓ ≤ k and a node v outside B. We prove that the condition holds by cases:

v_{k+2} ∉ B, v ∈ {v₁,…,v_{k+1}}, and 2 ≤ ℓ ≤ k − 1: By inductive hypothesis there is a node v^′ ∈ {v₁,…,v_{k+1}} ∖ B such that conn(v,B) ≠ conn(v^′,B).
v_{k+2} ∉ B, v ∈ {v₁,…,v_{k+1}}, and ℓ = k: We set v^′ := v_{k+2} and claim that the predecessor u of v in {v₁,…,v_{k+1}} distinguishes v and v^′. Here, the “predecessor” of v_i is v_{i−1} if 2 ≤ i ≤ k + 1, and the predecessor of v₁ is v_{k+1} (note that u ∈ B as ℓ = k). By construction of G^′, we have that E(u,v) ∈ G^′ if and only if E(v^′,u) ∈ G^′. We conclude that conn(v,B) ≠ conn(v^′,B).
v_{k+2} ∉ B and v = v_{k+2}: There must exist some node v^′ in {v₁,…,v_{k+1}} that does not belong to B but its predecessor u in {v₁,…,v_{k+1}} does. Then by construction of G^′, we have that E(u,v^′) ∈ G^′ if and only if E(v,u) ∈ G^′. We conclude that conn(v,B) ≠ conn(v^′,B).
v_{k+2} ∈ B and ℓ ≥ 3: Then B^′ = B ∖ {v_{k+2}} is of size 2 ≤ ℓ − 1 ≤ k − 1. By induction hypothesis, for every node v outside B^′ there is another node v^′ ∈ {v₁,…,v_{k+1}} ∖ B^′ such that conn(v,B^′) ≠ conn(v^′,B^′). This implies that conn(v,B) ≠ conn(v^′,B).
v_{k+2} ∈ B and ℓ = 2: Then B = {v_{k+2},u} for some u ∈ {v₁,…,v_{k+1}}. Suppose first that u ∈ {v₁,v₂,v₃}. Since the subgraph induced by {v₁,v₂,v₃} in G defines a directed cycle, it is the case that E(u,z) holds if and only if E(w,u) holds, where {u,w,z} = {v₁,v₂,v₃}. Therefore, for each v ∈ {v₁,…,v_{k+1}} ∖ B there is a node v^′ ∈ {z,w} such that conn(v,{u}) ≠ conn(v^′,{u}). It follows that conn(v,B) ≠ conn(v^′,B). Suppose now that u ∉ {v₁,v₂,v₃}. It suffices to exhibit two nodes v^′ and v^″ outside B such that E(v^′,v_{k+2}) and E(v_{k+2},v^″). By induction hypothesis the edges E(v₁,v₂) and E(v₄,v₃) are in G^′. Therefore, v_{k+2} is connected via edges E(v₃,v_{k+2}) and E(v_{k+2},v₁) in G^′.

This concludes the proof of the lemma. □
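The inductive construction in this proof can be replayed mechanically. The sketch below is a minimal illustration under stated assumptions: nodes are 0-based (node i stands for v_{i+1}), and the four-node starting graph `base` was chosen by hand so that it meets the four conditions of the stronger claim; it is not necessarily the graph shown in Fig. 6. The function names `extend` and `satisfies_condition` are hypothetical.

```python
from itertools import combinations


def extend(edges, n):
    """Inductive step: add node n to a graph on nodes 0..n-1."""
    new_edges = set(edges)
    for i in range(n):
        succ = (i + 1) % n  # cyclic successor, covers the wrap-around case
        if (i, succ) in edges:
            new_edges.add((n, i))   # E(v_i, v_{i+1}) in G: new node points to v_i
        else:
            new_edges.add((i, n))   # otherwise v_i points to the new node
    return new_edges


def satisfies_condition(edges, n):
    """Brute-force check of the condition for all B of size 2..n-2."""
    nodes = range(n)
    for size in range(2, n - 1):
        for B in combinations(nodes, size):
            outside = [v for v in nodes if v not in B]
            # How each outside node connects to the nodes of B.
            profile = {v: tuple((v, p) in edges for p in B) for v in outside}
            for v in outside:
                if all(profile[v] == profile[u] for u in outside):
                    return False  # no other outside node differs from v
    return True


# Assumed base graph for k = 3: cycle 0 -> 1 -> 2 -> 0 and node 3 pointing to all.
base = {(0, 1), (1, 2), (2, 0), (3, 0), (3, 1), (3, 2)}
graphs = {4: base}
for n in range(4, 8):
    graphs[n + 1] = extend(graphs[n], n)

for n, g in sorted(graphs.items()):
    print(n, satisfies_condition(g, n))
```

The check is exponential in the number of nodes, so it is only meant for experimenting with small values of k.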

Fix k ≥ 1. We then take as q any Boolean CQ whose canonical database is a graph G on nodes v₁,…,v_{2k+1} that satisfies the conditions stated in Lemma 9. That is, (1) for each 1 ≤ i < j ≤ 2k + 1, either the edge E(v_i,v_j) or E(v_j,v_i) is in G, (2) the subgraph of G induced by {v₁,v₂,v₃} is a directed cycle, and (3) G satisfies condition (‡). It is easy to see that q is in GHW(k + 1) ∖ GHW(k) as its underlying undirected graph is a clique on 2k + 1 elements. In fact, these elements can be covered with (k + 1) edges, but not with k.
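As a quick check of the covering claim, note that every atom of q is binary, so m atoms mention at most 2m variables (m is a symbol introduced here only for this count):

$$ 2k \ < \ 2k+1 \ \leq \ 2(k+1). $$

Hence no k atoms cover all 2k + 1 variables, while k + 1 atoms suffice: since G contains an edge between every pair of nodes, one can take the k atoms over the pairs {v₁,v₂}, {v₃,v₄}, …, {v_{2k−1},v_{2k}} together with any atom that mentions v_{2k+1}.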

We claim that q has no GHW(ℓ)-overapproximation for any 1 ≤ ℓ ≤ k. The proofs for the cases when ℓ = 1 and ℓ > 1 are slightly different. We start with the latter, i.e., when 1 < ℓ ≤ k. The proof for every such ℓ is analogous, and thus we concentrate on proving the claim for ℓ = k > 1. According to Theorem 7, we need to prove that there is no constant c ≥ 0 such that for every database $\mathcal {D}$ it holds that

$$ q \to_{k} \mathcal{D} \quad \Longleftrightarrow \quad q {\to_{k}^{c}} \mathcal{D}. $$

It is sufficient to show then that for each integer c ≥ 0 there is a database $\mathcal {D}$ such that

$$ q {\to_{k}^{c}} \mathcal{D} \ \ \text{ but } \ \ q \not\to_{k}^{c+1} \mathcal{D}. $$

Or, equivalently, that for each integer c ≥ 0 there is a database $\mathcal {D}$ such that

$$ q_{c} \to \mathcal{D} \ \ \text{ but } \ \ q_{c+1} \not\to \mathcal{D}, $$

where q_c, for c ≥ 0, is the CQ which is defined in Lemma 1, i.e., for every $\mathcal {D}$ it is the case that $q {\to _{k}^{c}} \mathcal {D}$ iff $q_{c} \to \mathcal {D}$. In view of (1), this boils down to proving that

$$ q_{c+1} \not\to q_{c}, \ \ \ \text{for each $c \geq 0$.} \qquad (8) $$

We prove (8) by induction. The claim clearly holds for c = 0, as by definition q₀ is empty while q₁ is not. Let us assume now that the claim holds for c ≥ 0. That is, q_{c+1} ↛ q_c. This means, in particular, that the core of q_{c+1} is not contained in q_c. That is, this core contains at least one node w in q_{c+1} that does not belong to q_c.

By the way q is defined, any k-union of q must be of the form S ⊆ {v₁,…,v_{2k+1}} with |S| = 2k. Let us consider now (T_{c+1},β_{c+1}) as defined in the proof of Lemma 1. Since w ∉ q_c, it must be the case that there is a unique node t of T_{c+1} such that w ∈ β_{c+1}(t). Moreover, this t must be a leaf of T_{c+1}. Suppose that ϕ_t(w) = v, for v ∈ {v₁,…,v_{2k+1}}, where ϕ_t is as defined in the proof of Lemma 1, i.e., ϕ_t is a bijection between β_{c+1}(t) and the k-union S ⊆ {v₁,…,v_{2k+1}} of q such that λ_{c+1}(t) = S.

Notice, by definition, that if the parent of t in T_{c+1} is t^′, then either λ_{c+1}(t^′) = ∅ – which holds precisely when t^′ is the root of T_{c+1} –, or λ_{c+1}(t^′) = S^′, where S^′ is the subset of {v₁,…,v_{2k+1}} which contains all elements save for v. That is, in the latter case we have that S^′ is obtained from S by replacing some element v^′ in {v₁,…,v_{2k+1}}, with v^′ ≠ v, by v itself.

From Proposition 1, we can assume that the homomorphism that maps q_{c+1} to its core is a retraction, i.e., it is the identity on the nodes of this core, in particular, on w. On the other hand, w is linked in q_{c+1} exclusively with the remaining nodes that appear in β_{c+1}(t). Moreover, the graph induced by the nodes in λ_{c+1}(t) is a clique on 2k elements, and thus all the elements in β_{c+1}(t) must belong to the core of q_{c+1}.

Recall that ϕ_t(w) = v. Take an arbitrary node v^″ ∈ S that is not v. Notice also that v^″ ≠ v^′, since v^″ ∈ S while v^′ ∉ S. By definition, T_{c+2} contains a leaf t^″ whose parent is t such that λ_{c+2}(t^″) = S^″, where S^″ is the subset of {v₁,…,v_{2k+1}} which is obtained from S by replacing v^″ with the unique node in {v₁,…,v_{2k+1}} ∖ S, namely v^′. Let us assume that $\phi _{t^{\prime \prime }}(v^{\prime }) = w^{\prime \prime }$. Notice that w^″ appears in no other node in (T_{c+2},β_{c+2}).

Assume now, for the sake of contradiction, that q_{c+2} → q_{c+1}. Then the core of q_{c+2} is the same as the core of q_{c+1}. Let C be this core. Hence, by Proposition 1, there is a retraction h from q_{c+2} to C. Since all elements in β_{c+2}(t) = β_{c+1}(t) are in C, the homomorphism h must be the identity on them. But then h maps w^′ to the unique element in q_{c+1} that is linked to exactly the same nodes as w^′ in q_{c+2}; namely, ϕ_t(v^″) = w^″.

Suppose that v^′ and v^″ represent the nodes v_i and v_j in {v₁,…,v_{2k+1}}, respectively. By assumption, i ≠ j. But this implies then that in the canonical database G of q we have that

$$ \textsf{conn}(v_{i}, B) \ = \ \textsf{conn}(v_{j}, B), $$

where B = {v₁,…,v_{2k+1}} ∖ {v_i,v_j}. This is a contradiction since B is of size 2k − 1 > 1 and G satisfies condition (‡). This concludes our proof that q has no GHW(k)-overapproximation (and, analogously, that it has no GHW(ℓ)-overapproximation for any 1 < ℓ ≤ k).

We prove next that q has no GHW(1)-overapproximation either. Let us assume, for the sake of contradiction, that q has a GHW(1)-overapproximation q^′. It is an easy observation that the directed graphs in GHW(1) are precisely those whose underlying undirected graph is acyclic. Notice also that q^′ has no directed cycles of length two (i.e., atoms of the form E(u,v) and E(v,u)); otherwise, since q^′→ q, we would have that q also has such a cycle (which we know it does not). Using the fact that q^′∈GHW(1) and has no directed cycles of length two, it is not difficult to show (see e.g. [31]) that there is a sufficiently large integer n ≥ 1 such that, if P_n is the directed path on n vertices, then

$$ q^{\prime} \to \mathbf{P}_{n} \ \ \text{ but } \ \ \mathbf{P}_{n} \not\to q^{\prime}. $$

This implies that if q^″ is the Boolean CQ which is naturally defined by P_n, then $q^{\prime \prime } \subsetneq q^{\prime }$. Moreover, P_n → G. This is due to the fact that G contains a directed cycle on {v₁,v₂,v₃}. We conclude that

$$ q \subseteq q^{\prime\prime} \subsetneq q^{\prime}, $$

and, therefore, that q^′ is not a GHW(1)-overapproximation of q. This is a contradiction. We then conclude the proof of Theorem 1. □
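The step P_n → G in the argument above uses only the directed cycle on {v₁,v₂,v₃}: a directed path can be wrapped around a directed 3-cycle. The following Python sketch checks this on a small instance. The graph encoding, the choice n = 7, and the helper names are assumptions made for illustration, and the brute-force search is only practical for very small graphs.

```python
from itertools import product


def has_homomorphism(src_nodes, src_edges, dst_nodes, dst_edges):
    """Brute-force test for a homomorphism between small directed graphs."""
    src_nodes = list(src_nodes)
    for image in product(list(dst_nodes), repeat=len(src_nodes)):
        h = dict(zip(src_nodes, image))
        if all((h[a], h[b]) in dst_edges for (a, b) in src_edges):
            return True
    return False


def directed_path(n):
    """Directed path P_n on n vertices: 0 -> 1 -> ... -> n-1."""
    return range(n), {(i, i + 1) for i in range(n - 1)}


cycle_nodes, cycle_edges = range(3), {(0, 1), (1, 2), (2, 0)}
path_nodes, path_edges = directed_path(7)

# Explicit witness: vertex i of the path is sent to i mod 3 on the cycle.
wrap = {i: i % 3 for i in path_nodes}
wrap_is_hom = all((wrap[a], wrap[b]) in cycle_edges for (a, b) in path_edges)

print(wrap_is_hom)
print(has_homomorphism(cycle_nodes, cycle_edges, path_nodes, path_edges))
```

The second call goes in the opposite direction: a directed cycle cannot be mapped into a directed path, since the path contains no closed directed walk.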

Proof

(Lemma 8) Before proving the lemma, we need some terminology and claims. Let $\mathcal {D}$ be a database and (A₁,…,A_n) be a tuple of pairwise-disjoint subsets of elements of $\mathcal {D}$, where n ≥ 0. In addition, let $\mathcal {D}^{\prime }$ be a database and (a₁,…,a_n) a tuple of elements in $\mathcal {D}^{\prime }$. Then we write $(\mathcal {D},(A_{1},\dots ,A_{n}))\to (\mathcal {D}^{\prime },(a_{1},\dots ,a_{n}))$ iff there is a homomorphism h from $\mathcal {D}$ to $\mathcal {D}^{\prime }$ such that, for each i ∈{1,…,n} and a ∈ A_i, it is the case that h(a) = a_i.
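To make the notation concrete, here is a minimal brute-force sketch of the test for $(\mathcal {D},(A_{1},\dots ,A_{n}))\to (\mathcal {D}^{\prime },(a_{1},\dots ,a_{n}))$. The representation of databases as sets of (relation, arguments) pairs and the function name `maps_to_with_targets` are assumptions for this example.

```python
from itertools import product


def maps_to_with_targets(src, groups, dst, targets):
    """Test (src, groups) -> (dst, targets) by exhaustive search.

    src, dst: sets of atoms (relation, arguments).
    groups:   list of pairwise disjoint sets A_1..A_n of elements of src.
    targets:  list a_1..a_n of elements of dst; every element of A_i
              must be mapped to a_i.
    """
    fixed = {a: t for group, t in zip(groups, targets) for a in group}
    free = sorted({x for _, args in src for x in args} - set(fixed))
    dst_elems = sorted({x for _, args in dst for x in args})
    for image in product(dst_elems, repeat=len(free)):
        h = dict(fixed)
        h.update(zip(free, image))
        if all((rel, tuple(h[x] for x in args)) in dst for rel, args in src):
            return True
    return False


# Hypothetical instance: two edges into y, with x and z forced onto one element.
D = {("E", ("x", "y")), ("E", ("z", "y"))}
D_prime = {("E", ("a", "b"))}
print(maps_to_with_targets(D, [{"x", "z"}], D_prime, ["a"]))
print(maps_to_with_targets(D, [{"x", "z"}], D_prime, ["b"]))
```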

For such a pair $(\mathcal {D},(A_{1},\dots ,A_{n}))$, with n ≥ 0, we define its generalized hypertreewidth in the natural way. The intuition is that we see $(\mathcal {D},(A_{1},\dots ,A_{n}))$ as a “query”, where A₁ ∪⋯ ∪ A_n are the “free variables” and the rest of the elements are the “existential variables”. Formally, a tree decomposition of $(\mathcal {D},(A_{1},\dots ,A_{n}))$ is a pair (T,χ), where T is a tree and χ is a mapping that assigns a subset of the elements in $\mathcal {D}\setminus (A_{1}\cup {\cdots } \cup A_{n})$ to each node t ∈ T, such that the following statements hold:

1. For each atom $R(\bar a)$ in $\mathcal {D}$, it is the case that $\bar a\cap (\mathcal {D}\setminus (A_{1}\cup {\cdots } \cup A_{n}))$ is contained in χ(t), for some t ∈ T.
2. For each element a in $\mathcal {D}\setminus (A_{1}\cup {\cdots } \cup A_{n})$, the set of nodes t ∈ T for which a occurs in χ(t) is connected.

The width of node t in (T,χ) is the minimal number ℓ for which there are ℓ atoms in $\mathcal {D}$ covering χ(t), i.e., atoms $R(\bar a_{1}),\dots ,R(\bar a_{\ell })$ in $\mathcal {D}$ such that $\chi (t)\subseteq \bigcup _{1\leq i \leq \ell } \bar a_{i}$. The width of (T,χ) is the maximal width of the nodes of T.

The generalized hypertreewidth of $(\mathcal {D},(A_{1},\dots ,A_{n}))$ is the minimum width of its tree decompositions.
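The two conditions and the notion of width can be checked directly on small inputs. The sketch below is an illustration under assumptions: atoms are (relation, arguments) pairs, the tree T is given as a list of undirected edges between node identifiers, `chi` maps each tree node to its bag, and `free` is the set A₁ ∪⋯ ∪ A_n. It verifies a given decomposition; it does not search for one of minimum width.

```python
from itertools import combinations


def bag_width(bag, atoms):
    """Minimal number of atoms whose elements jointly cover the bag."""
    if not bag:
        return 0
    for size in range(1, len(atoms) + 1):
        for chosen in combinations(atoms, size):
            covered = set()
            for _, args in chosen:
                covered |= set(args)
            if bag <= covered:
                return size
    raise ValueError("bag cannot be covered by the given atoms")


def is_tree_decomposition(tree_edges, chi, atoms, free):
    """Check conditions 1 and 2 for the bags chi (bags hold non-free elements)."""
    # Condition 1: the non-free part of every atom lies inside some bag.
    for _, args in atoms:
        part = set(args) - free
        if not any(part <= bag for bag in chi.values()):
            return False
    # Condition 2: the tree nodes whose bag contains an element are connected.
    for a in set().union(*chi.values()):
        holders = {t for t, bag in chi.items() if a in bag}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            t = stack.pop()
            for u, w in tree_edges:
                if t in (u, w):
                    other = w if t == u else u
                    if other in holders and other not in seen:
                        seen.add(other)
                        stack.append(other)
        if seen != holders:
            return False
    return True


# Hypothetical example: a directed triangle with no free elements, one bag.
triangle = [("E", ("a", "b")), ("E", ("b", "c")), ("E", ("c", "a"))]
one_bag = {0: {"a", "b", "c"}}
print(is_tree_decomposition([], one_bag, triangle, set()))
print(max(bag_width(bag, triangle) for bag in one_bag.values()))
```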

By mimicking the proof of the forward implication of Proposition 3, we can show the following:

Lemma 10

Fix k ≥ 1. Let $q(\bar x),q^{\prime }(\bar x^{\prime })$ be CQs, where $\bar x=(x_{1},\dots ,x_{n})$ and $\bar x^{\prime }=(x_{1}^{\prime },\dots ,x_{n}^{\prime })$, for n ≥ 0. Suppose that $(q,\bar x)\to _{k} (q^{\prime },\bar x^{\prime })$. Then, for each database $\mathcal {D}$ and tuple (A₁,…,A_n) of subsets of $\mathcal {D}$ such that $(\mathcal {D},(A_{1},\dots ,A_{n}))$ has generalized hypertreewidth at most k, it is the case that

$$ \begin{array}{@{}rcl@{}} (\mathcal{D},(A_{1},\dots,A_{n}))\to (q,(x_{1},\dots,x_{n})) \quad \Longrightarrow \\ (\mathcal{D},(A_{1},\dots,A_{n}))\to (q^{\prime},(x_{1}^{\prime},\dots,x_{n}^{\prime})). \end{array} $$

Proof

Let $\mathcal {H}$ be a winning strategy for Duplicator witnessing the fact that $(q,\bar x)\to _{k} (q^{\prime },\bar x^{\prime })$. Let us assume that $(\mathcal {D},(A_{1},\dots ,A_{n}))$ has generalized hypertreewidth at most k, and that $(\mathcal {D},(A_{1},\dots ,A_{n}))\to (q,(x_{1},\dots ,x_{n}))$ is witnessed via a homomorphism h. Then we can compose h with the strategy $\mathcal {H}$ to define a homomorphism g witnessing $(\mathcal {D},(A_{1},\dots ,A_{n}))\to (q^{\prime },(x_{1}^{\prime },\dots ,x_{n}^{\prime }))$. The mapping g is defined in a top-down fashion over the tree decomposition (T,χ) of width at most k of $(\mathcal {D},(A_{1},\dots ,A_{n}))$. One starts at the root r of T, and forces Spoiler to play his pebbles over the set h(χ(r)). If Duplicator responds according to $\mathcal {H}$ with a partial homomorphism f_r, we then let g(a) = f_r(h(a)), for each a ∈ χ(r). We then move to each child of r and so on, until all leaves are reached and g is defined over all elements in $\mathcal {D}\setminus (A_{1}\cup \cdots \cup A_{n})$. Since Duplicator responds to Spoiler’s moves with consistent partial homomorphisms, we have that g is actually a well-defined homomorphism from $(\mathcal {D},(A_{1},\dots ,A_{n}))$ to $(q^{\prime },(x_{1}^{\prime },\dots ,x_{n}^{\prime }))$. □

Now we are ready to show our lemma. Suppose that $(q,\bar x)\to _{k}(q^{\prime },\bar x^{\prime })$, where $\bar x=(x_{1},\dots ,x_{n})$ and $\bar x^{\prime }=(x_{1}^{\prime },\dots ,x_{n}^{\prime })$, for some n ≥ 0. Assume that $(q^{\prime \prime },\bar x^{\prime \prime })\to (q^{\prime }\wedge q, \bar z)$ via a homomorphism h, for $q^{\prime \prime }(\bar x^{\prime \prime })\in \textsf {GHW}(k)$, and suppose that $\bar x^{\prime \prime }=(x_{1}^{\prime \prime },\dots ,x_{n}^{\prime \prime })$ and $\bar z=(z_{1},\dots ,z_{n})$. For each i ∈{1,…,n}, we define V_i to be the set of variables x in q^″ such that h(x) = z_i. In particular, $x_{i}^{\prime \prime }\in V_{i}$, for each i ∈{1,…,n}. We define V to be the set of variables x in q^″ such that h(x) = y, where y is an existentially quantified variable of q. Similarly, we define V^′ with respect to the existentially quantified variables of q^′. Note that the sets V,V^′,V₁,…,V_n form a partition of the variables of q^″.

Let $\mathcal {D}_{q^{\prime \prime }}$ be the canonical database of q^″. Since q^″ ∈ GHW(k), we know that

$$ \left( \mathcal{D}_{q^{\prime\prime}}, (\{x_{1}^{\prime\prime}\},\dots,\{x_{n}^{\prime\prime}\})\right) $$

has generalized hypertreewidth at most k, as defined above. Let $\mathcal {D}_{V}$ be the database induced in $\mathcal {D}_{q^{\prime \prime }}$ by the set of variables V ∪ V₁ ∪⋯ ∪ V_n, i.e., the set of atoms $R(\bar t)\in \mathcal {D}_{q^{\prime \prime }}$ such that each element in $\bar t$ is in V ∪ V₁ ∪⋯ ∪ V_n. We now show that

$$ \left( \mathcal{D}_{V},(V_{1},\dots,V_{n})\right) $$

has also generalized hypertreewidth at most k. Indeed, let (T,χ) be a tree decomposition of $(\mathcal {D}_{q^{\prime \prime }}, (\{x_{1}^{\prime \prime }\},\dots ,\{x_{n}^{\prime \prime }\}))$ of width at most k. Define χ^′ such that for each t ∈ T, we have that χ^′(t) = χ(t) ∩ V. We claim that (T,χ^′) is a tree decomposition of $(\mathcal {D}_{V}, (V_{1},\dots ,V_{n}))$ of width at most k.

In fact, since (T,χ) is a tree decomposition, we have that, for each a ∈ V, it is the case that the set {t ∈ T∣a ∈ χ^′(t)} is connected; and for each atom $R(\bar a)\in \mathcal {D}_{V}$, there is a node t ∈ T such that $\bar a\cap V\subseteq \chi ^{\prime }(t)$. To see that the width of (T,χ^′) is bounded by k, let t be a node in T. Since the width of (T,χ) is at most k, there are ℓ atoms $R(\bar a_{1}),\dots ,R(\bar a_{\ell })$ in $\mathcal {D}_{q^{\prime \prime }}$, with ℓ ≤ k, such that $\chi (t)\subseteq \bigcup _{1\leq i \leq \ell } \bar a_{i}$. Let $R(\bar a_{i_{1}}),\dots ,R(\bar a_{i_{p}})$, where 1 ≤ i₁ < ⋯ < i_p ≤ ℓ and p ≤ ℓ, be the atoms in $\{R(\bar a_{1}),\dots ,R(\bar a_{\ell })\}$ that contain an element in χ^′(t). Since χ^′(t) ⊆ χ(t), it is the case that $\chi ^{\prime }(t)\subseteq \bigcup _{1\leq j \leq p} \bar a_{i_{j}}$. It suffices to show that each $R(\bar a_{i_{j}})$ is actually an atom in $\mathcal {D}_{V}$, for 1 ≤ j ≤ p. Towards a contradiction, suppose that this is not the case. Then, there is an atom in $\mathcal {D}_{q^{\prime \prime }}$ that contains simultaneously one variable in χ^′(t) ⊆ V and one variable in V^′. By the definitions of V^′ and V, and the fact that h is a homomorphism, it follows that there is an atom in $(q^{\prime }\wedge q)(\bar z)$ that mentions simultaneously one existentially quantified variable from q^′ and one from q; this contradicts the definition of $(q^{\prime }\wedge q)(\bar z)$. We conclude that the generalized hypertreewidth of $(\mathcal {D}_{V},(V_{1},\dots ,V_{n}))$ is at most k.

Recall that h is our initial homomorphism from $(q^{\prime \prime },\bar x^{\prime \prime })$ to $(q^{\prime }\wedge q, \bar z)$. Let h_V be the restriction of h to the set V ∪ V₁ ∪⋯ ∪ V_n. By construction,

$$ \left( \mathcal{D}_{V},(V_{1},\dots,V_{n})\right) \to \left( q,(x_{1},\dots,x_{n})\right) $$

via homomorphism h_V. We can then apply Lemma 10 and obtain that

$$ \left( \mathcal{D}_{V},(V_{1},\dots,V_{n})\right) \to \left( q^{\prime},(x_{1}^{\prime},\dots,x_{n}^{\prime})\right) $$

via a homomorphism h^′. We define our required homomorphism g from $(q^{\prime \prime },\bar x^{\prime \prime })$ to $(q^{\prime },\bar x^{\prime })$ as follows: if a ∈ V ∪ V₁ ∪⋯ ∪ V_n, then g(a) = h^′(a); otherwise, if a ∈ V^′, then g(a) = h(a). To see that g is a homomorphism, it suffices to consider an atom $R(\bar a)\in \mathcal {D}_{q^{\prime \prime }}$ such that $\bar a$ contains an element in V^′ and one element not in V^′, and show that $R(g(\bar a))\in \mathcal {D}_{q^{\prime }}$. Let A be the set of elements in $\bar a$ that are not in V^′. As mentioned above, there are no atoms in $\mathcal {D}_{q^{\prime \prime }}$ mentioning elements in V^′ and V simultaneously, thus A ⊆ V₁ ∪⋯ ∪ V_n. In particular, h(a) = h^′(a), for each a ∈ A. It follows that $R(g(\bar a))=R(h(\bar a))$, from which we conclude that $R(g(\bar a))\in \mathcal {D}_{q^{\prime }}$. □

Proof

(Proposition 10) Consider the Boolean CQ q from Fig. 2, defined as

$$ q = \exists x\exists y\exists z \left( P_{a}(x,y)\wedge P_{a}(y,x) \wedge P_{a}(y,z) \wedge P_{a}(z,y) \wedge P_{b}(z,x) \wedge P_{b}(x,z)\right), $$

and the CQ q^′ from the same figure defined by

$$ \begin{array}{@{}rcl@{}} q^{\prime} = \exists x\exists y_{1}\exists y_{2}\exists z \left( P_{a}(x,y_{1})\wedge P_{a}(y_{1},x) \wedge P_{a}(y_{2},z)\right. \\ \left.\wedge P_{a}(z,y_{2}) \wedge P_{b}(z,x) \wedge P_{b}(x,z)\right). \end{array} $$

For each n ≥ 1, we define the CQ

$$ \begin{array}{@{}rcl@{}} q_{n} = \exists x_{1}{\cdots} \exists x_{n+1} \left( P_{a}(x_{1},x_{2})\wedge {\cdots} \wedge P_{a}(x_{n},x_{n+1})\wedge \right.\\ \left.P_{b}(x_{1},x_{1})\wedge P_{b}(x_{n+1},x_{n+1})\right). \end{array} $$

Observe that q^′ ∧ q_n ∈ GHW(1), for each n ≥ 1. We now show that, for each n ≥ 1, q^′ ∧ q_n is an incomparable GHW(1)-Δ-approximation of q. As mentioned in Example 2, we have that q →₁ q^′. In particular q →₁ (q^′ ∧ q_n). Clearly, q ↛ (q^′ ∧ q_n). Also, q_n ↛ q since variables x₁ and x_{n+1} of q_n cannot be mapped to any variable in q via a homomorphism. Therefore, (q^′ ∧ q_n) ↛ q. By Theorem 11, it follows that q^′ ∧ q_n is an incomparable GHW(1)-Δ-approximation of q.

Now we show that the CQs {q^′ ∧ q_n}_{n≥1} form a family of non-equivalent CQs. First note that q_n ↛ q^′, for each n ≥ 1. Also, observe that q_i → q_j iff i = j, for i,j ≥ 1. It follows that for each i,j ≥ 1, such that i ≠ j, it is the case that (q^′ ∧ q_i) ↛ (q^′ ∧ q_j) and (q^′ ∧ q_j) ↛ (q^′ ∧ q_i). In particular, {q^′ ∧ q_n}_{n≥1} is a family of non-equivalent CQs. □
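The observation that q_i → q_j iff i = j can be confirmed by exhaustive search for small indices. The sketch below assumes an encoding of the atoms of q_n as (relation, arguments) pairs over the variables 1,…,n+1; the names `q_n_atoms` and `maps_to` are hypothetical, and the range of indices tested is arbitrary.

```python
from itertools import product


def q_n_atoms(n):
    """Atoms of q_n: a P_a-path x_1..x_{n+1} with P_b-loops at both ends."""
    atoms = {("Pa", (i, i + 1)) for i in range(1, n + 1)}
    atoms |= {("Pb", (1, 1)), ("Pb", (n + 1, n + 1))}
    return atoms


def maps_to(src, dst):
    """Brute-force test for a homomorphism between small sets of atoms."""
    src_vars = sorted({x for _, args in src for x in args})
    dst_vars = sorted({x for _, args in dst for x in args})
    for image in product(dst_vars, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if all((rel, tuple(h[x] for x in args)) in dst for rel, args in src):
            return True
    return False


for i in range(1, 5):
    row = [maps_to(q_n_atoms(i), q_n_atoms(j)) for j in range(1, 5)]
    print(i, row)
```

Intuitively, the loop on x₁ pins the start of the path and the loop on x_{n+1} pins its end, so the path lengths of the two queries must agree.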

Proof

(Proposition 11) As already mentioned, the coNP upper bound follows directly from Theorem 11. For the lower bound, we consider the Non-Hom(H) problem, for a fixed directed graph H, which asks, given a directed graph G, whether G ↛ H. Let us assume that, for each k ≥ 1, there is a directed graph H_k such that:

1. H_k ∈ GHW(k), or more formally, the Boolean CQ $q_{H_k}$ whose canonical database is H_k belongs to GHW(k).
2. Non-Hom(H_k) is coNP-complete even when the input directed graph G satisfies that H_k ↛ G.

We later explain how to obtain the graphs H_k. Now we reduce from the restricted version of Non-Hom(H_k) given by item (2) above. Let G be a directed graph such that H_k ↛ G. We first check in polynomial time whether G →_k H_k. If G ↛_k H_k, we output a fixed pair $q_0,q_0^{\prime }$ such that $q_0^{\prime }\in \textsf {GHW}(k)$ and $q_0^{\prime }$ is an incomparable GHW(k)-Δ-approximation of q₀. In case that G →_k H_k, we output the pair $q_{G}, q_{H_k}$, where q_G and $q_{H_k}$ are Boolean CQs whose canonical databases are precisely G and H_k, respectively. Since $q_{H_k}\in \textsf {GHW}(k)$ by item (1) above, the reduction is well-defined.

Suppose first that G ↛ H_k. If G ↛_k H_k, then we are done, since $q_0^{\prime }$ is an incomparable GHW(k)-Δ-approximation of q₀. Otherwise, if G →_k H_k, since G ↛ H_k and H_k ↛ G (item (2) above), Theorem 11 implies that $q_{H_k}$ is an incomparable GHW(k)-Δ-approximation of q_G. On the other hand, assume that G → H_k. In particular, we have that G →_k H_k, and then, in this case, the reduction outputs the pair $q_{G}, q_{H_k}$. Since G → H_k, we conclude that $q_{H_k}$ is not an incomparable GHW(k)-Δ-approximation of q_G.

It remains to define the directed graph H_k. If k ≥ 2, it suffices to consider the clique on 2k vertices, that is, the directed graph K_{2k} whose vertex set is {1,…, 2k} and whose edges are {(i,j) ∣ i ≠ j, for i,j ∈ {1,…, 2k}}. We have that K_{2k} ∈ GHW(k), and thus item (1) above is satisfied. Also, we can reduce from the non-2k-colorability problem by replacing each undirected edge {u,v} of a given undirected graph G by a directed edge in an arbitrary direction, e.g., from u to v. Clearly, this is a reduction from non-2k-colorability to Non-Hom(K_{2k}). Also note that the output f(G) of the reduction satisfies that K_{2k} ↛ f(G), as f(G) has neither loops nor directed cycles of length 2. Therefore, item (2) above is satisfied. For k = 1, it is known from [30] that there is an oriented tree T (i.e., a directed graph whose underlying undirected graph is a tree and that has no directed cycles of length 1 (loops) or 2) such that Non-Hom(T) is coNP-complete. Since T is an oriented tree, it belongs to GHW(1), and then item (1) is satisfied. Also, by inspecting the reduction in [30], we have that item (2) also holds. □
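The case k ≥ 2 can be illustrated on small graphs. The sketch below is an illustration under stated assumptions rather than the article's construction verbatim: it orients every undirected edge from its smaller to its larger endpoint (one arbitrary choice of direction), builds the complete directed graph on m vertices, and tests for homomorphisms by brute force. The function names `orient`, `complete_digraph` and `has_homomorphism` are hypothetical, and isolated vertices are ignored since graphs are given as edge sets.

```python
from itertools import product

def orient(undirected_edges):
    """f(G): replace each undirected edge {u, v} by one directed edge.

    The direction (smaller endpoint to larger) is an arbitrary choice.
    """
    return {tuple(sorted(e)) for e in undirected_edges}

def complete_digraph(m):
    """K_m as a directed graph: all edges (i, j) with i != j over 1..m."""
    return {(i, j) for i in range(1, m + 1) for j in range(1, m + 1) if i != j}

def has_homomorphism(src, dst):
    """Brute-force test for an edge-preserving map from src to dst."""
    src_nodes = sorted({v for e in src for v in e})
    dst_nodes = sorted({v for e in dst for v in e})
    for image in product(dst_nodes, repeat=len(src_nodes)):
        h = dict(zip(src_nodes, image))
        if all((h[u], h[v]) in dst for u, v in src):
            return True
    return False

k = 2
K = complete_digraph(2 * k)                      # K_4
triangle = [(1, 2), (2, 3), (1, 3)]              # 4-colorable
k5 = [(i, j) for i in range(1, 6) for j in range(i + 1, 6)]  # not 4-colorable

assert has_homomorphism(orient(triangle), K)     # 2k-colorable: f(G) -> K_2k
assert not has_homomorphism(orient(k5), K)       # not 2k-colorable: f(G) -/-> K_2k
assert not has_homomorphism(K, orient(triangle)) # item (2): K_2k -/-> f(G)
assert not has_homomorphism(K, orient(k5))
```

A homomorphism from f(G) to K_{2k} is exactly a proper 2k-coloring of G, because K_{2k} has no loops and contains both directions of every edge. The last two assertions reflect item (2): K_{2k} contains directed cycles of length 2, while f(G) does not.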

About this article

Cite this article

Barceló, P., Romero, M. & Zeume, T. A More General Theory of Static Approximations for Conjunctive Queries. Theory Comput Syst 64, 916–964 (2020). https://doi.org/10.1007/s00224-019-09924-0

Published: 10 May 2019
Issue Date: July 2020
DOI: https://doi.org/10.1007/s00224-019-09924-0
