Abstract
We study the complexity of consistent query answering on databases that may violate primary key constraints. A repair of such a database is any consistent database that can be obtained by deleting a minimal set of tuples. For every Boolean query q, CERTAINTY(q) is the problem that takes a database as input and asks whether q evaluates to true on every repair. In Koutris and Wijsen (ACM Trans. Database Syst. 42(2), 9:1–9:45, 2017), the authors show that for every self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) is either in P or coNP-complete, and it is decidable which of the two cases applies. In this article, we sharpen this result by showing that for every self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) is either expressible in symmetric stratified Datalog (with some aggregation operator) or coNP-complete. Since symmetric stratified Datalog is in L, we thus obtain a complexity-theoretic dichotomy between L and coNP-complete. Another new finding of practical importance is that CERTAINTY(q) is on the logspace side of the dichotomy for queries q where all join conditions express foreign-to-primary key matches, which is undoubtedly the most common type of join condition.
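As an illustration of the definitions of repair and CERTAINTY(q), the following brute-force sketch enumerates all repairs of a small database and checks a query on each. It is not the method of this article (it takes exponential time). The fact encoding as triples (relation, key, non-key values), the function names, and the example query ∃x∃y (R(x̲, y) ∧ S(y̲)) are assumptions made only for this example.

```python
from collections import defaultdict
from itertools import product

def blocks(db):
    """Group facts that agree on relation name and primary-key value."""
    groups = defaultdict(list)
    for rel, key, rest in db:
        groups[(rel, key)].append((rel, key, rest))
    return list(groups.values())

def repairs(db):
    """Under primary keys, a repair keeps exactly one fact from every block."""
    for choice in product(*blocks(db)):
        yield set(choice)

def certainty(query, db):
    """True iff the Boolean query holds in every repair (brute force)."""
    return all(query(r) for r in repairs(db))

def q(facts):
    """Hypothetical query: some R(x | y) joins with some S(y | )."""
    return any(f[0] == "R" and ("S", f[2], ()) in facts for f in facts)

# R(a,1) and R(a,2) are key-equal, so they form one block of size two.
db1 = [("R", ("a",), (1,)), ("R", ("a",), (2,)), ("S", (1,), ()), ("S", (2,), ())]
db2 = [("R", ("a",), (1,)), ("R", ("a",), (2,)), ("S", (1,), ())]

yes_instance = certainty(q, db1)  # both repairs contain a matching S-fact
no_instance = certainty(q, db2)   # the repair that keeps R(a,2) falsifies q
```

In this toy setting, db1 is a "yes"-instance and db2 is a "no"-instance, because the repair of db2 that keeps R(a,2) has no matching S-fact.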
Notes
 1.
The quotient graph of a directed graph G = (V,E) with respect to an equivalence relation ≡ on V is a directed graph whose vertices are the equivalence classes of ≡; there is a directed edge from class A to class B if E has a directed edge from some vertex in A to some vertex in B.
 2.
Here, α[Z ∪{w}] is the restriction of α to Z ∪{w}.
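The quotient graph defined in Note 1 can be sketched in a few lines. This is only an illustration: the function name and the representation of the equivalence relation by a function `class_of`, which maps each vertex to a hashable label of its class, are assumptions of this example.

```python
def quotient_graph(vertices, edges, class_of):
    """Quotient of the directed graph (vertices, edges) by an equivalence.

    class_of(v) must return the same hashable label for equivalent vertices.
    There is an edge from class A to class B whenever some edge of the
    original graph goes from a vertex in A to a vertex in B.
    """
    classes = {class_of(v) for v in vertices}
    quotient_edges = {(class_of(u), class_of(v)) for (u, v) in edges}
    return classes, quotient_edges

# Hypothetical example: vertices 1..4, equivalent iff they have the same parity.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4), (1, 3)]
classes, q_edges = quotient_graph(V, E, lambda v: v % 2)
```

Read literally, the definition also yields a loop on a class when the original graph has an edge between two vertices of that class, as happens here for the edge (1, 3).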
References
 1.
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995). http://webdam.inria.fr/Alice/
 2.
Arenas, M., Bertossi, L. E., Chomicki, J.: Consistent query answers in inconsistent databases. In: ACM PODS, pp. 68–79. https://doi.org/10.1145/303976.303983 (1999)
 3.
Arenas, M., Bertossi, L. E., Chomicki, J., He, X., Raghavan, V., Spinrad, J. P.: Scalar aggregation in inconsistent databases. Theor. Comput. Sci. 296(3), 405–434 (2003). https://doi.org/10.1016/S0304-3975(02)00737-5
 4.
Aspvall, B., Plass, M. F., Tarjan, R. E.: A linear-time algorithm for testing the truth of certain quantified boolean formulas. Inf. Process. Lett. 8(3), 121–123 (1979). https://doi.org/10.1016/0020-0190(79)90002-4
 5.
Baader, F., Horrocks, I., Lutz, C., Sattler, U.: An introduction to description logic. Cambridge University Press, Cambridge (2017). http://www.cambridge.org/de/academic/subjects/computerscience/knowledgemanagementdatabasesanddatamining/introductiondescriptionlogic?format=PB#17zVGeWD2TZUeu6s.97
 6.
Barceló, P., Fontaine, G.: On the data complexity of consistent query answering over graph databases. J. Comput. Syst. Sci. 88, 164–194 (2017). https://doi.org/10.1016/j.jcss.2017.03.015
 7.
Bertossi, L. E.: Database repairing and consistent query answering. Synthesis lectures on data management. Morgan & Claypool Publishers, San Rafael (2011)
 8.
Bertossi, L. E.: Database repairs and consistent query answering: Origins and further developments. In: Suciu, D., Skritek, S., Koch, C. (eds.) Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2019, Amsterdam, The Netherlands, June 30 – July 5, 2019. https://doi.org/10.1145/3294052.3322190, pp 48–58. ACM (2019)
 9.
Bienvenu, M., Bourgaux, C.: Inconsistency-tolerant querying of description logic knowledge bases. In: Pan, J.Z., Calvanese, D., Eiter, T., Horrocks, I., Kifer, M., Lin, F., Zhao, Y. (eds.) Reasoning Web: Logical foundation of knowledge graph construction and query answering – 12th International Summer School 2016, Aberdeen, UK, September 5–9, 2016, Tutorial lectures, Lecture notes in computer science. https://doi.org/10.1007/978-3-319-49493-7_5, vol. 9885, pp 156–202. Springer (2016)
 10.
Bulatov, A. A.: Complexity of conservative constraint satisfaction problems. ACM Trans. Comput. Log. 12(4), 24:1–24:66 (2011). https://doi.org/10.1145/1970398.1970400
 11.
Dixit, A. A., Kolaitis, P. G.: A SAT-based system for consistent query answering. In: Janota, M., Lynce, I. (eds.) Theory and Applications of Satisfiability Testing – SAT 2019 – 22nd International Conference, SAT 2019, Lisbon, Portugal, July 9–12, 2019, Proceedings, Lecture Notes in Computer Science, vol. 11628, pp 117–135. Springer (2019), https://doi.org/10.1007/978-3-030-24258-9_8
 12.
Egri, L., Larose, B., Tesson, P.: Symmetric Datalog and constraint satisfaction problems in Logspace. In: LICS, pp. 193–202. https://doi.org/10.1109/LICS.2007.47 (2007)
 13.
Fontaine, G.: Why is it hard to obtain a dichotomy for consistent query answering? ACM Trans. Comput. Log. 16(1), 7:1–7:24 (2015). https://doi.org/10.1145/2699912
 14.
Fuxman, A., Miller, R. J.: First-order query rewriting for inconsistent databases. In: ICDT, pp 337–351 (2005), https://doi.org/10.1007/978-3-540-30570-5_23
 15.
Fuxman, A., Miller, R. J.: First-order query rewriting for inconsistent databases. J. Comput. Syst. Sci. 73(4), 610–635 (2007). https://doi.org/10.1016/j.jcss.2006.10.013
 16.
Grädel, E., Kolaitis, P. G., Libkin, L., Marx, M., Spencer, J., Vardi, M. Y., Venema, Y., Weinstein, S.: Finite model theory and its applications. Texts in Theoretical Computer Science. An EATCS Series. Springer. https://doi.org/10.1007/3-540-68804-8 (2007)
 17.
Greco, S., Pijcke, F., Wijsen, J.: Certain query answering in partially consistent databases. PVLDB 7(5), 353–364 (2014). http://www.vldb.org/pvldb/vol7/p353-greco.pdf
 18.
Grohe, M., Schwentick, T.: Locality of order-invariant first-order formulas. ACM Trans. Comput. Log. 1(1), 112–130 (2000). https://doi.org/10.1145/343369.343386
 19.
Kolaitis, P. G., Pema, E., Tan, W.: Efficient querying of inconsistent databases with binary integer programming. PVLDB 6(6), 397–408 (2013). http://www.vldb.org/pvldb/vol6/p397-tan.pdf
 20.
Koutris, P., Wijsen, J.: The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In: PODS. https://doi.org/10.1145/2745754.2745769, pp 17–29 (2015)
 21.
Koutris, P., Wijsen, J.: Consistent query answering for self-join-free conjunctive queries under primary key constraints. ACM Trans. Database Syst. 42(2), 9:1–9:45 (2017). https://doi.org/10.1145/3068334
 22.
Koutris, P., Wijsen, J.: Consistent query answering for primary keys and conjunctive queries with negated atoms. In: PODS, pp 209–224 (2018), https://doi.org/10.1145/3196959.3196982
 23.
Koutris, P., Wijsen, J.: Consistent query answering for primary keys in logspace. In: Barceló, P., Calautti, M. (eds.) 22nd International Conference on Database Theory, ICDT 2019, March 26–28, 2019, Lisbon, Portugal, LIPIcs. Schloss Dagstuhl – Leibniz-Zentrum fuer Informatik, vol. 127, pp 23:1–23:19 (2019), https://doi.org/10.4230/LIPIcs.ICDT.2019.23
 24.
Lembo, D., Lenzerini, M., Rosati, R., Ruzzi, M., Savo, D. F.: Inconsistency-tolerant query answering in ontology-based data access. J. Web Sem. 33, 3–29 (2015). https://doi.org/10.1016/j.websem.2015.04.002
 25.
Libkin, L.: Elements of finite model theory. Texts in Theoretical Computer Science. An EATCS Series. Springer. https://doi.org/10.1007/978-3-662-07003-1 (2004)
 26.
Lincoln, A., Williams, V. V., Williams, R. R.: Tight hardness for shortest cycles and paths in sparse graphs. In: ACM-SIAM SODA. https://doi.org/10.1137/1.9781611975031.80, pp 1236–1252 (2018)
 27.
Lutz, C., Wolter, F.: On the relationship between consistent query answering and constraint satisfaction problems. In: ICDT. https://doi.org/10.4230/LIPIcs.ICDT.2015.363, pp 363–379 (2015)
 28.
Marileo, M. C., Bertossi, L. E.: The consistency extractor system: Answer set programs for consistent query answering in databases. Data Knowl. Eng. 69(6), 545–572 (2010). https://doi.org/10.1016/j.datak.2010.01.005
 29.
Maslowski, D., Wijsen, J.: A dichotomy in the complexity of counting database repairs. J. Comput. Syst. Sci. 79(6), 958–983 (2013). https://doi.org/10.1016/j.jcss.2013.01.011
 30.
Maslowski, D., Wijsen, J.: Counting database repairs that satisfy conjunctive queries with self-joins. In: ICDT, pp 155–164 (2014), https://doi.org/10.5441/002/icdt.2014.18
 31.
Pijcke, F.: Theoretical and practical methods for consistent query answering in the relational data model. Ph.D. thesis, University of Mons (2018)
 32.
Przymus, P., Boniewicz, A., Burzanska, M., Stencel, K.: Recursive query facilities in relational databases: a survey. In: FGIT. https://doi.org/10.1007/978-3-642-17622-7_10, pp 89–99 (2010)
 33.
Reingold, O.: Undirected connectivity in logspace. J. ACM 55(4), 17:1–17:24 (2008). https://doi.org/10.1145/1391289.1391291
 34.
Wijsen, J.: On the first-order expressibility of computing certain answers to conjunctive queries over uncertain databases. In: PODS. https://doi.org/10.1145/1807085.1807111, pp 179–190 (2010)
 35.
Wijsen, J.: Certain conjunctive query answering in first-order logic. ACM Trans. Database Syst. 37(2), 9:1–9:35 (2012). https://doi.org/10.1145/2188349.2188351
 36.
Wijsen, J.: A survey of the data complexity of consistent query answering under key constraints. In: FoIKS. https://doi.org/10.1007/978-3-319-04939-7_2, pp 62–78 (2014)
 37.
Wijsen, J.: Foundations of query answering on inconsistent databases. SIGMOD Rec. 48(3), 6–16 (2019). https://doi.org/10.1145/3377391.3377393
Appendix E: Proofs of Section 9
We will use the following helping lemma.
Lemma 19
Let q be a query in sjfBCQ that has the key-join property. Then, for all F,G ∈ q, if \(F\overset {q}{\rightsquigarrow }G\), then there exists a sequence \(F_{0},F_{1},\dots ,F_{\ell }\) such that F_{0} = F, F_{ℓ} = G, and for all \(i\in \{1,2,\dots ,\ell \}\), \({\mathsf {key}}({F_{i}})\subseteq {\mathsf {vars}}({F_{i-1}})\).
Proof
Assume \(F\overset {q}{\rightsquigarrow }G\). We can assume a shortest sequence
\[F_{0}\stackrel {x_{1}}{\smallfrown }F_{1}\stackrel {x_{2}}{\smallfrown }F_{2}\dotsm \stackrel {x_{\ell }}{\smallfrown }F_{\ell }\qquad (7)\]
that is a witness for \(F\overset {q}{\rightsquigarrow }G\). Clearly, for all \(i\in \{0,1,\dots ,\ell -1\}\), vars(F_{i}) ∩vars(F_{i+ 1})≠∅. Then, since q has the key-join property, for all \(i\in \{0,1,\dots ,\ell -1\}\), either

1.
vars(F_{i}) ∩vars(F_{i+ 1}) ∈{key(F_{i}),key(F_{i+ 1})}, or

2.
\({\mathsf {vars}}({F_{i}})\cap {\mathsf {vars}}({F_{i+1}})\supseteq {\mathsf {key}}({F_{i}})\cup {\mathsf {key}}({F_{i+1}})\).
We show by induction on increasing i that for all \(i\in \{1,\dots ,\ell \}\), \({\mathsf {key}}({F_{i}})\subseteq {\mathsf {vars}}({F_{i-1}})\).
Induction Basis i = 1. From \(x_{1}\notin {F_{0}}^{+,{q}}\), it follows x_{1}∉key(F_{0}). It follows that vars(F_{0}) ∩vars(F_{1})≠key(F_{0}). Consequently, vars(F_{0}) ∩vars(F_{1}) includes key(F_{1}).
Induction Step \(i\rightarrow i+1\). The induction hypothesis is that \({\mathsf {key}}({F_{i}})\subseteq {\mathsf {vars}}({F_{i-1}})\). Assume, towards a contradiction, vars(F_{i}) ∩vars(F_{i+ 1}) = key(F_{i}). Then x_{i+ 1} ∈key(F_{i}) and hence, by the induction hypothesis, x_{i+ 1} ∈vars(F_{i− 1}). Then the witness (7) can be shortened by replacing the subsequence \(F_{i-1}\stackrel {x_{i}}{\smallfrown }F_{i}\stackrel {x_{i+1}}{\smallfrown }F_{i+1}\) with \(F_{i-1}\stackrel {x_{i+1}}{\smallfrown }F_{i+1}\), contradicting our assumption that no witness for \(F\overset {q}{\rightsquigarrow }G\) is shorter than (7). We conclude by contradiction that vars(F_{i}) ∩vars(F_{i+ 1})≠key(F_{i}). Consequently, vars(F_{i}) ∩vars(F_{i+ 1}) includes key(F_{i+ 1}). □
The proof of Theorem 4 can now be given.
Proof of Theorem 4
Assume that q has the key-join property. We show that the attack graph of q contains no strong attacks. To this end, assume \(F\stackrel {q}{\rightsquigarrow }G\). The sequence \(F_{0},F_{1},\dots ,F_{\ell -1}\) in the statement of Lemma 19 is a sequential proof for \({\mathcal {K}}({q})\models {{\mathsf {key}}({F_{0}})}\rightarrow {{\mathsf {key}}({F_{\ell }})}\), and therefore the attack \(F\overset {q}{\rightsquigarrow }G\) is weak. The result then follows from Theorem 3. □
This article belongs to the Topical Collection: Special Issue on Database Theory (ICDT 2019)
Guest Editor: Pablo Barceló
This article extends an earlier, shorter version entitled “Consistent Query Answering for Primary Keys in Logspace”, which was presented at the 22nd International Conference on Database Theory (ICDT 2019) [23].
Appendices
Appendix A: Overview of Different Graphs and Notations
| Graph | Vertices | Edge notation | Short description |
|---|---|---|---|
| attack graph | query atoms | \(F\overset {q}{\rightsquigarrow }G\) | See Section 3. Informally, \(F\overset {q}{\rightsquigarrow }G\) means that there exists a “yes”-instance of CERTAINTY(q) in which two key-equal F-facts join with (and only with) two G-facts that are not key-equal (cf. [35, Proposition 6.4]). |
| M-graph | query atoms | F →_{M} G | Definition 3. Informally, F →_{M} G states that the functional dependency \({{\mathsf {vars}}({F})}\rightarrow {{\mathsf {key}}({G})}\) is a logical consequence of the primary keys in atoms of mode c. |
| ↪-graph | database facts | A ↪ B | Definition 4, data-level instantiation of the M-graph |
| ↪_{C}-graph | database facts | A ↪_{C} B | Definition 5, subgraph of the ↪-graph induced by an M-cycle C |
| block-quotient graph | database blocks | \(({\mathbf {b}},{\mathbf {b}}^{\prime })\) | Definition 6, quotient graph of the ↪_{C}-graph relative to the equivalence relation “is key-equal to” |
| Notation | Meaning |
|---|---|
| key(F) | the set of all variables occurring in the primary key of atom F |
| vars(F) | the set of all variables occurring in atom F |
| vars(q) | the set of all variables occurring in query q |
| \(\sim \) | the equivalence relation “is key-equal to”, e.g., \(R(\underline {a},1)\sim R(\underline {a},2)\) |
| rset(db) | the set of all repairs of a database db |
| block(A,db) | the set of all facts in db that are key-equal to the fact A |
| \(R(\underline {\vec {a}},\ast )\) | the set of all database facts of the form \(R(\underline {\vec {a}},\vec {b})\), for some \(\vec {b}\) |
| sjfBCQ | the class of self-join-free Boolean conjunctive queries |
| UCQ | the class of unions of conjunctive queries |
| R^{c} | a relation name of mode c, which must be interpreted by a consistent relation |
| q^{cons} | the set of all atoms of query q having a relation name of mode c |
| \({\mathcal {K}}({q})\) | the set containing \({{\mathsf {key}}({F})}\rightarrow {{\mathsf {vars}}({F})}\) for every F ∈ q |
| F^{+,q} | the closure of key(F) with respect to the FDs in \({\mathcal {K}}({q\setminus \{F\}})\cup {\mathcal {K}}({{q}^{\mathsf {cons}}})\) |
| genre_{q}(A) | the atom of q with the same relation name as the fact A |
| V(G) | the vertex set of a graph G |
| E(G) | the edge set of a graph G |
| ⊎ | a set union that happens to be disjoint |
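The closure F^{+,q} listed above can be computed with the textbook attribute-closure algorithm for functional dependencies. The sketch below assumes a hypothetical encoding of atoms as dictionaries with fields `key`, `vars`, and `mode_c`; these names and the example query {R(x̲, y), S(y̲, z)} are illustrative and not taken from the article.

```python
def closure(attrs, fds):
    """Smallest superset of attrs that is closed under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def key_fds(atoms):
    """K(.): one FD key(F) -> vars(F) per atom F."""
    return [(set(a["key"]), set(a["vars"])) for a in atoms]

def plus_closure(atom, query):
    """Closure of key(atom) w.r.t. K(q minus {atom}) together with K(q^cons)."""
    others = [a for a in query if a is not atom]
    consistent = [a for a in query if a["mode_c"]]
    return closure(atom["key"], key_fds(others) + key_fds(consistent))

# Hypothetical query with atoms R(x | y) and S(y | z); keys are {x} and {y}.
R = {"name": "R", "key": {"x"}, "vars": {"x", "y"}, "mode_c": False}
S = {"name": "S", "key": {"y"}, "vars": {"y", "z"}, "mode_c": False}
r_plus = plus_closure(R, [R, S])

# Same query, but R now has mode c, so its own key FD may be used as well.
Rc = {"name": "R", "key": {"x"}, "vars": {"x", "y"}, "mode_c": True}
rc_plus = plus_closure(Rc, [Rc, S])
```

In the first case the FD of R itself is excluded, so the closure of {x} stays {x}. In the second case R has mode c and its key FD contributes, which gives {x, y, z}.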
Appendix B: Proofs of Section 5
B.1 Proofs of Lemmas 1 and 2
Proof of Lemma 1
Let o_{1} and o_{2} be garbage sets for q_{0} in db. For every i ∈{1, 2}, we can assume a repair r_{i} of o_{i} such that
Garbage Condition: for every valuation 𝜃 over vars(q) such that \(\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{i}})\cup {\mathbf {r}}_{i}\), we have 𝜃(q_{0}) ∩r_{i} = ∅.
Let \({\mathbf {o}}_{2}^{-} = {\mathbf {o}}_{2}\setminus {\mathbf {o}}_{1}\) and \({\mathbf {r}}_{2}^{-} = {\mathbf {r}}_{2}\setminus {\mathbf {o}}_{1}\). Then, \({\mathbf {r}}_{1}\uplus {\mathbf {r}}_{2}^{-}\) is a repair of \({\mathbf {o}}_{1}\uplus {\mathbf {o}}_{2}^{-}\), where the use of ⊎ (rather than ∪) indicates that the operands of the union are disjoint. Let 𝜃 be an arbitrary valuation over vars(q) such that
\[\theta (q)\subseteq \left ({\mathbf {db}\setminus ({{\mathbf {o}}_{1}\uplus {\mathbf {o}}_{2}^{-}})}\right )\cup ({{\mathbf {r}}_{1}\uplus {\mathbf {r}}_{2}^{-}}).\]
Then, \(\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{1}})\cup {\mathbf {r}}_{1}\). Consequently, by the Garbage Condition for i = 1, 𝜃(q_{0}) ∩r_{1} = ∅, and therefore 𝜃(q_{0}) ∩o_{1} = ∅. It follows \(\theta (q)\subseteq \left ({\mathbf {db}\setminus ({{\mathbf {o}}_{1}\cup {\mathbf {o}}_{2}})}\right )\cup {\mathbf {r}}_{2}^{-}\), hence \(\theta (q)\subseteq \left ({\mathbf {db}\setminus {\mathbf {o}}_{2}}\right )\cup {\mathbf {r}}_{2}^{-}\). Consequently, by the Garbage Condition for i = 2, \(\theta (q_{0})\cap {\mathbf {r}}_{2}^{-}=\emptyset \). It follows that \({\mathbf {o}}_{1}\uplus {\mathbf {o}}_{2}^{-} = {\mathbf {o}}_{1}\cup {\mathbf {o}}_{2}\) is a garbage set for q_{0} in db. □
Proof of Lemma 2
The ⟸-direction is trivial. For the ⟹-direction, assume that every repair of db satisfies q. We can assume a repair r_{0} of o such that for every valuation 𝜃 over vars(q), if \(\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}_{0}\), then 𝜃(q_{0}) ∩r_{0} = ∅. Let r be an arbitrary repair of db ∖o. It suffices to show r⊧q. Since r ∪r_{0} is a repair of db, we can assume a valuation 𝜃 over vars(q) such that \(\theta (q)\subseteq {\mathbf {r}}\cup {\mathbf {r}}_{0}\). Since \(\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}_{0}\) is obvious, it follows 𝜃(q) ∩r_{0} = ∅. Consequently, \(\theta (q)\subseteq {\mathbf {r}}\), hence r⊧q. This concludes the proof. □
B.2 Proof of Lemma 3
We will use two helping lemmas.
Lemma 13
Let q be a query in sjfBCQ, and let \(q_{0}\subseteq q\). Let o be a garbage set for q_{0} in db. If p is the union of one or more blocks of o, then o ∖p is a garbage set for q_{0} in db ∖p.
Proof
Let p be the union of one or more blocks of o. We can assume a repair r of o such that for every valuation 𝜃 over vars(q), if \(\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}\), then 𝜃(q) ∩r = ∅. Let s = r ∖p. Obviously, s is a repair of o ∖p.
Let 𝜃 be a valuation over vars(q) such that \(\theta (q)\subseteq \left ({({\mathbf {db}\setminus {\mathbf {p}}})\setminus ({{\mathbf {o}}\setminus {\mathbf {p}}})}\right )\cup {\mathbf {s}}\). It suffices to show 𝜃(q) ∩s = ∅. Since \(\left ({\mathbf {db}\setminus {\mathbf {p}}}\right )\setminus \left ({{\mathbf {o}}\setminus {\mathbf {p}}}\right )\subseteq \mathbf {db}\setminus {\mathbf {o}}\) and \({\mathbf {s}}\subseteq {\mathbf {r}}\), it follows \(\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}\), hence 𝜃(q) ∩r = ∅. It follows 𝜃(q) ∩s = ∅. □
Corollary 1
Let q be a query in sjfBCQ, and let \(q_{0}\subseteq q\). Let o be a garbage set for q_{0} in db. If every garbage set for q_{0} in db ∖o is empty, then o is the maximum garbage set for q_{0} in db.
Proof
Proof by contraposition. Assume that o is not the maximum garbage set for q_{0} in db. Let o_{0} be the maximum garbage set for q_{0} in db. By Lemma 13, o_{0} ∖o is a nonempty garbage set for q_{0} in db ∖o. □
Lemma 14
Let q be a query in sjfBCQ, and let \(q_{0}\subseteq q\). Let db be a database. If o is a garbage set for q_{0} in db, and p is a garbage set for q_{0} in db ∖o, then o ∪p is a garbage set for q_{0} in db.
Proof
Assume the hypothesis holds. Note that o ∩p = ∅. We can assume a repair r of o such that for every valuation 𝜃 over vars(q), if \(\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}\), then 𝜃(q) ∩r = ∅. Likewise, we can assume a repair s of p such that for every valuation 𝜃 over vars(q), if \(\theta (q)\subseteq \left ({({\mathbf {db}\setminus {\mathbf {o}}})\setminus {\mathbf {p}}}\right )\cup {\mathbf {s}}\), then 𝜃(q) ∩s = ∅. Obviously, r ∪s is a repair of o ∪p.
Let 𝜃 be a valuation over vars(q) such that \(\theta (q)\subseteq \left ({\mathbf {db}\setminus ({{\mathbf {o}}\cup {\mathbf {p}}})}\right )\cup ({{\mathbf {r}}\cup {\mathbf {s}}})\). From the set inclusion \(\left ({\mathbf {db}\setminus ({{\mathbf {o}}\cup {\mathbf {p}}})}\right )\cup ({{\mathbf {r}}\cup {\mathbf {s}}}) \subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}\), it follows \(\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}\), hence 𝜃(q) ∩r = ∅. Then, \(\theta (q)\subseteq \left ({\mathbf {db}\setminus ({{\mathbf {o}}\cup {\mathbf {p}}})}\right )\cup {\mathbf {s}} = \left ({({\mathbf {db}\setminus {\mathbf {o}}})\setminus {\mathbf {p}}}\right )\cup {\mathbf {s}}\), hence 𝜃(q) ∩s = ∅. It follows 𝜃(q) ∩ (r ∪s) = ∅. □
Corollary 2
Let q be a query in sjfBCQ, and let \(q_{0}\subseteq q\). Let db be a database, and let o be the maximum garbage set for q_{0} in db. Then, every garbage set for q_{0} in db ∖o is empty.
Proof
Immediate from Lemma 14. □
The proof of Lemma 3 can now be given.
Proof of Lemma 3
Immediate from Corollaries 1 and 2. □
Appendix C: Appendix to Section 7
C.1 Proofs of Lemmas 5 and 6
Proof of Lemma 5
We will write ⊕ for addition modulo k. We first consider garbage sets respecting the first three conditions.

Let A be a fact of db such that \({\mathsf {genre}}_{q}({A})\in \{F_{0},\dots ,F_{k-1}\}\) and A has zero outdegree in the ↪_{C}-graph. Then, there exists no valuation 𝜃 over vars(q) such that \(A\in \theta (q)\subseteq \mathbf {db}\). It is obvious that block(A,db) is a garbage set for C in db.

Let \(A_{0}\stackrel {{~}_{C}}{\hookrightarrow }A_{1}\stackrel {{~}_{C}}{\hookrightarrow }\dotsm \stackrel {{~}_{C}}{\hookrightarrow }A_{k-1}\stackrel {{~}_{C}}{\hookrightarrow }A_{0}\) be an irrelevant 1-embedding of C in db. Assume without loss of generality that for every \(i\in \{0,\dots ,k-1\}\), genre_{q}(A_{i}) = F_{i}. Let \({\mathbf {o}}=\bigcup _{i=0}^{k-1}{\mathsf {block}}({A_{i}},{\mathbf {db}})\). Let \({\mathbf {r}}=\{A_{0},\dots ,A_{k-1}\}\), which is obviously a repair of o. We show that o is a garbage set for C in db. Assume, toward a contradiction, the existence of a valuation 𝜃 over vars(q) such that for some \(i\in \{0,\dots ,k-1\}\), \(A_{i}\in \theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}})\cup {\mathbf {r}}\). Then, 𝜃(F_{i})↪_{C}𝜃(F_{i⊕1}). Since 𝜃(F_{i}) = A_{i}, we have A_{i}↪_{C}𝜃(F_{i⊕1}). From A_{i}↪_{C}𝜃(F_{i⊕1}) and A_{i}↪_{C}A_{i⊕1}, it follows \(\theta (F_{i\oplus 1})\sim A_{i\oplus 1}\) by Lemma 4. Since 𝜃(F_{i⊕1}) ∈ (db ∖o) ∪r, it follows 𝜃(F_{i⊕1}) = A_{i⊕1}. By repeated application of the same reasoning, for every \(j\in \{0,\dots ,k-1\}\), 𝜃(F_{j}) = A_{j}. But then \(A_{0}\stackrel {{~}_{C}}{\hookrightarrow }A_{1}\stackrel {{~}_{C}}{\hookrightarrow }\dotsm \stackrel {{~}_{C}}{\hookrightarrow }A_{k-1}\stackrel {{~}_{C}}{\hookrightarrow }A_{0}\) is a relevant 1-embedding of C in db, a contradiction.

Let r be a set containing all (and only) the facts of some n-embedding of C in db with n ≥ 2. Let \({\mathbf {o}}=\bigcup _{A\in {\mathbf {r}}}{\mathsf {block}}({A},{\mathbf {db}})\). It can be shown that o is a garbage set for C in db; the argumentation is analogous to the reasoning in the previous paragraph.
Let o_{0} be the minimal subset of db that satisfies all conditions in the statement of the lemma except the recursive Condition 4. By Lemma 1 and our reasoning in the previous items, it follows that o_{0} is a garbage set for C in db.
Note that the first three conditions do not recursively depend on o_{0}. Starting with o_{0}, construct a maximal sequence
such that \({\mathbf {o}}_{0}\subsetneq {\mathbf {o}}_{1}\subsetneq {\mathbf {o}}_{2}\subsetneq \dotsm \subsetneq {\mathbf {o}}_{m+1}\) and for every \(h\in \{0,1,\dots ,m\}\),

1.
μ_{h} is a valuation over vars(q) such that \(\mu _{h}(q)\subseteq \mathbf {db}\) and μ_{h}(q) ∩o_{h}≠∅. Therefore, \(\mu (F_{0})\stackrel {{~}_{C}}{\hookrightarrow }\mu (F_{1})\stackrel {{~}_{C}}{\hookrightarrow }\dotsm \stackrel {{~}_{C}}{\hookrightarrow }\mu (F_{k-1})\stackrel {{~}_{C}}{\hookrightarrow }\mu (F_{0})\) is a relevant 1-embedding of C in db; and

2.
\({\mathbf {o}}_{h+1}={\mathbf {o}}_{h}\cup \left ({\bigcup _{i=0}^{k-1}{\mathsf {block}}({\mu _{h}(F_{i})},{\mathbf {db}})}\right )\).
It is clear that the final set o_{m+ 1} is a minimal set satisfying all conditions in the statement of the lemma. We show by induction on increasing h that for all \(h\in \{0,1,\dots ,m,m+1\}\), o_{h} is a garbage set for C in db. We have already shown that o_{0} is a garbage set for C in db. For the induction step, \(h\rightarrow h+1\), the induction hypothesis is that o_{h} is a garbage set for C in db. Then, there exists a repair r of o_{h} such that for every valuation 𝜃 over vars(q), if \(\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{h}})\cup {\mathbf {r}}\), then 𝜃(q) ∩r = ∅. For every \(i\in \{0,\dots ,k-1\}\), define A_{i} := μ_{h}(F_{i}). Let \({\mathbf {s}}=\{A_{0},\dots ,A_{k-1}\}\setminus {\mathbf {o}}_{h}\). We have \({\mathbf {o}}_{h+1}={\mathbf {o}}_{h}\uplus \left ({\bigcup _{A_{j}\in {\mathbf {s}}}{\mathsf {block}}({A_{j}},{\mathbf {db}})}\right )\). Let \({\mathbf {r}}^{\prime }={\mathbf {r}}\uplus {\mathbf {s}}\). Obviously, \({\mathbf {r}}^{\prime }\) is a repair of o_{h+ 1}. Here, we use ⊎, rather than ∪, to make clear that the operands of the union are disjoint. Assume, toward a contradiction, the existence of a valuation 𝜃 over vars(q) such that \(\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{h+1}})\cup {\mathbf {r}}^{\prime }\) and \(\theta (q)\cap {\mathbf {r}}^{\prime }\neq \emptyset \). Since \(({\mathbf {db}\setminus {\mathbf {o}}_{h+1}})\cup {\mathbf {r}}^{\prime }\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{h}})\cup {\mathbf {r}}\), it follows \(\theta (q)\subseteq ({\mathbf {db}\setminus {\mathbf {o}}_{h}})\cup {\mathbf {r}}\), hence 𝜃(q) ∩r = ∅ by our initial hypothesis. It must be the case that 𝜃(q) ∩s≠∅. We can assume \(i\in \{0,\dots ,k-1\}\) such that A_{i} ∈ 𝜃(q) ∩s. We have 𝜃(F_{i})↪_{C}𝜃(F_{i⊕1}). Since 𝜃(F_{i}) = A_{i}, we have A_{i}↪_{C}𝜃(F_{i⊕1}). From A_{i}↪_{C}𝜃(F_{i⊕1}) and A_{i}↪_{C}A_{i⊕1}, it follows \(\theta (F_{i\oplus 1})\sim A_{i\oplus 1}\) by Lemma 4.
Therefore, 𝜃(F_{i⊕1}) ∈block(A_{i⊕1},db). Two cases are possible:
 Case that \({\mathsf {block}}({A_{i\oplus 1}},{\mathbf {db}})\subseteq {\mathbf {o}}_{h}\):

Since 𝜃(F_{i⊕1}) ∈ (db ∖o_{h}) ∪r, it must be the case that 𝜃(F_{i⊕1}) ∈r. However, since we have previously argued that 𝜃(q) ∩r = ∅, we conclude that this case cannot occur.
 Case that \({\mathsf {block}}({A_{i\oplus 1}},{\mathbf {db}})\not \subseteq {\mathbf {o}}_{h}\):

By our definition of s, we have A_{i⊕1} ∈s. Since \(\theta (F_{i\oplus 1})\in ({\mathbf {db}\setminus {\mathbf {o}}_{h+1}})\cup {\mathbf {r}}^{\prime }\), it must be the case that 𝜃(F_{i⊕1}) ∈s, and therefore 𝜃(F_{i⊕1}) = A_{i⊕1}.
From the above cases, it follows that A_{i⊕1} ∈ 𝜃(q) ∩s. By repeating the same reasoning, we obtain that A_{j} ∈ 𝜃(q) ∩s for all \(j\in \{0,\dots ,k-1\}\). Since μ_{h}(q) ∩o_{h}≠∅ by our construction, we can assume the existence of \(\ell \in \{0,\dots ,k-1\}\) such that A_{ℓ} ∈o_{h}, hence A_{ℓ}∉s, which contradicts our earlier finding that each A_{j} belongs to 𝜃(q) ∩s. This concludes the induction step. It is correct to conclude that o_{m+ 1} is a garbage set for C in db.
Let \(\mathbf {db}^{\prime }=\mathbf {db}\setminus {\mathbf {o}}_{m+1}\). We show that the garbage set for C in \(\mathbf {db}^{\prime }\) is empty. Assume, toward a contradiction, that o is a nonempty garbage set for C in \(\mathbf {db}^{\prime }\). We can assume a repair r of o such that for every valuation 𝜃 over vars(q), if \(\theta (q)\subseteq ({\mathbf {db}^{\prime }\setminus {\mathbf {o}}})\cup {\mathbf {r}}\), then 𝜃(q) ∩r = ∅.
We show that for any A ∈r, the ↪_{C}-graph contains an infinite path that starts from A such that any vertex on the path belongs to \(({\mathbf {db}^{\prime }\setminus {\mathbf {o}}})\cup {\mathbf {r}}\) and any (contiguous) subpath of length k contains some fact from r. To this end, let A be a fact of r. By our construction, there exists a valuation μ over vars(q) such that \(A\in \mu (q)\subseteq \mathbf {db}^{\prime }\) (otherwise A would belong to o_{m+ 1}). Hence, \(\mu (F_{0})\stackrel {{~}_{C}}{\hookrightarrow }\mu (F_{1})\stackrel {{~}_{C}}{\hookrightarrow }\dotsm \stackrel {{~}_{C}}{\hookrightarrow }\mu (F_{k-1})\stackrel {{~}_{C}}{\hookrightarrow }\mu (F_{0})\) is a relevant 1-embedding of C in \(\mathbf {db}^{\prime }\) that contains A. Then, for some \(i\in \{0,\dots ,k-1\}\), it must be the case that \(\mu (F_{i})\not \in ({\mathbf {db}^{\prime }\setminus {\mathbf {o}}})\cup {\mathbf {r}}\) (or else \(\mu (q)\subseteq ({\mathbf {db}^{\prime }\setminus {\mathbf {o}}})\cup {\mathbf {r}}\) and μ(q) ∩r≠∅, a contradiction). Therefore, the ↪_{C}-graph contains a shortest path π of length < k from A to some fact B ∈o ∖r. Then, there exists \(B^{\prime }\in {\mathbf {r}}\) such that \(B^{\prime }\sim B\) and the ↪_{C}-graph contains a path of length < k from A to \(B^{\prime }\). This path is obtained by substituting \(B^{\prime }\) for B in π. Since \(B^{\prime }\in {\mathbf {r}}\), we can continue the path by applying the same reasoning as for A. The path is illustrated by Fig. 10. Since the directed path is infinite, it has a shortest finite subpath of length ≥ k whose first vertex is key-equal to its last vertex. Let D be the last but one vertex on this subpath. Since the ↪_{C}-graph contains a directed edge from D to the first vertex of the subpath, it contains a cycle of some length nk with n ≥ 1.
Since this cycle is obviously an n-embedding of C in \(\mathbf {db}^{\prime }=\mathbf {db}\setminus {\mathbf {o}}_{m+1}\), it must be a relevant 1-embedding of C in \(\mathbf {db}^{\prime }\) which, moreover, contains some fact of r. Therefore, there exists a valuation μ over vars(q) such that \(\mu (q)\subseteq ({\mathbf {db}^{\prime }\setminus {\mathbf {o}}})\cup {\mathbf {r}}\) and μ(q) ∩r≠∅, a contradiction.
Since the garbage set for db ∖o_{m+ 1} is empty, it follows by Lemma 3 that o_{m+ 1} is the maximum garbage set for C in db. This concludes the proof. □
Proof of Lemma 6
For the first item, let \(A\stackrel {{~}_{C}}{\hookrightarrow }A^{\prime }\) be any edge of the n-embedding. We can assume \(F,F^{\prime }\in C\) such that \(F\stackrel {\mathsf {{~}_{M}}}{\longrightarrow } F^{\prime }\), genre_{q}(A) = F, and \({\mathsf {genre}}_{q}({A^{\prime }})=F^{\prime }\). Then, the block-quotient graph will contain a directed edge from block(A,db) to \({\mathsf {block}}({A^{\prime }},{\mathbf {db}})\). It is then obvious that \(({\mathbf {b}}_{0},{\mathbf {b}}_{1},\dots ,{\mathbf {b}}_{nk-1},{\mathbf {b}}_{0})\) is a directed cycle in the block-quotient graph; this cycle is elementary because no two distinct facts of an n-embedding are key-equal.
For the second item, let \(i\in \{0,\dots ,nk-1\}\). Since (b_{i},b_{i+ 1 mod nk}) is an edge in the block-quotient graph, we can assume A_{i} ∈b_{i} and \(A^{\prime }\in {\mathbf {b}}_{i+1\mod nk}\) such that \(A_{i}\stackrel {{~}_{C}}{\hookrightarrow }A^{\prime }\). By Lemma 4, it will be the case that \(A_{0}\stackrel {{~}_{C}}{\hookrightarrow }A_{1}\stackrel {{~}_{C}}{\hookrightarrow }\dotsm \stackrel {{~}_{C}}{\hookrightarrow }{A_{nk-1}}\stackrel {{~}_{C}}{\hookrightarrow }A_{0}\). Furthermore, the latter ↪_{C}-cycle is an n-embedding. Indeed, since the cycle \(({\mathbf {b}}_{0},{\mathbf {b}}_{1},\dots ,{\mathbf {b}}_{nk-1},{\mathbf {b}}_{0})\) is elementary, no two distinct A_{i}s are key-equal. This concludes the proof. □
C.2 Proof of Lemma 8
We will use the following helping lemma. If G is a directed graph, then a directed cycle in G of length k is called a k-cycle.
Lemma 15
Let G = (V,E) be an instance of LONGCYCLE(k). Let \(\widehat {G}=(\widehat {V},\widehat {E})\) be the undirected graph whose vertices are the k-cycles of G. There is an undirected edge between any two distinct k-cycles P_{1} and P_{2} if V (P_{1}) ∩ V (P_{2})≠∅. Then, the following are equivalent:

1.
\(\widehat {G}\) has a chordless cycle of length ≥ 2k or G has an elementary directed cycle of length nk with 2 ≤ n ≤ 2k − 3.

2.
G contains an elementary directed cycle of length ≥ 2k.
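The graph \(\widehat {G}\) in the statement of Lemma 15 can be built by brute force, as in the sketch below. The adjacency-dictionary encoding, the function names, and the small example graph with k = 2 are assumptions of this example. The enumeration is exponential in general and only serves to illustrate the definition.

```python
from itertools import combinations

def k_cycles(adj, k):
    """All elementary directed cycles of length k, each listed once.

    A cycle is reported as a tuple starting at its smallest vertex, so
    vertices must be mutually comparable (strings or integers, for example).
    """
    found = set()

    def extend(path):
        if len(path) == k:
            if path[0] in adj.get(path[-1], ()):
                found.add(tuple(path))
            return
        for w in adj.get(path[-1], ()):
            if w not in path and w > path[0]:
                extend(path + [w])

    for v in adj:
        extend([v])
    return found

def cycle_intersection_graph(adj, k):
    """Vertices: the k-cycles of G. Edges: pairs of distinct k-cycles sharing a vertex."""
    cycles = sorted(k_cycles(adj, k))
    edges = {(p1, p2) for p1, p2 in combinations(cycles, 2) if set(p1) & set(p2)}
    return cycles, edges

# Hypothetical bipartite example (k = 2) with parts {a0, a1} and {b0, b1}.
adj = {
    "a0": ["b0", "b1"],
    "b0": ["a0", "a1"],
    "a1": ["b0", "b1"],
    "b1": ["a1", "a0"],
}
cycles, edges = cycle_intersection_graph(adj, 2)
```

In this example the four 2-cycles form a chordless cycle of length 4 = 2k in the intersection graph, and G itself contains the elementary cycle a0, b0, a1, b1 of length 4.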
Proof
Since the graph G is k-partite, every k-cycle is elementary.
Assume that 1 holds true. The result is obvious if there exists n such that 2 ≤ n ≤ 2k − 3 and G has an elementary cycle of length nk. Assume next that \(\widehat {G}\) has a chordless elementary cycle \((P_{0}, P_{1}, \dots , P_{m-1}, P_{0})\) of length m ≥ 2k. We construct a cycle C in G using the following procedure. The construction will define a labeling function ℓ from the vertices in C to \(\{0,1,\dots ,m-1\}\). It will be the case that w ∈ V (P_{ℓ(w)}) for every vertex w in C. We start with any vertex v_{0} ∈ V (P_{m− 1}) ∩ V (P_{0}) and define its label as \(\ell (v_{0})\mathrel {\mathop :}= 0\). At any point of the procedure, if we are at vertex u with label ℓ(u), we choose the next vertex w in C to be the next vertex in the k-cycle P_{ℓ(u)}. If ℓ(u) < m − 1 and w also belongs to P_{ℓ(u)+ 1}, we let \(\ell (w)\mathrel {\mathop :}=\ell (u)+1\); otherwise \(\ell (w)\mathrel {\mathop :}=\ell (u)\). The procedure terminates when we attempt to add a vertex that already exists in C, and therefore C will be elementary.
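The labeling procedure just described can be sketched as follows. The representation of each k-cycle as a list of vertices in traversal order, the function name `stitch_cycle`, and the example input are assumptions of this illustration. The code does not check that the input cycle of k-cycles is chordless, a property on which the correctness argument relies.

```python
def stitch_cycle(k_cycle_list):
    """Follow P_0, P_1, ..., P_{m-1}, switching to the next k-cycle when possible.

    k_cycle_list[i] lists the vertices of the k-cycle P_i in traversal order.
    Returns the visited vertices in order together with their labels.
    """
    m = len(k_cycle_list)
    shared = [v for v in k_cycle_list[0] if v in k_cycle_list[m - 1]]
    if not shared:
        raise ValueError("P_{m-1} and P_0 must have a vertex in common")
    v0 = shared[0]
    path, label = [v0], {v0: 0}
    u = v0
    while True:
        current = k_cycle_list[label[u]]
        # successor of u in the k-cycle indicated by the label of u
        w = current[(current.index(u) + 1) % len(current)]
        if w in label:  # w was visited before: stop
            break
        if label[u] < m - 1 and w in k_cycle_list[label[u] + 1]:
            label[w] = label[u] + 1
        else:
            label[w] = label[u]
        path.append(w)
        u = w
    return path, label

# Hypothetical input with k = 2 and m = 4: consecutive 2-cycles share one vertex.
P = [["a0", "b0"], ["b0", "a1"], ["a1", "b1"], ["b1", "a0"]]
path, label = stitch_cycle(P)
```

On this input the procedure starts at a0, moves to the next 2-cycle at every step, and stops when it is about to revisit a0, so the labels 0, 1, 2, 3 each occur on the constructed cycle.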
We first show that the termination condition will not be met for any vertex distinct from v_{0}. Suppose, toward a contradiction, that the sequence constructed so far is \(C = \langle {v_{0}, v_{1}, \dots , v_{n}}\rangle \), ℓ(v_{n}) = i ≤ m − 1, and the next vertex in P_{i} is some v_{j} with \(j\in \{1, \dots , n-1\}\). Since v_{j} belongs to both P_{i} and \(P_{\ell (v_{j})}\), it must be the case that ℓ(v_{j}) ≥ i − 1, because otherwise \(\{P_{i},P_{\ell (v_{j})}\}\) is a chord in \((P_{0}, P_{1}, \dots , P_{m-1}, P_{0})\), a contradiction. We now distinguish two cases:
Case ℓ(v_{j}) = i − 1:

Then, v_{j} ∈ V (P_{i−1}) ∩ V (P_{i}). By the procedure, this means that ℓ(v_{j−1}) = i − 2. Indeed, if ℓ(v_{j−1}) = i − 1, then the procedure would have set ℓ(v_{j}) to i, because v_{j} also belongs to P_{i}. But then this also implies that v_{j} ∈ V (P_{i−2}), a contradiction to the fact that the cycle is chordless.
Case ℓ(v_{j}) = i:

Then the procedure reaches a vertex on P_{i} that has been visited before. Therefore, starting with this previously visited vertex on P_{i}, the procedure has entirely traversed P_{i} without ever reaching a vertex of P_{i+1 mod m}, contradicting that P_{i} and P_{i+1 mod m} have a vertex in common.
It is now clear that at some point we will reach v_{0}. Indeed, when the label becomes m − 1, the procedure will follow the edges of P_{m−1} until it reaches v_{0}. We have that ℓ(v_{0}) = 0, and the procedure is such that if some vertex has label i with i < m − 1, then there is a vertex with label i + 1. Therefore, for every \(i\in \{0,1,\dots ,m-1\}\), there exists at least one vertex u in C such that ℓ(u) = i. Therefore, C has at least m vertices. Since m ≥ 2k, the cycle C has length ≥ 2k.
Assume that

G contains an elementary directed cycle of length ≥ 2k, and

for all 2 ≤ n ≤ 2k − 3, G contains no elementary directed cycle of length nk.
We will show that \(\widehat {G}\) contains a chordless cycle of length ≥ 2k.
We first introduce some notions that will be useful in the proof. A subpath of a directed path is a consecutive subsequence of edges of that path. Every path is a subpath of itself. We write start(π) and end(π) to denote, respectively, the first and the last vertex of a directed path π. If \(\mathsf {end}({\pi })=\mathsf {start}({\pi ^{\prime }})\), then \(\pi \cdot \pi ^{\prime }\) denotes the concatenation of paths π and \(\pi ^{\prime }\). The length of a (possibly closed) elementary path π is the number of edges it contains, and is denoted length(π).
Covering. Let O be an elementary cycle in G of size ≥ 2k. A seam in O is a subpath of O that is also a subpath of some k-cycle. Obviously, every seam in O has length < k. A covering of O is a set of edge-disjoint seams in O such that every edge of O is an edge of some seam in the set. Since every edge of G belongs to some k-cycle by our hypothesis, O has a covering. We define \({\mathit {seamlength}}({O})\mathrel {\mathop :}=\ell \) if O has a covering of cardinality ℓ and every covering of O has cardinality ≥ ℓ.
Cyclic Ordering of the Seams in a Covering. Let \(C=\{S_{0},S_{1},\dots ,S_{\ell -1}\}\) be a covering of O. From here on, we will assume that the seams are listed such that a traversal of O that starts with start(S_{0}) traverses the seams of C in the order S_{0}, S_{1}, …, S_{ℓ−1}.
Let O be a directed cycle of length ≥ 2k that minimizes seamlength(⋅). From here on, ℓ denotes seamlength(O). Thus, every elementary cycle \(O^{\prime }\) in G of length ≥ 2k satisfies \({\mathit {seamlength}}({O^{\prime }})\geq \ell \). Let \(\{S_{0},S_{1},\dots ,S_{\ell -1}\}\) be a covering of O.
Our hypothesis is that for every directed cycle of length nk in G such that n ≥ 2, we have n > 2k − 3. Consequently, length(O) ≥ (2k − 2)k. For every \(i\in \{0,\dots ,\ell -1\}\), we have length(S_{i}) ≤ k − 1 (because O is elementary with length(O) ≥ 2k). Therefore, \((2k-2)k\leq {\mathit {length}}({O})={\sum }_{i=0}^{\ell -1}{\mathit {length}}({S_{i}})\leq \ell (k-1)\), which implies ℓ ≥ 2k.
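The bound on ℓ can be spelled out as follows. This is a sketch that assumes k ≥ 2 (so that k − 1 > 0) and that the length of every directed cycle of G is a multiple of k, which we take from the layered structure of instances of LONGCYCLE(k). Writing length(O) = nk, we have n ≥ 2 because length(O) ≥ 2k, so the hypothesis yields n ≥ 2k − 2. Then
\[ \ell \;\geq \;\frac {{\mathit {length}}({O})}{k-1}\;\geq \;\frac {(2k-2)k}{k-1}\;=\;\frac {2(k-1)k}{k-1}\;=\;2k. \]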
For every \(i\in \{0,\dots ,\ell -1\}\), let P_{i} be a k-cycle of which S_{i} is a subpath. We define the fitness of P_{i} as \({\mathit {length}}({S_{i}^{\prime }})\) if \(S_{i}^{\prime }\) is the longest subpath of P_{i} that has S_{i} as a subpath and that is still a seam in O. Note that the fitness of P_{i} is at least length(S_{i}). For a reason that will become apparent shortly, if multiple choices for the k-cycle P_{i} are possible, we will choose a k-cycle with the greatest fitness. Assume, toward a contradiction, that the subgraph of \(\widehat {G}\) induced by \(\{P_{0},P_{1},\dots ,P_{\ell -1}\}\) has a cycle chord. We can assume without loss of generality \(m\in \{2,\dots ,\ell -2\}\) and a path \((P_{0},P_{1},\dots ,P_{m-1},P_{m})\) in \(\widehat {G}\) such that \(\{P_{0},P_{m}\}\in E(\widehat {G})\), while the paths \((P_{0},P_{1},\dots ,P_{m-1})\) and \((P_{1},\dots ,P_{m-1},P_{m})\) are chordless. From \(\{P_{0},P_{m}\}\in E(\widehat {G})\), it follows that V (P_{0}) ∩ V (P_{m})≠∅. We have V (S_{0}) ∩ V (S_{m}) = ∅. Let π be the closed directed path in G that, starting from start(S_{m}), traverses P_{m} until a vertex (call it x) of P_{0} is reached. From x on, the path π follows P_{0} until end(S_{0}) is reached, and then traverses \(S_{1},S_{2},\dots ,S_{m-1}\). Note that it is possible that x ∈ V (S_{m}) or x ∈ V (S_{0}) (but not both). We argue next that π is an elementary cycle.
The edges of π that are not in O belong either to the subpath (call it π_{m}) of P_{m} that goes from end(S_{m}) to x, or to the subpath (call it π_{0}) of P_{0} that goes from x to start(S_{0}). Note that π_{m} exists only if x∉V (S_{m}), and π_{0} exists only if x∉V (S_{0}). Assume toward a contradiction that π is not elementary. From our hypotheses and construction, it must be the case that π_{m} intersects S_{m−1} in some vertex y, or that π_{0} intersects S_{1} in some vertex z. These possibilities are depicted in Fig. 11. If this happens, however, \(P_{m}^{\prime }\) and \(P_{0}^{\prime }\) have a strictly greater fitness than P_{m} and P_{0}, contradicting that we chose k-cycles with the greatest fitness. Here, \(P_{m}^{\prime }\) is the k-cycle that, starting from end(S_{m−1}) = start(S_{m}), traverses P_{m} until y, and then follows P_{m−1} from y until end(S_{m−1}). Similarly, \(P_{0}^{\prime }\) is the k-cycle that, starting from end(S_{0}) = start(S_{1}), traverses P_{1} until z, and then follows P_{0} from z until end(S_{0}). To see that \(P_{m}^{\prime }\) has a strictly greater fitness than P_{m}, note that the subpath of \(P_{m}^{\prime }\) from y to end(S_{m}) is a seam of O. Since x∉V (S_{m−1}), P_{m} will cover a strictly smaller suffix of S_{m−1} than \(P_{m}^{\prime }\) does.
We show that both length(π) = k and length(π) ≥ 2k lead to a contradiction.

Assume that π is a k-cycle. Then either \(S_{0}\cdot S_{1}\cdot \dotsm \cdot S_{m-1}\) is a seam of O or \(S_{1}\cdot S_{2}\cdot \dotsm \cdot S_{m}\) is a seam of O. Since m ≥ 2, we can use π to construct a covering of O of cardinality < ℓ, a contradiction.

Assume that length(π) ≥ 2k. It can be easily seen that π has a covering of cardinality m + 1 < ℓ, which contradicts our assumption about O.
□
The proof of Lemma 8 can now be given.
Proof of Lemma 8
Let G = (V,E) be an instance of LONGCYCLE(k). Let \(\widehat {G}=(\widehat {V},\widehat {E})\) be the undirected graph defined in the statement of Lemma 15. Obviously, it suffices to show that Condition 1 in the statement of Lemma 15 can be expressed in SymStratDatalog.
All elementary cycles in G of length nk for 2 ≤ n ≤ 2k − 3 can obviously be found in FO. We now outline a program in SymStratDatalog that tests for the existence of chordless cycles in \(\widehat {G}\) of length ≥ 2k. The graph \(\widehat {G}\) can be constructed in SymStratDatalog. Then, the existence of a chordless cycle of length ≥ 2k can be tested as follows: Check whether there exists a path \((P_{0},P_{1},P_{2},\dots ,P_{2k-2},P_{2k-1},P_{2k})\) such that (i) the subpath \((P_{1},\dots ,P_{2k-1})\) is elementary and chordless, and (ii) the endpoints P_{0} and P_{2k} are also connected by another (possibly single-vertex) path that uses no vertex that is equal or adjacent to a vertex in \(\{P_{2},\dots ,P_{2k-2}\}\). In particular, P_{0} and P_{2k} themselves must then be distinct from and not adjacent to the vertices in \(\{P_{2},\dots ,P_{2k-2}\}\), and, consequently, P_{0}≠P_{1} and P_{2k}≠P_{2k−1}. The single-vertex path occurs if P_{0} = P_{2k}.
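The test just outlined can be mimicked by a brute-force sketch on an explicit undirected graph. It is not the SymStratDatalog program of this proof: the function name `has_long_chordless_cycle` and the adjacency-dictionary encoding are assumptions for illustration, and the search for the chordless subpath is exponential. The sketch only mirrors conditions (i) and (ii).

```python
from collections import deque

def has_long_chordless_cycle(adj, k):
    """adj maps each vertex to the set of its neighbours
    (undirected graph without self-loops)."""
    n_inner = 2 * k - 1   # number of vertices on P_1, ..., P_{2k-1}

    def induced_paths(path):
        # Enumerate elementary chordless paths with n_inner vertices.
        if len(path) == n_inner:
            yield path
            return
        for w in adj[path[-1]]:
            # w must be new and adjacent to no earlier path vertex
            # other than the current endpoint.
            if w not in path and not (adj[w] & set(path[:-1])):
                yield from induced_paths(path + [w])

    def connected(src, dst, forbidden):
        # Breadth-first search that never enters a forbidden vertex.
        seen, queue = {src}, deque([src])
        while queue:
            u = queue.popleft()
            if u == dst:
                return True
            for w in adj[u] - forbidden - seen:
                seen.add(w)
                queue.append(w)
        return False

    for start in adj:
        for path in induced_paths([start]):
            middle = path[1:-1]   # P_2, ..., P_{2k-2}
            # Vertices equal or adjacent to some middle vertex.
            forbidden = set(middle).union(*(adj[c] for c in middle))
            for p0 in adj[path[0]] - forbidden:
                for p_end in adj[path[-1]] - forbidden:
                    if connected(p0, p_end, forbidden):
                        return True
    return False
```

On an undirected 6-cycle with k = 3 the test succeeds with P_{0} = P_{2k}, which is the single-vertex case mentioned above; on a 5-cycle with k = 3 no chordless subpath on five vertices exists and the test fails.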
We now give the details of the SymStratDatalog program. The following rule states that the vertices of \(\widehat {G}\) are the k-cycles of G.
Note incidentally that every k-cycle is stored k times in this way. Since the graph G is k-circle-layered (see Definition 7), we can assume some fixed partition \(V_{0},V_{1},\dots ,V_{k-1}\) of the vertex set V. We will say that the IDB fact \(\widehat {V}(a_{0},\dots ,a_{k-1})\) is of class V_{i} if a_{0} ∈ V_{i}. Thus, if \(\widehat {V}(a_{0},a_{1},\dots ,a_{k-1})\) is of class V_{i}, then \(\widehat {V}(a_{1},\dots ,a_{k-1},a_{0})\) is of class V_{i+1 mod k}. If one partition class were given as part of the input, for example as EDB facts V0(a), then an optimization would consist in adding V0(x_{0}) to the body of the previous rule.
We will need an equality test on vertices of \(\widehat {G}\):
The use of the semicolon is for readability only. The following rules compute edges in \(\widehat {G}\). For every \(\ell \in \{0,\dots ,k-1\}\), add the rules:
Note that whenever \(\mathit {\widehat {E}}(a_{0},\dots ,a_{k-1};b_{0},\dots ,b_{k-1})\) holds true, then \(\widehat {V}(a_{0},\dots ,a_{k-1})\) and \(\widehat {V}(b_{0},\dots ,b_{k-1})\) will be IDB \(\widehat {V}\)-facts of the same class. In fact, it is sufficient to compute chordless cycles all of whose \(\widehat {V}\)-facts are of the same class. From here on, we write \(\vec {x}\) for the sequence \(\langle {x_{0},\dots ,x_{k-1}}\rangle \). Superscripts are used to create new variables: x^{(i)} and x^{(j)} are distinct variables unless i = j. Finally, \({\vec {x}}^{(i)}\) is the sequence \({x_{0}}^{(i)},\dots ,{x_{k-1}}^{(i)}\). Likewise for \(\vec {y}=\langle {y_{0},\dots ,y_{k-1}}\rangle \), \(\vec {z}=\langle {z_{0},\dots ,z_{k-1}}\rangle \), and \(\vec {w}=\langle {w_{0},\dots ,w_{k-1}}\rangle \). Add the following rule, as well as its symmetric rule:
\(\mathit {UCon}(\vec {a},\vec {b},\vec {c}_{1},\dots ,\vec {c}_{2k-3})\) holds true if \(\widehat {G}\) contains an undirected path between \(\vec {a}\) and \(\vec {b}\) such that no vertex on the path is equal or adjacent to some \(\vec {c}_{i}\). The basis of the recursion is the following rule:
Finally, the following rule tests for the existence of a chordless cycle in \(\widehat {G}\) of length ≥ 2k.
This concludes the proof. □
C.3 Illustration of the Datalog Program in the Proof of Lemma 9
The following example illustrates the Datalog program in the proof of Lemma 9.
Example 5
Let \(q=\{R(\underline {x},y,z), S(\underline {y},x,z), U(\underline {z},a)\}\), where a is a constant. We show a program in symmetric stratified Datalog that computes the garbage set for the M-cycle \(C=R(\underline {x},y,z)\stackrel {\mathsf {{~}_{M}}}{\longrightarrow } S(\underline {y},x,z)\stackrel {\mathsf {{~}_{M}}}{\longrightarrow } R(\underline {x},y,z)\). In this example, k = 2. The program is constructed as in the proof of Lemma 9.
R-facts and S-facts belong to the maximum garbage set if they do not belong to a relevant 1-embedding. This is expressed by the following rules.
If some R-fact or S-fact of a relevant 1-embedding belongs to the maximum garbage set, then every fact of that 1-embedding belongs to the maximum garbage set. This is expressed by the following rules.
Note that the predicates GarbageR and GarbageS refer to blocks: whenever a fact is added to the garbage set, its entire block is added. The following rules compute irrelevant 1-embeddings.
The predicate \(\mathsf {\widehat {E}}\) is used for edges between vertices; each vertex is an (x,y)-value. The predicate Eq expresses equality of vertices.
The predicate UCon is used for undirected connectivity of the \(\mathsf {\widehat {E}}\)-predicate. In particular, it will be the case that UCon(a_{1},b_{1},a_{2},b_{2},a_{3},b_{3}) holds true if there exists a path between vertices (a_{1},b_{1}) and (a_{2},b_{2}) such that no vertex on the path is equal or adjacent to (a_{3},b_{3}). Recall that each vertex is itself a pair.
The latter two rules are each other’s symmetric version. The following rule checks whether a vertex (a_{1},b_{1}) belongs to a chordless \(\mathsf {\widehat {E}}\)-cycle of length ≥ 2k.
The following rules add to the maximum garbage sets all R-facts and S-facts that belong to an irrelevant 1-embedding or to a strong component of the ↪_{C}-graph that contains an elementary ↪_{C}-cycle of length ≥ 2k. Whenever a fact is added, all facts of its block are added.
This terminates the computation of the garbage set. In general, we have to check the existence of elementary ↪_{C}-cycles of length nk with 2 ≤ n ≤ 2k − 3. However, for k = 2, no such n exists.
C.4 Proof of Lemma 10
Proof of Lemma 10
Let \(q^{\prime }=({q\setminus C})\cup \{T\}\). For every \(i\in \{0,1,\dots ,k-1\}\), let F_{i} = \(R_{i}(\underline {\vec {x}_{i}},\vec {y}_{i})\). Here is an informal visual representation of the different queries involved:
Proof of the First Item We show the existence of a reduction from CERTAINTY(q) to the problem \({\mathsf {CERTAINTY}}({q^{\prime }\cup p})\) that is expressible in \({\mathit {SymStratDatalog}}^{\min \limits }\). We first describe the reduction, and then show that it can be expressed in \({\mathit {SymStratDatalog}}^{\min \limits }\).
Let db_{0} be a database that is input to CERTAINTY(q). By Lemma 9, we can compute in symmetric stratified Datalog the maximum garbage set o for C in db_{0}. Let db = db_{0} ∖ o. We know, by Lemma 2, that the problem CERTAINTY(q) has the same answer on instances db_{0} and db. Moreover, by Lemma 3, every garbage set for C in db is empty, which implies, by Lemma 5, that (i) every n-embedding of C in db must be a relevant 1-embedding, and (ii) every fact A with genre_{q}(A) ∈ C belongs to a 1-embedding. The reduction will now encode all these 1-embeddings as T-facts.
We show that every directed edge of the ↪_{C}-graph belongs to a directed cycle. To this end, take any edge A ↪_{C} B. Since every garbage set for C in db is empty, the ↪_{C}-graph contains a relevant 1-embedding containing A, and a relevant 1-embedding containing B. Let \(A^{\prime }\) be the fact such that \(A^{\prime }\stackrel {{~}_{C}}{\hookrightarrow }B\) is a directed edge in the 1-embedding containing B. Let \(B^{\prime }\) be the fact such that \(A\stackrel {{~}_{C}}{\hookrightarrow }B^{\prime }\) is a directed edge in the 1-embedding containing A. Since A ↪_{C} B and \(A\stackrel {{~}_{C}}{\hookrightarrow }B^{\prime }\), it follows \(B\sim B^{\prime }\) by Lemma 4. From \(A^{\prime }\stackrel {{~}_{C}}{\hookrightarrow }B\) and \(B\sim B^{\prime }\), it follows \(A^{\prime }\stackrel {{~}_{C}}{\hookrightarrow }B^{\prime }\). Thus, the ↪_{C}-graph contains a directed path from B to \(A^{\prime }\), an edge from \(A^{\prime }\) to \(B^{\prime }\), and a directed path from \(B^{\prime }\) to A. Consequently, the ↪_{C}-graph contains a directed path from B to A.
It follows that every strong component of the ↪_{C}-graph is initial. It can be easily seen that if an initial strong component contains some fact A, then it contains every fact that is key-equal to A. Let r be a repair of db. For every fact A ∈ r, there exists a unique fact B ∈ r such that A ↪_{C} B. It follows that r must contain an elementary ↪_{C}-cycle, which must be a relevant 1-embedding (because every garbage set for C in db is empty) belonging to the same initial strong component as A. It can also be seen that there exists a repair that contains exactly one such 1-embedding for every strong component of the ↪_{C}-graph.
We define an undirected graph G as follows: for each valuation μ over vars(q) such that \(\mu (q)\subseteq \mathbf {db}\), we introduce a vertex 𝜃 with 𝜃 = μ[vars(C)]. We add an edge between two vertices 𝜃 and \(\theta ^{\prime }\) if for some \(i\in \{0,\dots ,k-1\}\), \(\theta (\vec {x}_{i})=\theta ^{\prime }(\vec {x}_{i})\). The graph G can clearly be constructed in logarithmic space (and even in FO). We define a set db_{T} of T-facts and, for every \(i\in \{0,\dots ,k-1\}\), a set db_{i} as follows: for any two vertices 𝜃, \(\theta ^{\prime }\) of G, if
then we add to db_{T} the fact \({\theta }_{[{{u}\mapsto {\theta ^{\prime }(\vec {x}_{0})}}]}(T)\), and we add to db_{i} the fact \({\theta }_{[{{u}\mapsto {\theta ^{\prime }(\vec {x}_{0})}}]}(N_{i})\). In this way, every db_{i} is consistent. Informally, if T is the atom \(T(\underline {u},\vec {w})\), then we add to db_{T} the T-fact \(T(\underline {\theta ^{\prime }(\vec {x}_{0})},\theta (\vec {w}))\), where \(\theta ^{\prime }(\vec {x}_{0})\) is treated as a single value. This fact represents that 𝜃 belongs to the strong component that is identified by \(\theta ^{\prime }(\vec {x}_{0})\). Since undirected connectivity can be computed in logarithmic space [33], db_{T} and each db_{i} can be constructed in logarithmic space.
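The idea of tagging every vertex 𝜃 of G with an identifier of its connected component can be sketched with a union-find structure. This is an illustration only: the function name `component_ids`, the dictionary encoding of valuations, and in particular the choice of the smallest \(\vec {x}_{0}\)-value in a component as its identifier are assumptions of the sketch, not the condition used in the construction above. Values are assumed to be mutually comparable.

```python
def component_ids(valuations, key_vars):
    """valuations: list of dicts mapping variables of C to values.
    key_vars: list of tuples; key_vars[i] lists the variables of x_i.

    Returns, for each valuation, the identifier of its component.
    """
    parent = list(range(len(valuations)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two valuations are adjacent if they agree on some key x_i.
    first_seen = {}
    for idx, theta in enumerate(valuations):
        for i, xs in enumerate(key_vars):
            key = (i, tuple(theta[x] for x in xs))
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    # Assumed identifier: the smallest x_0-value in the component.
    x0 = key_vars[0]
    best = {}
    for idx, theta in enumerate(valuations):
        root = find(idx)
        val = tuple(theta[x] for x in x0)
        if root not in best or val < best[root]:
            best[root] = val
    return [best[find(idx)] for idx in range(len(valuations))]
```

For a cycle with keys x and y, the valuations (x=1,y=a), (x=1,y=b), (x=2,y=b) fall into one component, identified in this sketch by the value 1, while (x=3,y=c) forms a component of its own.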
Let db_{C} be the set of all F_{i}-facts in db (0 ≤ i ≤ k − 1), and let \(\mathbf {db}_{{\mathsf {shared}}}\mathrel {\mathop :}=\mathbf {db}\setminus \mathbf {db}_{C}\), the part of the database db that is preserved by the reduction. Let \(\mathbf {db}_{N}=\bigcup _{i=0}^{k-1}\mathbf {db}_{i}\). Since db_{N} is consistent, db_{shared} ⊎ db_{T} ⊎ db_{N} is a legal input to \({\mathsf {CERTAINTY}}({q^{\prime }\cup p})\), where the use of ⊎ (rather than ∪) indicates that the operands of the union are disjoint. Here is an informal visual representation of the reduction:
We show that the following are equivalent:

1.
Every repair of db satisfies q.

2.
For every s ∈rset(db_{shared}), for every repair r_{T} of db_{T}, \({\mathbf {s}}\uplus {\mathbf {r}}_{T}\uplus \mathbf {db}_{N}\models q^{\prime }\cup p\).

3.
Every repair of db_{shared} ⊎db_{T} ⊎db_{N} satisfies \(q^{\prime }\cup p\).
The equivalence 2 ⇔ 3 is straightforward. We show next the equivalence 1 ⇔ 2. Let s ∈ rset(db_{shared}) and let r_{T} be a repair of db_{T}. By our construction of db_{T}, there exists a repair r_{C} of db_{C} such that for every valuation 𝜃 over vars(q), if \(\theta (q)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{C}\), then for some value c, \({\theta }_{[{{u}\mapsto {c}}]}(q^{\prime }\cup p)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{T}\cup \mathbf {db}_{N}\). Informally, r_{C} contains all (and only) the relevant 1-embeddings of C in ∪r_{C} that are encoded by the T-facts of r_{T}. Since s ∪ r_{C} is a repair of db, by hypothesis 1, we can assume a valuation 𝜃 over vars(C) such that \(\theta (q)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{C}\). Consequently, for some value c, \({\theta }_{[{{u}\mapsto {c}}]}(q^{\prime }\cup p)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{T}\cup \mathbf {db}_{N}\).
Let r be a repair of db. There exist s ∈ rset(db_{shared}) and r_{C} ∈ rset(db_{C}) such that r = s ∪ r_{C}. By the construction of db_{T}, there exists a repair r_{T} of db_{T} such that for every valuation 𝜃 over vars(q), if \({\theta }_{[{{u}\mapsto {c}}]}(q^{\prime }\cup p)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{T}\cup \mathbf {db}_{N}\) for some c, then \(\theta (q)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{C}\) (note incidentally that the converse does not generally hold). Informally, for every strong component \(\mathcal {S}\) of the ↪_{C}-graph of db such that \({\mathbf {s}}\cup ({{\mathbf {r}}_{C}\cap V(\mathcal {S})})\models q\), the set r_{T} encodes one 1-embedding of C in \({\mathbf {s}}\cup ({{\mathbf {r}}_{C}\cap V(\mathcal {S})})\). Here, \(V(\mathcal {S})\) denotes the vertex set of the strong component \(\mathcal {S}\); thus \(V(\mathcal {S})\subseteq \mathbf {db}_{C}\).
Since s ∪r_{T} ∪db_{N} is a repair of db_{shared} ⊎db_{T} ⊎db_{N}, it follows by the hypothesis 2 that there exists a valuation 𝜃 over vars(q) such that \({\theta }_{[{{u}\mapsto {c}}]}(q^{\prime }\cup p)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{T}\cup \mathbf {db}_{N}\) for some c. Consequently, \(\theta (q)\subseteq {\mathbf {s}}\cup {\mathbf {r}}_{C}\).
In the main body of this article, we have shown a program in \({\mathit {SymStratDatalog}}^{\min \limits }\) that computes the reduction.
Proof of the Second Item Assume that the attack graph of q contains no strong cycle and that some initial strong component of the attack graph contains every atom of \(\{F_{0},F_{1},\dots ,F_{k-1}\}\). Since all N_{i}-facts have mode c, they have no outgoing attacks in the attack graph of \(q^{\prime }\cup p\). Since \({\mathsf {vars}}({N_{i}})\subseteq {\mathsf {vars}}({T})\) for every atom N_{i} ∈ p, we can limit our analysis to witnesses for attacks that do not contain any N_{i}. Indeed, if N_{i} occurred in a witness, it could be replaced with T. Let \(\mathcal {S}\) be an initial strong component of the attack graph of q that contains every atom of \(\{F_{0},F_{1},\dots ,F_{k-1}\}\). We will use the following properties:

(a)
For all \(X,Y\subseteq \mathsf {vars}({q})\), if \({\mathcal {K}}({q})\models {X}\rightarrow {Y}\), then \({\mathcal {K}}({q^{\prime }\cup p})\models {X}\rightarrow {Y}\). This holds true because \({\mathcal {K}}({q^{\prime }\cup p})\models {\mathcal {K}}({q})\). To prove the latter claim, note that \({\mathcal {K}}({q})\setminus {\mathcal {K}}({q^{\prime }\cup p})=\{{{\mathsf {key}}({F_{i}})}\rightarrow {{\mathsf {vars}}({F_{i}})}\}_{i=0}^{k-1}\). For all \(i\in \{0,1,\dots ,k-1\}\), we have that \({\mathcal {K}}({\{T,N_{i}\}})\equiv \{{u}\rightarrow {\mathsf {vars}({C})},{{\mathsf {key}}({F_{i}})}\rightarrow {u}\}\) with \({\mathsf {vars}}({F_{i}})\subseteq \mathsf {vars}({C})\). Consequently, \({\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({F_{i}})}\rightarrow {{\mathsf {vars}}({F_{i}})}\).

(b)
As an immediate consequence of (a), we have \({H}^{+,{q}}\subseteq {H}^{+,{q^{\prime }\cup p}}\) for every H ∈ q ∖ C.

(c)
For every H ∈ q ∖ C, if \(H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T\), then \(H\in \mathcal {S}\). To show this result, let H ∈ q ∖ C such that \(H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T\). We can assume without loss of generality the existence of a witness for \(H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T\) of the form \(\omega \stackrel {v}{\smallfrown }T\) with v≠u, where the sequence ω starts with H. We can assume the existence of \(j\in \{0,\dots ,k-1\}\) such that v ∈ vars(F_{j}). From the preceding property (b), it follows that the sequence \(\omega \stackrel {v}{\smallfrown }F_{j}\) is a witness for \(H\overset {q}{\rightsquigarrow }F_{j}\). Since \(F_{j}\in \mathcal {S}\), we conclude \(H\in \mathcal {S}\).

(d)
For all \(G,H\!\in \!\mathcal {S}\), we have \({\mathcal {K}}({q^{\prime }\!\cup \! p})\!\models \!{{\mathsf {key}}({G})}\!\rightarrow \!{{\mathsf {key}}({H})}\). To show this result, let \(G,H\in \mathcal {S}\). Since \(\mathcal {S}\) is an initial strong component of the attack graph of q, there exists an elementary attack cycle that contains both G and H. Since the attack graph of q contains no strong cycle, for every edge \(J\overset {q}{\rightsquigarrow }J^{\prime }\) on this attack cycle, we have \({\mathcal {K}}({q})\models {{\mathsf {key}}({J})}\rightarrow {{\mathsf {key}}({J^{\prime }})}\). It can now be easily seen that \({\mathcal {K}}({q})\models {{\mathsf {key}}({G})}\rightarrow {{\mathsf {key}}({H})}\). Finally, by property (a), \({\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({G})}\rightarrow {{\mathsf {key}}({H})}\).
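The implication tests between functional dependencies used in properties (a) and (d) can be checked mechanically with the textbook attribute-closure algorithm. The sketch below is generic; the function names `closure` and `implies` are hypothetical, and representing a dependency as a pair of variable sets is an encoding chosen for illustration.

```python
def closure(attrs, fds):
    """Closure of a set of variables under functional dependencies.

    fds is a list of pairs (lhs, rhs), each an iterable of variables.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire lhs -> rhs when lhs is covered and rhs adds something.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """Decide whether fds logically implies lhs -> rhs."""
    return set(rhs) <= closure(lhs, fds)
```

As an instance of the argument in (a), take the cycle of Example 5, where vars(C) = {x, y, z} and the keys of the two atoms of C are {x} and {y}. The dependencies u → {x, y, z}, x → u, and y → u then imply x → {x, y, z}, which can be confirmed with `implies([("u", "xyz"), ("x", "u"), ("y", "u")], "x", "xyz")`.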
We know by [21, Lemma 3.6] that if the attack graph contains a strong cycle, then it contains a strong cycle of length 2. Therefore, to conclude the proof, it suffices to show that every cycle of length 2 in the attack graph of \(q^{\prime }\cup p\) is weak. To this end, assume that the attack graph of \(q^{\prime }\cup p\) contains an attack cycle \(H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H\). Then, either H≠T or J≠T (or both). We assume without loss of generality that H≠T. We show that the attack cycle \(H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H\) is weak. We distinguish three cases.
Case that \(H\stackrel {q^{\prime }\cup p}{\not \rightsquigarrow }T\) (therefore J≠T) and \(J\stackrel {q^{\prime }\cup p}{\not \rightsquigarrow }T\):

Then no witness for \(H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J\) or \(J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H\) can contain T. By property (b), \(H\stackrel {q}{\rightsquigarrow }J\stackrel {q}{\rightsquigarrow }H\). Since the attack graph of q contains no strong attack cycle, \({\mathcal {K}}({q})\models {{\mathsf {key}}({H})}\rightarrow {{\mathsf {key}}({J})}\) and \({\mathcal {K}}({q})\models {{\mathsf {key}}({J})}\rightarrow {{\mathsf {key}}({H})}\). Then, by property (a), \({\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({H})}\rightarrow {{\mathsf {key}}({J})}\) and \({\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({J})}\rightarrow {{\mathsf {key}}({H})}\). It follows that the attack cycle \(H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H\) is weak.
Case that \(H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T\):

By property (c), \(H\in \mathcal {S}\). We distinguish two cases.
Case that J = T:

By property (d), \({\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({H})}\rightarrow {{\mathsf {key}}({F_{0}})}\) and \({\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({F_{0}})}\rightarrow {{\mathsf {key}}({H})}\). In the following, recall that {u} = key(T). Since \(\mathcal {K}({q^{\prime }\cup p})\models \mathsf {key}(F_{0}) \rightarrow u\) and \(\mathcal {K}({q^{\prime }\cup p})\models u \rightarrow \mathsf {key}(F_{0})\) hold by the construction of \(q^{\prime }\cup p\), we conclude \({\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({H})}\rightarrow {u}\) and \({\mathcal {K}}({q^{\prime }\cup p})\) \(\models {u}\rightarrow {{\mathsf {key}}({H})}\). It follows that the attack cycle \(H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H\) is weak.
Case that J≠T:

We show that \(J\in \mathcal {S}\) by distinguishing two cases:

If \(J\stackrel {q^{\prime }\cup p}{\not \rightsquigarrow }T\), then no witness for \(J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H\) contains T. Then, by property (b), any witness for \(J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H\) is also a witness for \(J\overset {q}{\rightsquigarrow }H\), and therefore \(J\in \mathcal {S}\).

If \(J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T\), then \(J\in \mathcal {S}\) by property (c).
From \(H,J\in \mathcal {S}\), it follows \({\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({H})}\rightarrow {{\mathsf {key}}({J})}\) and \({\mathcal {K}}({q^{\prime }\cup p})\models {{\mathsf {key}}({J})}\rightarrow {{\mathsf {key}}({H})}\) by property (d). It follows that the attack cycle \(H\stackrel {q^{\prime }\cup p}{\rightsquigarrow }J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }H\) is weak.

Case that \(J\stackrel {q^{\prime }\cup p}{\rightsquigarrow }T\) (therefore J≠T):

This case is symmetrical to a case that has already been treated.
□
Appendix D: Proofs of Section 8.1
D.1 Proof of Lemma 11
We will use two helping lemmas.
Lemma 16
[35, Lemma 4.3] Let q be a self-join-free Boolean conjunctive query, and r a consistent database. If α_{1},α_{2} are valuations over vars(q) such that \(\alpha _{1}(q)\subseteq {\mathbf {r}}\) and \(\alpha _{2}(q)\subseteq {\mathbf {r}}\), then {α_{1},α_{2}} satisfies every functional dependency in \({\mathcal {K}}({q})\).
Lemma 17
Let q be a query in sjfBCQ. Let \({Z}\rightarrow {w}\) be a functional dependency that is internal to q. Let \(\vec {z}\) be a sequence of distinct variables such that \(\mathsf {vars}({\vec {z}})=Z\). Let \(q^{\prime }=q\cup \{N^{\mathsf {c}}(\underline {\vec {z}},w)\}\) where N is a fresh relation name of mode c. Then,

1.
there exists a first-order reduction from CERTAINTY(q) to \({\mathsf {CERTAINTY}}({q^{\prime }})\); and

2.
if the attack graph of q contains no strong cycle, then the attack graph of \(q^{\prime }\) contains no strong cycle.
Proof of the first item
By the second condition in Definition 8, we can assume an atom F ∈ q such that \(Z\subseteq {\mathsf {vars}}({F})\). Let \(F_{1},F_{2},\dots ,F_{\ell }\) be a sequential proof for \({\mathcal {K}}({q})\models {Z}\rightarrow {w}\) such that for every \(i\in \{1,\dots ,\ell \}\), for every u ∈ Z ∪{w}, \(F_{i}\stackrel {q}{\not \rightsquigarrow }u\). It can be easily seen that for every \(i\in \{0,\dots ,\ell -1\}\), we have
Let db be a database that is the input to CERTAINTY(q). We repeat the following “purification” step: If for two valuations over vars(q), denoted β_{1} and β_{2}, we have \(\beta _{1}(q),\beta _{2}(q)\subseteq \mathbf {db}\) and \(\{\beta _{1},\beta _{2}\}\not \models {Z}\rightarrow {w}\), then we remove both the F-block containing β_{1}(F) and the F-block containing β_{2}(F). Note that β_{1}(F) and β_{2}(F) may be key-equal, and hence belong to the same F-block.
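One round of the purification step can be sketched as follows. The sketch assumes that the valuations β with β(q) ⊆ db have already been computed and are given as dictionaries; the function name `blocks_to_remove` and the parameter `key_of_F` (the key variables of the atom F, in order) are hypothetical. It returns the key values of the F-blocks to delete; the caller would delete them, recompute the valuations, and repeat until no block is returned.

```python
from itertools import combinations

def blocks_to_remove(valuations, Z, w, key_of_F):
    """Find the F-blocks hit by a violation of Z -> w.

    valuations: list of dicts (variable -> value), one per embedding
                of q in the current database.
    Z:          iterable of variables; w: a single variable.
    key_of_F:   tuple of the key variables of F.
    """
    doomed = set()
    for b1, b2 in combinations(valuations, 2):
        agree_on_Z = all(b1[z] == b2[z] for z in Z)
        if agree_on_Z and b1[w] != b2[w]:
            # {b1, b2} falsifies Z -> w: drop both F-blocks
            # (they coincide when the two F-facts are key-equal).
            for b in (b1, b2):
                doomed.add(tuple(b[x] for x in key_of_F))
    return doomed
```

A block is identified here by the values of the key variables of F, so two key-equal facts contribute the same entry to the result.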
Assume that we apply this step on \(\mathbf {db}^{\prime }\) and obtain \(\mathbf {db}^{\prime \prime }\). We show that some repair of \(\mathbf {db}^{\prime }\) falsifies q if and only if some repair of \(\mathbf {db}^{\prime \prime }\) falsifies q. The ⇒-direction trivially holds true. For the ⇐-direction, let \({\mathbf {r}}^{\prime \prime }\) be a repair of \(\mathbf {db}^{\prime \prime }\) that falsifies q. Assume, toward a contradiction, that every repair of \(\mathbf {db}^{\prime }\) satisfies q. For every repair r, define Reify(r) as the set of valuations over Z ∪{w} containing 𝜃 if r ⊧ 𝜃(q). Let
Note that if β_{1}(F) and β_{2}(F) are key-equal, then we can choose either \({\mathbf {r}}^{\prime }={\mathbf {r}}^{\prime \prime }\cup \{\beta _{1}(F)\}\) or \({\mathbf {r}}^{\prime }={\mathbf {r}}^{\prime \prime }\cup \{\beta _{2}(F)\}\); the actual choice does not matter. Obviously, \({\mathbf {r}}^{\prime }\) is a repair of \(\mathbf {db}^{\prime }\). Since we assumed that every repair of \(\mathbf {db}^{\prime }\) satisfies q, we can assume a valuation α over vars(q) such that \(\alpha (q)\subseteq {\mathbf {r}}^{\prime }\). Since \(\alpha (q)\nsubseteq {\mathbf {r}}^{\prime \prime }\) (because \({\mathbf {r}}^{\prime \prime }\not \models q\)), it must be the case that for some j ∈{1, 2}, α(F) = β_{j}(F). From \(\mathsf {vars}({\vec {z}})=Z\subseteq {\mathsf {vars}}({F})\), it follows that \(\alpha (\vec {z})=\beta _{j}(\vec {z})\). From \(\beta _{1}(\vec {z})=\beta _{2}(\vec {z})\), it follows \(\alpha (\vec {z})=\beta _{1}(\vec {z})\) and \(\alpha (\vec {z})=\beta _{2}(\vec {z})\). Since β_{1}(w)≠β_{2}(w), either α(w)≠β_{1}(w) or α(w)≠β_{2}(w) (or both). Therefore, we can assume b ∈{1, 2} such that α(w)≠β_{b}(w). It will be the case that \({\textsf {Reify}}({{\mathbf {r}}^{\prime }})=\{\alpha [Z\cup \{w\}]\}\) (see Note 2). Indeed, since α is an arbitrary valuation over vars(q) such that \(\alpha (q)\subseteq {\mathbf {r}}^{\prime }\), it follows that for all valuations α_{1},α_{2} over vars(q), if \(\alpha _{1}(q),\alpha _{2}(q)\subseteq {\mathbf {r}}^{\prime }\), then \(\alpha _{1}(\vec {z})=\alpha _{2}(\vec {z})\) and therefore, by Lemma 16 and using that \({\mathcal {K}}({q})\models {Z}\rightarrow {w}\), we have α_{1}(w) = α_{2}(w).
We now claim that for all \(i\in \{0,1,\dots ,\ell \}\), there exists a pair \(({\mathbf {r}}^{\prime i},\alpha ^{i})\) such that

1.
\({\mathbf {r}}^{\prime i}\) is a repair of \(\mathbf {db}^{\prime }\);

2.
α^{i} is a valuation over vars(q) such that \(\alpha ^{i}(q)\subseteq {\mathbf {r}}^{\prime i}\);

3.
\(\alpha ^{i}(\{F_{j}\}_{j=1}^{i})=\beta _{b}(\{F_{j}\}_{j=1}^{i})\) and \(\alpha ^{i}(\vec {z})=\beta _{b}(\vec {z})\) (and therefore \(\alpha ^{i}(\vec {z})=\alpha (\vec {z})\));

4.
α^{i}(w) = α(w); and

5.
\({\textsf {Reify}}({{\mathbf {r}}^{\prime i}})=\{\alpha [Z\cup \{w\}]\}\).
The third condition entails \(\{\alpha ^{i},\beta _{b}\}\models {\mathcal {K}}({\{F_{j}\}_{j=1}^{i}})\) for all \(i\in \{0,1,\dots ,\ell \}\). From (4), it follows \(\{\alpha ^{i},\beta _{b}\}\models {Z}\rightarrow {{\mathsf {key}}({F_{i+1}})}\). Then, from \(\alpha ^{i}(\vec {z})=\beta _{b}(\vec {z})\), it follows that α^{i} and β_{b} agree on all variables of key(F_{i+ 1}).
The proof of the above claim runs by induction on increasing i. For the basis of the induction, i = 0, the desired result holds by choosing \({\mathbf {r}}^{\prime 0}={\mathbf {r}}^{\prime }\) and α^{0} = α.
For the induction step, \(i\rightarrow i+1\), the induction hypothesis is that the desired pair \(({\mathbf {r}}^{\prime i},\alpha ^{i})\) exists. Since α^{i} and β_{b} agree on all variables of key(F_{i+ 1}), we have that α^{i}(F_{i+ 1}) and β_{b}(F_{i+ 1}) are key-equal. From \(\beta _{b}(q)\subseteq \mathbf {db}^{\prime }\), it follows that \(\beta _{b}(F_{i+1})\in \mathbf {db}^{\prime }\). Let \({\mathbf {r}}^{\prime i+1}=\left ({{\mathbf {r}}^{\prime i}\setminus \{\alpha ^{i}(F_{i+1})\}}\right )\cup \{\beta _{b}(F_{i+1})\}\), which is obviously a repair of \(\mathbf {db}^{\prime }\). Since \(F_{i+1}\stackrel {q}{\not \rightsquigarrow }u\) for all u ∈ Z ∪{w}, \({\textsf {Reify}}({{\mathbf {r}}^{\prime i+1}})\subseteq {\textsf {Reify}}({{\mathbf {r}}^{\prime i}})\) by [21, Lemma B.1]. Since we assumed that every repair of \(\mathbf {db}^{\prime }\) satisfies q, we have that \({\textsf {Reify}}({{\mathbf {r}}^{\prime i+1}})\neq \emptyset \), and therefore \(\textsf {Reify}({{\mathbf {r}}^{\prime i+1}})=\{\alpha [Z\cup \{w\}]\}\). Hence, there exists a valuation α^{i+ 1} over vars(q) such that \(\alpha ^{i+1}(q)\subseteq {\mathbf {r}}^{\prime i+1}\) and α^{i+ 1}[Z ∪{w}] = α[Z ∪{w}], that is, \(\alpha ^{i+1}(\vec {z})=\alpha (\vec {z})\) and α^{i+ 1}(w) = α(w). Since \(\alpha (\vec {z})=\beta _{b}(\vec {z})\), we have \(\alpha ^{i+1}(\vec {z})=\beta _{b}(\vec {z})\). We have thus shown that the pair \(({\mathbf {r}}^{\prime i+1},\alpha ^{i+1})\) satisfies items 1, 2, 4, and 5 in the above five-item list; we have also shown the second conjunct of item 3. In the next paragraph, we show that \(\alpha ^{i+1}(\{F_{j}\}_{j=1}^{i+1})=\beta _{b}(\{F_{j}\}_{j=1}^{i+1})\), i.e., the first conjunct of item 3.
By the induction hypothesis, \(\alpha ^{i}(\{F_{j}\}_{j=1}^{i})=\beta _{b}(\{F_{j}\}_{j=1}^{i})\) and \(\alpha ^{i}(q)\subseteq {\mathbf {r}}^{\prime i}\), which implies \(\beta _{b}(\{F_{j}\}_{j=1}^{i})\subseteq {\mathbf {r}}^{\prime i}\). Since \({\mathbf {r}}^{\prime i}\) and \({\mathbf {r}}^{\prime i+1}\) include the same set of F_{j}-facts for every \(j\in \{1,\dots ,i\}\), we have \(\beta _{b}(\{F_{j}\}_{j=1}^{i})\subseteq {\mathbf {r}}^{\prime i+1}\). Since \(\beta _{b}(F_{i+1})\in {\mathbf {r}}^{\prime i+1}\) by construction, we obtain \(\beta _{b}(\{F_{j}\}_{j=1}^{i+1})\subseteq {\mathbf {r}}^{\prime i+1}\). Since also \(\alpha ^{i+1}(\{F_{j}\}_{j=1}^{i+1})\subseteq {\mathbf {r}}^{\prime i+1}\) (because \(\alpha ^{i+1}(q)\subseteq {\mathbf {r}}^{\prime i+1}\)), it is correct to conclude that \(\{\beta _{b},\alpha ^{i+1}\}\models {\mathcal {K}}({\{F_{j}\}_{j=1}^{i+1}})\) by Lemma 16. We are now ready to show that α^{i+ 1}(F_{j}) = β_{b}(F_{j}) for all \(j\in \{1,\dots ,i+1\}\). To this end, pick any \(k\in \{1,\dots ,i+1\}\). We have \({\mathcal {K}}({\{F_{j}\}_{j=1}^{k-1}})\models {Z}\rightarrow {{\mathsf {key}}({F_{k}})}\) by (4). Since \(\{F_{j}\}_{j=1}^{k-1}\) is a subset of \(\{F_{j}\}_{j=1}^{i+1}\), we have \(\{\beta _{b},\alpha ^{i+1}\}\models {\mathcal {K}}({\{F_{j}\}_{j=1}^{k-1}})\), and therefore \(\{\beta _{b},\alpha ^{i+1}\}\models {Z}\rightarrow {{\mathsf {key}}({F_{k}})}\). Then, from \(\alpha ^{i+1}(\vec {z})=\beta _{b}(\vec {z})\) (the second conjunct of item 3), it follows that α^{i+ 1} and β_{b} agree on all variables of key(F_{k}). Since \(\alpha ^{i+1}(F_{k}),\beta _{b}(F_{k})\in {\mathbf {r}}^{\prime i+1}\), it must be the case that α^{i+ 1}(F_{k}) = β_{b}(F_{k}). This concludes the induction step.
For the pair \(({\mathbf {r}}^{\prime \ell },\alpha ^{\ell })\), we have that \(\alpha ^{\ell }(\{F_{j}\}_{j=1}^{\ell })=\beta _{b}(\{F_{j}\}_{j=1}^{\ell })\), and therefore, since w occurs in some F_{j}, α^{ℓ}(w) = β_{b}(w). Since also α^{ℓ}(w) = α(w), we obtain α(w) = β_{b}(w), a contradiction. We conclude by contradiction that some repair of \(\mathbf {db}^{\prime }\) falsifies q. Thus, the purification step described in the paragraph immediately following (4) does not change the answer to CERTAINTY(q).
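The notions of block, repair, and CERTAINTY(q) used throughout this argument can be made concrete on a toy instance. The following is a minimal brute-force sketch, not the reduction from the proof: the fact encoding `(relation, key, rest)`, the function names, and the toy data are all assumptions made for illustration, and the enumeration is exponential in the number of blocks.

```python
from itertools import product

def blocks(db):
    """Group facts (relation, key, rest) into blocks of key-equal facts."""
    grouped = {}
    for fact in db:
        rel, key, _ = fact
        grouped.setdefault((rel, key), []).append(fact)
    return list(grouped.values())

def repairs(db):
    """Yield every repair: exactly one fact chosen from each block."""
    for choice in product(*blocks(db)):
        yield set(choice)

def satisfies(repair, query, valuation=None):
    """Backtracking test: does some valuation map every atom into the repair?"""
    valuation = valuation or {}
    if not query:
        return True
    (rel, key_vars, rest_vars), remaining = query[0], query[1:]
    for fact_rel, key, rest in repair:
        if fact_rel != rel:
            continue
        extended = dict(valuation)
        # Bind each variable, rejecting the fact on any conflicting binding.
        if all(extended.setdefault(var, val) == val
               for var, val in zip(key_vars + rest_vars, key + rest)):
            if satisfies(repair, remaining, extended):
                return True
    return False

def certain(db, query):
    """CERTAINTY(q) by brute force: q must hold in every repair."""
    return all(satisfies(r, query) for r in repairs(db))

# Hypothetical toy instance for q = {R(x | y), S(y | z)}, key before the bar.
toy_query = [("R", ("x",), ("y",)), ("S", ("y",), ("z",))]
toy_db = {
    ("R", ("a",), ("b",)), ("R", ("a",), ("c",)),  # one block with two facts
    ("S", ("b",), ("d",)), ("S", ("c",), ("d",)),
}
print(certain(toy_db, toy_query))
```

Dropping the fact `("S", ("c",), ("d",))` from `toy_db` leaves a repair that contains R(a, c) but no matching S-fact, so the query is no longer certain.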
We repeat the “purification” step until it can no longer be applied. Let the final database be \(\widehat {\mathbf {db}}\). By the above reasoning, we have that every repair of \(\widehat {\mathbf {db}}\) satisfies q if and only if every repair of db satisfies q. Let s be the smallest set of N-facts containing \(N(\underline {\beta (\vec {z})},\beta (w))\) for every valuation β over vars(q) such that \(\beta (q)\subseteq \mathbf {db}\). We show that s is consistent. To this end, let β_{1},β_{2} be valuations over vars(q) such that \(\beta _{1}(q),\beta _{2}(q)\subseteq \mathbf {db}\) and \(\beta _{1}(\vec {z})=\beta _{2}(\vec {z})\). If β_{1}(w)≠β_{2}(w), then a purification step can remove the block containing β_{1}(F), contradicting our assumption that no purification step is applicable on \(\widehat {\mathbf {db}}\). We conclude by contradiction that β_{1}(w) = β_{2}(w).
Since N has mode c and s is consistent, we have that \(\widehat {\mathbf {db}}\cup {\mathbf {s}}\) is a legal database. It can now be easily seen that every repair of db satisfies q if and only if every repair of \(\widehat {\mathbf {db}}\cup {\mathbf {s}}\) satisfies \(q^{\prime }=q\cup \{N^{\mathsf {c}}(\underline {\vec {z}},w)\}\).
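The construction of s and its consistency check can be illustrated as follows. This is a small sketch under assumptions: valuations are represented as Python dictionaries, an N-fact as a pair `(key, value)`, and the function names and sample valuations are hypothetical.

```python
def n_facts(valuations, z_vars, w):
    """Collect the N-facts (beta(z), beta(w)) for the given valuations."""
    return {(tuple(beta[v] for v in z_vars), beta[w]) for beta in valuations}

def is_consistent(facts):
    """True when no two facts share a key but carry different values."""
    seen = {}
    for key, value in facts:
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical valuations with Z = {z}; they agree on w whenever they agree on z.
sample = [
    {"z": 1, "w": 5, "x": 0},
    {"z": 1, "w": 5, "x": 9},
    {"z": 2, "w": 7, "x": 0},
]
s = n_facts(sample, ["z"], "w")
print(is_consistent(s))
```

Adding a valuation such as `{"z": 1, "w": 6, "x": 0}` would produce two N-facts with key `(1,)` and different values, which is exactly the situation in which a purification step is still applicable.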
It remains to be argued that the reduction is in FO, i.e., that the result of the repeated “purification” step can be obtained by a single first-order query. Let \(\mathsf {vars}({q})=\{x_{1},\dots ,x_{n}\}\). Let \(q^{*}(x_{1},\dots ,x_{n})\mathrel {\mathop :}=\bigwedge _{G\in q}G\) be the quantifier-free part of the first-order formula expressing the Boolean query q. For every \(i\in \{1,\dots ,n\}\), let \(x_{i}^{\prime }\) be a fresh variable. Let \(\vec {u}\) be a sequence of distinct variables such that \(\mathsf {vars}({\vec {u}})={\mathsf {vars}}({F})\). The following query finds all F-facts whose blocks can be removed:
where the existential quantification ranges over all variables not in \(\vec {u}\). The F-facts that are to be preserved are not key-equal to a fact in the preceding query and can obviously be computed in FO. This concludes the proof of the first item.
Proof of the Second Item. Assume that the attack graph of q contains no strong cycle. We will show that the attack graph of \(q^{\prime }\) contains no strong cycle either. By the second item in Definition 8, we can assume an atom G ∈ q such that \(Z\subseteq {\mathsf {vars}}({G})\). Note that the atom \(N^{\mathsf {c}}(\underline {\vec {z}},w)\) has no outgoing attacks because its mode is c. It is sufficient to show that for every F,H ∈ q, if there exists a witness for \(F\stackrel {q^{\prime }}{\rightsquigarrow }H\), then there exists a witness for \(F\stackrel {q^{\prime }}{\rightsquigarrow }H\) that does not contain \(N^{\mathsf {c}}(\underline {\vec {z}},w)\). To this end, assume that a witness for \(F\stackrel {q^{\prime }}{\rightsquigarrow }H\) contains
where \(u^{\prime }\) and \(u^{\prime \prime }\) are distinct variables. We can assume without loss of generality that this is the only occurrence of \(N^{\mathsf {c}}(\underline {\vec {z}},w)\) in the witness. In this case, we have \(F\overset {q}{\rightsquigarrow }u^{\prime }\). If \(u^{\prime },u^{\prime \prime }\in Z\), then we can replace \(N^{\mathsf {c}}(\underline {\vec {z}},w)\) with G. So the only nontrivial case is where either \(u^{\prime }=w\) or \(u^{\prime \prime }=w\) (but not both). Then, it must be the case that \({\mathcal {K}}({q^{\prime }\setminus \{F\}})\not \models {{\mathsf {key}}({F})}\rightarrow {w}\), and therefore also
Since \({Z}\rightarrow {w}\) is internal to q, there exists a sequential proof for \({\mathcal {K}}({q})\models {Z}\rightarrow {w}\) such that no atom in the proof attacks a variable in Z ∪{w}. Let \(J_{1},J_{2},\dots ,J_{\ell }\) be a shortest such proof. Because \(F\overset {q}{\rightsquigarrow }u^{\prime }\) and \(u^{\prime } \in Z \cup \{w\}\), it must be that \(F\not \in \{J_{1},\dots ,J_{\ell }\}\). We can assume that w occurs at a non-primary-key position in J_{ℓ}. Because of (6), we can assume the existence of a variable v ∈key(J_{ℓ}) such that \({\mathcal {K}}({q\setminus \{F\}})\not \models {{\mathsf {key}}({F})}\rightarrow {v}\). If v∉Z, then there exists k < ℓ such that v occurs at a non-primary-key position in J_{k}. Again, we can assume a variable \(v^{\prime }\in {\mathsf {key}}({J_{k}})\) such that \({\mathcal {K}}({q\setminus \{F\}})\not \models {{\mathsf {key}}({F})}\rightarrow {v^{\prime }}\). By repeating the same reasoning, there exists a sequence
where \(1\leq i_{0}<i_{1}<\dotsm <i_{m}=\ell \) such that

\(z_{i_{0}}\in Z\);

for all \(j\in \{0,\dots ,m\}\), \({\mathcal {K}}({q\setminus \{F\}})\not \models {{\mathsf {key}}({F})}\rightarrow {z_{i_{j}}}\); and

for all \(j\in \{1,\dots ,m\}\), \(z_{i_{j}}\in {\mathsf {vars}}({J_{i_{j-1}}})\cap {\mathsf {vars}}({J_{i_{j}}})\). In particular, \(z_{i_{j}}\in {\mathsf {key}}({J_{i_{j}}})\).
We can assume G ∈ q such that \(Z\subseteq {\mathsf {vars}}({G})\). Let \(u\in \{u^{\prime },u^{\prime \prime }\}\) such that u≠w. Thus, \(\{u,w\}=\{u^{\prime },u^{\prime \prime }\}\). It can now be easily seen that a witness for \(F\stackrel {q^{\prime }}{\rightsquigarrow }H\) can be obtained by replacing \(N^{\mathsf {c}}(\underline {\vec {z}},w)\) in (5) with the following sequence or its reverse:
This concludes the proof of Lemma 17. □
The proof of Lemma 11 is now straightforward.
Proof of Lemma 11
Repeated application of Lemma 17. □
D.2 Proof of Lemma 12
We will use the following helping lemma.
Lemma 18
Let q be a query in sjfBCQ such that q is saturated and the attack graph of q contains no strong cycle. Let \(\mathcal {S}\) be an initial strong component in the attack graph of q with \(\left |{\mathcal {S}}\right |\geq 2\). For every atom \(F \in \mathcal {S}\), there exists an atom \(H \in \mathcal {S}\) such that \(F \stackrel {\mathsf {{~}_{M}}}{\longrightarrow } H\).
Proof
Assume \(F \in \mathcal {S}\). Since F belongs to an initial strong component with at least two atoms, there exists \(G \in \mathcal {S}\) such that \(F\overset {q}{\rightsquigarrow }G\) and the attack is weak. Therefore, \({\mathcal {K}}({q})\models {{\mathsf {key}}({F})}\rightarrow {{\mathsf {key}}({G})}\). It follows that \({\mathcal {K}}({q\setminus \{F\}})\models {{\mathsf {vars}}({F})}\rightarrow {{\mathsf {key}}({G})}\). Let \(\sigma = H_{1}, H_{2}, \dots , H_{\ell }\) be a sequential proof for \({\mathcal {K}}({q\setminus \{F\}})\models {{\mathsf {vars}}({F})}\rightarrow {{\mathsf {key}}({G})}\), where \(F \notin \{H_{1}, \dots , H_{\ell }\}\). We can assume without loss of generality that H_{ℓ} = G.
Let j be the smallest index in \(\{1, \dots , \ell \}\) such that \(H_{j} \in \mathcal {S}\). Since \(H_{\ell } \in \mathcal {S}\), such an index always exists. Then, \(\sigma = H_{1}, H_{2}, \dots , H_{j-1}\) is a sequential proof for \({\mathcal {K}}({q\setminus \{F\}})\models {{\mathsf {vars}}({F})}\rightarrow {{\mathsf {key}}({H_{j}})}\) (observe that this proof may be empty). By our choice of j, for every \(i\in \{1,\dots ,j-1\}\), we have \(H_{i} \notin \mathcal {S}\), and hence H_{i} cannot attack F or H_{j} (since \(\mathcal {S}\) is an initial strong component). It follows that no atom in σ attacks a variable in vars(F) ∪key(H_{j}). Since q is saturated, this implies that \({\mathcal {K}}({{q}^{\mathsf {cons}}})\models {{\mathsf {vars}}({F})}\rightarrow {{\mathsf {key}}({H_{j}})}\), and so \(F \stackrel {\mathsf {{~}_{M}}}{\longrightarrow } H_{j}\). □
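The sequential proofs used in this argument can be computed by the usual attribute-closure procedure. The sketch below assumes the standard reading in which each atom H contributes the dependency key(H) → vars(H), and a sequential proof is a sequence of atoms in which every key is covered by the start variables together with the variables of earlier atoms. The dictionary encoding, the function name, and the sample atoms are assumptions made for illustration.

```python
def sequential_proof(atoms, start_vars, target_vars):
    """Greedy closure: return a list of atom names forming a sequential
    proof of start_vars -> target_vars, or None when none exists.

    atoms maps a name to a pair (key_vars, all_vars) of sets.
    """
    known = set(start_vars)
    remaining = dict(atoms)
    proof = []
    while not set(target_vars) <= known:
        # Pick an atom whose key is already known and that adds a new variable.
        usable = next(
            (name for name, (key, all_vars) in remaining.items()
             if key <= known and not all_vars <= known),
            None,
        )
        if usable is None:
            return None  # closure reached without covering the target
        known |= remaining.pop(usable)[1]
        proof.append(usable)
    return proof

# Hypothetical atoms of q \ {F}: H1(x | y), H2(y | z), G(z | u).
sample_atoms = {
    "H1": ({"x"}, {"x", "y"}),
    "H2": ({"y"}, {"y", "z"}),
    "G": ({"z"}, {"z", "u"}),
}
print(sequential_proof(sample_atoms, {"x"}, {"z"}))
```

When the target is already contained in the start variables, the function returns the empty list, matching the remark that the proof may be empty. To model a proof over q ∖ {F}, the caller simply omits F from `atoms`.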
The proof of Lemma 12 can now be given.
Proof of Lemma 12
Starting from some atom \(F_{0} \in \mathcal {S}\), by applying repeatedly Lemma 18, we can create an infinite sequence \(F_{0} \stackrel {\mathsf {{~}_{M}}}{\longrightarrow } F_{1} \stackrel {\mathsf {{~}_{M}}}{\longrightarrow } F_{2} \stackrel {\mathsf {{~}_{M}}}{\longrightarrow } \dotsm \) such that for every i ≥ 1, \(F_{i} \in \mathcal {S}\) and F_{i}≠F_{i+ 1}. Since the atoms in \(\mathcal {S}\) are finitely many, there will exist some i,j such that i < j and F_{i} = F_{j+ 1}. It follows that the M-graph of q contains a cycle all of whose atoms belong to \(\mathcal {S}\). □
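The pigeonhole argument above is constructive: following one outgoing edge per atom must eventually revisit an atom, and the portion of the walk from the first visit onward is a cycle. A minimal sketch, assuming the choice of successor made by Lemma 18 is given as a dictionary (the names and the sample data are hypothetical):

```python
def find_cycle(successor, start):
    """Follow successor edges from start until an atom repeats,
    then return the atoms on the cycle in walk order."""
    position = {}  # atom -> index of its first visit on the walk
    path = []
    current = start
    while current not in position:
        position[current] = len(path)
        path.append(current)
        current = successor[current]
    return path[position[current]:]

# Hypothetical successor choice inside a component S = {F0, F1, F2}.
sample_successor = {"F0": "F1", "F1": "F2", "F2": "F1"}
print(find_cycle(sample_successor, "F0"))
```

The walk visits at most |S| distinct atoms before repeating, so the loop terminates after at most |S| iterations.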
Koutris, P., Wijsen, J. Consistent Query Answering for Primary Keys in Datalog. Theory Comput Syst 65, 122–178 (2021). https://doi.org/10.1007/s00224-020-09985-6
Keywords
 Conjunctive queries
 Consistent query answering
 Datalog
 Primary keys