Limits of Schema Mappings
Abstract
Schema mappings have been extensively studied in the context of data exchange and data integration, where they have turned out to be the right level of abstraction for formalizing data interoperability tasks. Up to now and for the most part, schema mappings have been studied as static objects, in the sense that each time the focus has been on a single schema mapping of interest or, in the case of composition, on a pair of schema mappings of interest. In this paper, we adopt a dynamic viewpoint and embark on a study of sequences of schema mappings and of the limiting behavior of such sequences. To this effect, we first introduce a natural notion of distance on sets of finite target instances that expresses how “close” two sets of target instances are as regards the certain answers of conjunctive queries on these sets. Using this notion of distance, we investigate pointwise limits and uniform limits of sequences of schema mappings, as well as the companion notions of pointwise Cauchy and uniformly Cauchy sequences of schema mappings. We obtain a number of results about the limits of sequences of GAV schema mappings and the limits of sequences of LAV schema mappings that reveal striking differences between these two classes of schema mappings. We also consider the completion of the metric space of sets of target instances and obtain concrete representations of limits of sequences of schema mappings in terms of generalized schema mappings, that is, schema mappings with infinite target instances as solutions to (finite) source instances.
Keywords
Schema mappings, Limits, Pointwise convergence, Uniform convergence
1 Introduction
Schema mappings have been extensively studied in the context of data exchange and data integration, where they have turned out to be the right level of abstraction for formalizing data interoperability tasks (see the surveys [11, 12] and the monograph [1]). Up to now and for the most part, schema mappings have been studied as static objects, in the sense that each time the focus has been on a single schema mapping or on a finite and, typically, small number of schema mappings. In the case of data exchange [6], a single schema mapping is used to specify the relationship between a source schema and a target schema. In the case of operators on schema mappings [3], such as the composition operator [8, 14], a fixed number of schema mappings is used as input (e.g., two schema mappings in the case of composition) and another schema mapping is returned as output. Even the case of schema-mapping evolution [9] entails a finite (but potentially large) number of schema mappings.
In this paper, we adopt a dynamic viewpoint and embark on a systematic investigation of sequences of schema mappings and of the limiting behavior of such sequences. The original motivation came from the earlier work [2, 5, 7, 10, 14] on schema-mapping optimization and the study of various notions of equivalence between schema mappings that, intuitively, stipulate that two schema mappings cannot be distinguished using conjunctive queries (CQ-equivalence) or conjunctive queries with at most n variables (CQ_{n}-equivalence), for some fixed n ≥ 1. In particular, in [5] and, implicitly, in [14], it was shown that, given an SO tgd (second-order tuple-generating dependency) σ and a positive integer n, one can construct a GLAV schema mapping that is CQ_{n}-equivalent to σ. Informally, this means that a given SO tgd can be “approximated” by GLAV schema mappings up to any fixed level of precision, even though an SO tgd is a formula of second-order logic that may not be logically equivalent to any formula of first-order logic and, in particular, to any GLAV schema mapping. A more dynamic interpretation is that, given an SO tgd σ, one can obtain a sequence of GLAV schema mappings \((\mathcal {M}_{n})_{n\geq 1}\), whose “limit” is σ.
Summary of Results
Our contributions are both conceptual and technical. At the conceptual level, we develop a framework for studying sequences of schema mappings by first introducing a natural notion of distance on the powerset \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) of the set Inst(T) of finite instances over a schema T. Intuitively, this notion of distance expresses how “close” two sets of finite T-instances are as regards the certain answers of conjunctive queries on these sets. The pair \((\mathcal {P}(\text {Inst}(\mathbf {T})),dist)\) is a pseudometric space, which means that the distance function dist(⋅,⋅) is symmetric and obeys the triangle inequality, but different sets of finite target instances may have distance zero; however, two such sets have distance zero if and only if they are CQ-equivalent, i.e., every conjunctive query has the same certain answers on these two sets. Thus, we will also work with the metric space obtained by considering the CQ-equivalence classes of members of \(\mathcal {P}(\text {Inst}(\mathbf {T}))\), and will use the same notation for it.
Sequences of functions from some set to a metric space occupy a central place in the study of metric spaces (see, e.g., [18]). In particular, there are natural notions of a pointwise limit and of a uniform limit of a sequence (f _{ n })_{ n ≥ 1} of functions from some set to a metric space; moreover, there are companion notions of a pointwise Cauchy and of a uniformly Cauchy sequence of such functions. We now describe briefly how these notions can be applied to sequences of schema mappings. In its most general formulation, a schema mapping \(\mathcal {M}\) over a source schema S and a target schema T is a set of pairs (I, J), where I is a finite S-instance and J is a finite T-instance. It follows that a schema mapping \(\mathcal {M}\) can also be viewed as a function f from the set Inst(S) of all finite S-instances to the powerset \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) of the set of all finite T-instances, where \(f(I) =\{J: (I,J) \in \mathcal {M}\}\). This way, a sequence \((\mathcal {M}_{n})_{n\geq 1}\) of schema mappings over a source schema S and a target schema T can be viewed as a sequence of functions from Inst(S) to the (pseudo)metric space \((\mathcal {P}(\text {Inst}(\mathbf {T})),dist)\).
After the conceptual framework has been laid out, we study in depth the limiting behavior of sequences of GAV mappings and the convergence of sequences of LAV mappings. We establish a number of technical results that reveal rather dramatic and perhaps unanticipated differences between GAV schema mappings and LAV schema mappings.
For sequences of GAV mappings, we point out that every uniformly Cauchy sequence of GAV mappings is eventually constant, hence it has a GAV mapping as uniform limit. We also show that every pointwise Cauchy sequence of GAV mappings has a pointwise limit, but it need not have a uniform limit; moreover, there are pointwise Cauchy sequences of GAV mappings such that no GAV mapping is their pointwise limit. This raises the question as to when a sequence of GAV mappings has a GAV mapping as a pointwise limit. We prove that a sequence of GAV mappings has a GAV mapping as a pointwise limit if and only if it has a pointwise limit that allows for CQ-rewriting^{1}.
For sequences of LAV mappings, we show that the notions of uniform limit and pointwise limit coincide; moreover, the same holds true for the notions of uniformly Cauchy and pointwise Cauchy sequences. However, there are uniformly Cauchy sequences of LAV mappings that have no uniform limit. We also establish that a uniformly Cauchy sequence of LAV mappings has a LAV mapping as a uniform limit if and only if it has a uniform limit that admits universal solutions. The aforementioned results lift to premise-bounded sequences of GLAV mappings, i.e., sequences of GLAV mappings for which there is a k ≥ 1 such that, for every mapping in the sequence, the left-hand side of every GLAV constraint has at most k source atoms (LAV mappings have k = 1).
In terms of techniques, we systematically use the structural characterizations of schema-mapping languages established in [19], thus creating a link with a different line of research.
The metric space \((\mathcal {P}(\text {Inst}(\mathbf {T})),dist)\) is incomplete, i.e., there are Cauchy sequences of elements of \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) that have no limit in \(\mathcal {P}(\text {Inst}(\mathbf {T}))\). It is well known that every incomplete metric space (X, d) has a completion, which means that it can be embedded into a complete metric space (X ^{∗}, d ^{∗}) so that X is a dense subset of X ^{∗}. Moreover, pointwise (respectively, uniformly) Cauchy sequences of functions on X have pointwise (respectively, uniform) limits that take values in X ^{∗}. The construction of X ^{∗} from X involves equivalence classes of Cauchy sequences of elements of X, thus, in general, the members of X ^{∗} do not have a concrete representation. In the last part of the paper, we show that the members of \(\mathcal {P}(\text {Inst}(\mathbf {T}))^{*}\) can be represented by suitably constructed infinite T-instances. As a consequence of this, the pointwise (respectively, uniform) limits of Cauchy sequences of schema mappings can be represented by generalized schema mappings, i.e., schema mappings that allow for infinite target instances as solutions to finite source instances.
2 Preliminaries
This section contains a minimum amount of necessary background material.
Schemas, Instances, and Conjunctive Queries
A schema R is a finite sequence 〈R _{1},…, R _{ k }〉 of relation symbols, where each R _{ i } has a fixed arity. An instance I over R, or an R-instance, is a sequence \(({R^{I}_{1}},\ldots ,{R^{I}_{k}})\), where each \({R^{I}_{i}}\) is a finite relation of the same arity as R _{ i }. We will often use R _{ i } to denote both the relation symbol and the relation \({R^{I}_{i}}\) that interprets it. The active domain adom(I) of an instance I is the set of all values occurring in the relations of I. A fact of an instance I (over R) is an expression \({R_{i}^{I}}(t_{1}, \ldots , t_{m})\) (or simply R _{ i }(t _{1},…, t _{ m })), where R _{ i } is a relation symbol of R and \((t_{1},\ldots ,t_{m}) \in {R_{i}^{I}}\).
A conjunctive query is a first-order formula of the form ∃z 𝜃(x, z), where 𝜃(x, z) is a conjunction of atomic formulas R _{ i }(v _{1},…, v _{ m }) and each v _{ j } is one of the variables in x and z. A boolean conjunctive query is a conjunctive query with no free variables. We write CQ for the class of all conjunctive queries over some schema. For every n ≥ 1, we let CQ_{n} denote the class of all conjunctive queries with at most n variables. We also let CQ_{0} denote the singleton consisting of a trivially true query. If I is an instance and q is a conjunctive query, then we write q(I) for the result of evaluating q on I; in particular, for boolean conjunctive queries q we have that q(I) = true if and only if I satisfies q.
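To make the evaluation q(I) concrete, the following Python sketch evaluates a conjunctive query by brute force over all assignments of active-domain values to its variables. The encoding is an assumption made only for illustration: an instance is a dictionary from relation names to sets of tuples of strings, a query is a tuple of free variables together with a list of atoms, and the name `evaluate_cq` is hypothetical.

```python
from itertools import product

def evaluate_cq(free_vars, atoms, instance):
    """Return q(I) for the query  exists z . AND(atoms)  with the given free variables.

    atoms is a list of (relation_name, tuple_of_variable_names) pairs;
    every free variable is assumed to occur in some atom.
    """
    variables = sorted({v for _, args in atoms for v in args})
    adom = sorted({val for facts in instance.values() for fact in facts for val in fact})
    answers = set()
    # Try every assignment of active-domain values to the variables of the query.
    for values in product(adom, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(tuple(assignment[v] for v in args) in instance.get(rel, set())
               for rel, args in atoms):
            answers.add(tuple(assignment[v] for v in free_vars))
    return answers

# Hypothetical instance over a single binary relation E.
I = {"E": {("a", "b"), ("b", "c")}}

# q(x, y) = exists z (E(x, z) AND E(z, y)): pairs connected by a path of length 2.
paths = evaluate_cq(("x", "y"), [("E", ("x", "z")), ("E", ("z", "y"))], I)

# Boolean query: exists x E(x, x); encoded with an empty tuple of free variables.
has_loop = evaluate_cq((), [("E", ("x", "x"))], I)
```

In this encoding a boolean query evaluates to `{()}` when the instance satisfies it and to the empty set otherwise.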
Schema Mappings, Universal Solutions, Certain Answers
Motivated by the terminology in data exchange [6], we typically work with two schemas, a source schema S and a target schema T with no relation symbols in common. We refer to S-instances as source instances, and to T-instances as target instances. We assume that the values occurring in the active domains of instances come from two fixed countably infinite disjoint sets, the set Const of all constants and the set Null of (labeled) nulls. We also assume that the active domains of source instances consist entirely of constants; the active domains of target instances may contain both constants and nulls.
In its most general form, a schema mapping \(\mathcal {M}\) between a source schema S and a target schema T is a set of pairs (I, J), where I is a source instance and J a target instance. To avoid anomalies that arise from such a relaxed notion, we will assume that a schema mapping \(\mathcal {M}\) must also possess a mild closure property, namely, that \(\mathcal {M}\) is closed under isomorphisms that rename nulls by other nulls. This is a natural “genericity” condition that is akin to the condition that database queries are closed under arbitrary isomorphisms. The precise definitions are as follows.
Definition 1
An isomorphism that renames nulls between two target instances J and J ^{′} is a one-to-one and onto function h : adom(J) → adom(J ^{′}) such that:
(i) If c is a constant in adom(J), then h(c) = c.
(ii) If w is a null in adom(J), then h(w) is also a null.
(iii) For every relation symbol R of T of arity m and for every tuple (a _{1},…, a _{ m }) of constants and nulls, we have that R ^{ J }(a _{1},…, a _{ m }) is a fact of J if and only if \(R^{J^{\prime }}(h(a_{1}),\ldots ,h(a_{m}))\) is a fact of J ^{′}.
In this case, we write J ^{′} = h(J) and say that J ^{′} is an isomorphic copy of J via an isomorphism that renames nulls.
A schema mapping \(\mathcal {M}\) between S and T is a set of pairs (I, J), where I is a source instance and J a target instance, such that the following holds: if a pair (I, J) is in \(\mathcal {M}\) and if J ^{′} = h(J) is an isomorphic copy of J via an isomorphism h that renames nulls, then also (I, J ^{′}) is in \(\mathcal {M}\).
A schema mapping is often (but not always) given as a triple \({\mathcal {M}} = (\mathbf {S}, \mathbf {T}, {\Sigma })\), where Σ is a set of formulas in some logical formalism such that \((I,J) \in \mathcal {M}\) if and only if I ∪ J⊧Σ. Clearly, if Σ is a set of first-order formulas or a set of second-order formulas, then \(\mathcal {M}\) is indeed closed under isomorphisms that rename nulls.
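Let \(\mathcal {M}\) be a schema mapping and let I be a source instance. A solution for I w.r.t. \(\mathcal {M}\) is a target instance J such that \((I,J) \in \mathcal {M}\); we write \(\text {Sol}(I,\mathcal {M})\) for the set of all solutions for I w.r.t. \(\mathcal {M}\). The certain answers of a query q over T on I w.r.t. \(\mathcal {M}\) are \(cert(q,I,\mathcal {M}) = \bigcap \{q(J) \mid J \in \text {Sol}(I,\mathcal {M})\}\). If I has at least one solution, then every tuple in \(cert(q,I,\mathcal {M})\) is null-free, because \(\mathcal {M}\) is closed under isomorphisms that rename nulls and so any null occurring in a solution can be renamed to a fresh one. On the face of it, the definition of certain answers may entail computing an intersection of infinitely many sets. One of the main findings in [6] is that there is a notion of a “good” solution in data exchange, called universal solution, that can also be used to compute the certain answers of conjunctive queries in a much more direct way.
Let J _{1} and J _{2} be two target instances. A function h is a homomorphism from J _{1} to J _{2} if the following hold: (i) for every constant c, we have that h(c) = c; and (ii) for every relation symbol R of T and every tuple \((a_{1},\ldots ,a_{n})\in R^{J_{1}}\), we have that \((h(a_{1}),\ldots ,h(a_{n}))\in R^{J_{2}}\). We write J _{1} → J _{2} to denote that there is a homomorphism from J _{1} to J _{2}. We say that J _{1} is homomorphically equivalent to J _{2}, written J _{1} ⇔ J _{2}, if J _{1} → J _{2} and J _{2} → J _{1}.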
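The definition of a homomorphism can be tested mechanically on small instances. The sketch below searches by brute force for a homomorphism that fixes constants and maps nulls to arbitrary values. It assumes, purely for illustration, that nulls are strings starting with an underscore and that instances are dictionaries from relation names to sets of tuples; `find_homomorphism` and `cycle` are hypothetical helper names.

```python
from itertools import product

def is_null(value):
    # Assumption of this sketch: nulls are strings that start with "_".
    return isinstance(value, str) and value.startswith("_")

def active_domain(instance):
    return {v for facts in instance.values() for fact in facts for v in fact}

def find_homomorphism(j1, j2):
    """Return a mapping of the nulls of j1 witnessing j1 -> j2, or None if there is none."""
    nulls = sorted(v for v in active_domain(j1) if is_null(v))
    targets = sorted(active_domain(j2))
    for images in product(targets, repeat=len(nulls)):
        h = dict(zip(nulls, images))
        # Constants are not in h, so h.get(a, a) maps each constant to itself.
        if all(tuple(h.get(a, a) for a in fact) in j2.get(rel, set())
               for rel, facts in j1.items() for fact in facts):
            return h
    return None

def cycle(m):
    """Undirected cycle of length m over pairwise distinct nulls, as an E-instance."""
    nodes = [f"_n{i}" for i in range(m)]
    edges = set()
    for i in range(m):
        a, b = nodes[i], nodes[(i + 1) % m]
        edges.add((a, b))
        edges.add((b, a))
    return {"E": edges}

# An even cycle and a single undirected edge are homomorphically equivalent.
even_equivalent = (find_homomorphism(cycle(4), cycle(2)) is not None
                   and find_homomorphism(cycle(2), cycle(4)) is not None)
# An odd cycle is not 2-colorable, so it has no homomorphism to a single edge.
odd_to_edge = find_homomorphism(cycle(3), cycle(2))
```

The search is exponential in the number of nulls, which is acceptable only for toy instances such as these.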
Let I be a source instance. A universal solution for I w.r.t. \(\mathcal {M}\) is a solution J such that for every solution \(J^{\prime } \in \text {Sol}(I,\mathcal {M})\), we have that J → J ^{′}. Intuitively, a universal solution for I is a “most general” solution for I. We write \(\text {UnivSol}(I,\mathcal {M})\) to denote the set of all universal solutions for I w.r.t. \(\mathcal {M}\) (note that universal solutions need not always exist, so it is possible that \(\text {UnivSol}(I,\mathcal {M})= \emptyset \)). The following useful property of universal solutions was first identified in [6].
Proposition 1
Assume that \(\mathcal {M}\) is a schema mapping, I is a source instance, and J is a universal solution for I w.r.t. \(\mathcal {M}\). If q is a conjunctive query, then \(cert(q,I,\mathcal {M}) = q(J)_{\downarrow }\), where q(J)_{ ↓ } is the set of all null-free tuples in q(J).
Proof 1
First, assume that \(\textbf {t} \in cert(q,I,\mathcal {M})\). Then, as discussed earlier, t must be a null-free tuple. Since J is a solution for I w.r.t. \(\mathcal {M}\), we have that t ∈ q(J), hence we have that t ∈ q(J)_{ ↓ }. Next, assume that t is a null-free tuple in q(J). If J ^{′} is an arbitrary solution for I w.r.t. \(\mathcal {M}\), then, since J is a universal solution for I w.r.t. \(\mathcal {M}\), there is a homomorphism h from J to J ^{′}. Since conjunctive queries are preserved under homomorphisms and h fixes constants, it follows that h(t) = t ∈ q(J ^{′}). Thus, \(\textbf {t} \in cert(q,I,\mathcal {M})\). □
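Proposition 1 reduces the computation of certain answers to a single query evaluation followed by a filter. A minimal sketch, assuming that a universal solution J is already available, that nulls are strings starting with an underscore, and that instances are dictionaries from relation names to sets of tuples (all names below are hypothetical):

```python
from itertools import product

def is_null(value):
    # Assumption of this sketch: nulls are strings that start with "_".
    return value.startswith("_")

def evaluate_cq(free_vars, atoms, instance):
    """Brute-force evaluation of a conjunctive query given as (free_vars, atoms)."""
    variables = sorted({v for _, args in atoms for v in args})
    adom = sorted({val for facts in instance.values() for fact in facts for val in fact})
    answers = set()
    for values in product(adom, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(tuple(assignment[v] for v in args) in instance.get(rel, set())
               for rel, args in atoms):
            answers.add(tuple(assignment[v] for v in free_vars))
    return answers

def certain_answers(free_vars, atoms, universal_solution):
    """q(J) restricted to null-free tuples, for a universal solution J."""
    return {t for t in evaluate_cq(free_vars, atoms, universal_solution)
            if not any(is_null(v) for v in t)}

# J is ASSUMED to be a universal solution for some source instance; _N0 and _N1 are nulls.
J = {"F": {("a", "_N0"), ("_N0", "b"), ("b", "_N1"), ("_N1", "c")}}

# q(x, y) = exists z (F(x, z) AND F(z, y)).
q_atoms = [("F", ("x", "z")), ("F", ("z", "y"))]
all_answers = evaluate_cq(("x", "y"), q_atoms, J)
cert = certain_answers(("x", "y"), q_atoms, J)
```

Here q(J) also contains the tuple (_N0, _N1), which is discarded because it is not null-free.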
Structural Properties of Schema Mappings
We now present a number of structural properties that a schema mapping may or may not possess. These properties were investigated in their own right in [19], where they were used to obtain characterizations of schema-mapping languages that will be of great interest to us in this paper. In what follows, \(\mathcal {M}\) is a schema mapping between S and T.

\(\mathcal {M}\) allows for CQ-rewriting if for every target conjunctive query q, there exists a union q ^{′} of source conjunctive queries such that \(cert(q,I,\mathcal {M}) = q^{\prime }(I)\), for every source instance I.

\(\mathcal {M}\) admits universal solutions if for every source instance I, there is a universal solution for I w.r.t. \(\mathcal {M}\).

\(\mathcal {M}\) is closed under target homomorphisms if \((I,J) \in \mathcal {M}\) and J → J ^{′} imply that \((I,J^{\prime }) \in \mathcal {M}\).

\(\mathcal {M}\) is closed under unions if \((I_{1},J_{1}) \in \mathcal {M}\) and \((I_{2},J_{2}) \in \mathcal {M}\) imply that \((I_{1}\cup I_{2}, J_{1} \cup J_{2}) \in \mathcal {M}\).

\(\mathcal {M}\) is closed under target intersections if \(J_{1} \in \text {Sol}(I,\mathcal {M})\) and \(J_{2} \in \text {Sol}(I,\mathcal {M})\) imply that \((J_{1} \cap J_{2}) \in \text {Sol}(I,\mathcal {M})\).

\(\mathcal {M}\) is n-modular if whenever \((I,J) \notin \mathcal {M}\), there is a subinstance I ^{′}⊆ I with at most n elements in its active domain such that \((I^{\prime },J) \notin \mathcal {M}\) (“small counterexample”).
Schema Mapping Languages
A GLAV (global-and-local-as-view) constraint is a first-order formula of the form ∀x(φ(x) →∃y ψ(x, y)), where φ(x) is a conjunction of atoms over the source schema S, each variable in x occurs in at least one atom in φ(x), and ψ(x, y) is a conjunction of atoms over the target schema T with variables in x and y. We refer to φ(x) as the left-hand side, or premise, and ∃y ψ(x, y) as the right-hand side, or conclusion of the constraint. Another name for GLAV constraints is source-to-target tuple-generating dependencies or, in short, s-t tgds.
A LAV (local-as-view) constraint is a GLAV constraint whose left-hand side is a single atom over the source, while a GAV (global-as-view) constraint is a GLAV constraint whose right-hand side contains no existential quantifiers and consists of a single atom over the target. For example, ∀x, y(E(x, y) →∃z(F(x, z) ∧ F(z, y))) is a LAV constraint, and ∀x, y, z(E(x, z) ∧ E(z, y) → F(x, y)) is a GAV constraint.
A GLAV (global-and-local-as-view) mapping is a schema mapping \(\mathcal {M}=(\mathbf {S}, \mathbf {T}, {\Sigma })\) such that Σ is a finite set of GLAV constraints. The notions of a LAV mapping and of a GAV mapping are defined analogously.
Every GLAV mapping \(\mathcal {M}\) admits universal solutions [6]; furthermore, given a source instance I, a canonical universal solution \(chase(I,\mathcal {M})\) can be produced via the oblivious chase procedure as follows: whenever the antecedent of an s-t tgd in \(\mathcal {M}\) becomes true, fresh null values are introduced and facts involving these nulls are added to \(chase(I,\mathcal {M})\), so that the conclusion of the s-t tgd becomes true. Every GLAV mapping is also known to allow for CQ-rewriting and to be n-modular, for some n ≥ 1. Moreover, every LAV mapping is closed under unions, while every GAV mapping is closed under target intersections.
Every SO tgd allows for CQ-rewriting and admits universal solutions; however, an SO tgd may not be closed under target homomorphisms and there may not exist any n ≥ 1 such that the SO tgd is n-modular (see [8, 19]).
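The oblivious chase described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure of [6] verbatim: a constraint is encoded as a pair (premise atoms, conclusion atoms), variables that occur only in the conclusion are treated as existential, fresh nulls are generated as strings `_N0`, `_N1`, and so on, and the function name `chase` mirrors the notation of the text but is otherwise hypothetical.

```python
from itertools import count, product

def chase(source, constraints):
    """Oblivious chase of a source instance with a finite set of GLAV constraints.

    constraints: list of (premise_atoms, conclusion_atoms), where an atom is
    (relation_name, tuple_of_variable_names).
    """
    fresh = count()
    target = {}
    adom = sorted({v for facts in source.values() for fact in facts for v in fact})
    for premise, conclusion in constraints:
        universal = sorted({v for _, args in premise for v in args})
        existential = sorted({v for _, args in conclusion for v in args} - set(universal))
        # Fire the constraint once for every assignment that makes the premise true.
        for values in product(adom, repeat=len(universal)):
            assignment = dict(zip(universal, values))
            if all(tuple(assignment[v] for v in args) in source.get(rel, set())
                   for rel, args in premise):
                extended = dict(assignment)
                for z in existential:
                    extended[z] = f"_N{next(fresh)}"  # a fresh labeled null
                for rel, args in conclusion:
                    target.setdefault(rel, set()).add(tuple(extended[v] for v in args))
    return target

source = {"E": {("a", "b"), ("b", "c")}}

# LAV constraint from the text: E(x, y) -> exists z (F(x, z) AND F(z, y)).
lav = [([("E", ("x", "y"))], [("F", ("x", "z")), ("F", ("z", "y"))])]
# GAV constraint from the text: E(x, z) AND E(z, y) -> F(x, y).
gav = [([("E", ("x", "z")), ("E", ("z", "y"))], [("F", ("x", "y"))])]

lav_result = chase(source, lav)
gav_result = chase(source, gav)
```

On this source instance the LAV constraint fires twice and introduces one fresh null per firing, whereas the GAV constraint introduces no nulls at all.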
Pseudometric Spaces and Metric Spaces
A pseudometric space is a pair (X, d), where X is a set and d is a function from X × X to the set R ^{+} of nonnegative real numbers with the following properties: (i) d(x, x) = 0, for every x in X; (ii) d(x, y) = d(y, x), for every x and y in X; (iii) d(x, y) ≤ d(x, z) + d(y, z), for every x, y, z in X (triangle inequality). A metric space is a pseudometric space (X, d) such that if d(x, y) = 0, then x = y. It is easy to see that if (X, d) is a pseudometric space, then the relation R _{ d } = {(x, y) ∈ X × X∣d(x, y) = 0} is an equivalence relation on X. From this, it follows that every pseudometric space (X, d) gives rise to a metric space \((\widehat {X},\widehat {d})\), where \(\widehat {X}\) is the set of equivalence classes of elements of X modulo the equivalence relation R _{ d } and \(\widehat {d}([x],[y]) = d(x,y)\).
Let (X, d) be a pseudometric space. A sequence of elements x _{1}, x _{2},… of X converges to an element x of X, denoted by \(\lim \limits _{n\to \infty } x_{n} = x\), if for every 𝜖 > 0, there is an integer n _{0} such that d(x _{ n }, x) < 𝜖, for every n ≥ n _{0}. We say that x is a limit of this sequence. The limit is unique if (X, d) is a metric space. A sequence x _{1}, x _{2},… of elements of X is Cauchy if for every 𝜖 > 0, there is an integer n _{0} such that \(d(x_{n},x_{n^{\prime }}) < \epsilon \), for every n, n ^{′}≥ n _{0}.
Using the triangle inequality, it is easy to see that if a sequence of elements in a (pseudo)metric space has a limit, then the sequence is Cauchy. The converse, however, does not hold for arbitrary (pseudo)metric spaces. A (pseudo)metric space (X, d) is complete if every Cauchy sequence of elements of X has a limit in X; otherwise, it is incomplete.
It is well known that every incomplete (pseudo)metric space (X, d) can be embedded into a complete (pseudo)metric space (X ^{∗}, d ^{∗}), called the completion of (X, d), in such a way that X is a dense subset of X ^{∗}, i.e., every member of X ^{∗} is the limit of a sequence of members of X. The members of X ^{∗} are equivalence classes of Cauchy sequences of X, where two Cauchy sequences x _{1}, x _{2},... and y _{1}, y _{2},… of elements of X are equivalent if \(\lim \limits _{n\to \infty } d(x_{n},y_{n}) = 0\), while the distance function d ^{∗} is defined as \(d^{*}([x_{1},x_{2},\ldots ],[y_{1},y_{2},\ldots ]) = \lim \limits _{n\to \infty } d(x_{n},y_{n})\). The proof of correctness of this construction can be found in [18] or any other book on metric spaces.
As a concrete example, the metric space of the real numbers is the completion of the metric space of the rational numbers (both with the standard distance).
3 Metric Space of Target Instances
To study the limits of sequences of schema mappings, we first introduce a pseudometric space of sets of target instances. By considering schema mappings as functions that map each source instance to the set of its solutions, we can view sequences of schema mappings as sequences of functions. The (pointwise or uniform) limit of a sequence of schema mappings is then simply defined in the standard way as the limit of a sequence of functions taking values in a pseudometric space. Moreover, by passing to the associated metric space of equivalence classes of sets of target instances, we ensure the uniqueness of the limit. If T is a schema, we write Inst(T) for the set of all finite instances of T. We also write \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) for the power set of Inst(T). The notion of distance on \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) that we are about to introduce is heavily based on the notion of the certain answers to conjunctive queries and on the idea that two members \(\mathcal {J}\) and \(\mathcal {J}^{\prime }\) of \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) are “close” to each other if only “big” conjunctive queries can yield different certain answers on \(\mathcal {J}\) and \(\mathcal {J}^{\prime }\).
Definition 2
Let q be a query over T and let \(\mathcal {J}\) be a member of \(\mathcal {P}(\text {Inst}(\mathbf {T}))\). The certain answers of q over \(\mathcal {J}\) are defined as
$$cert(q,\mathcal{J}) = \bigcap \{q(J) \mid J \in \mathcal{J}\}.$$

We say that two sets of instances \(\mathcal {J}\) and \(\mathcal {J}^{\prime }\) in \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) are CQ-equivalent, denoted \(\mathcal {J} \equiv _{\mathsf {CQ}} \mathcal {J}^{\prime }\), if \(cert(q,\mathcal {J}) = cert(q,\mathcal {J}^{\prime })\) holds for all conjunctive queries q.

We say that \(\mathcal {J}\) and \(\mathcal {J}^{\prime }\) are CQ_{n}-equivalent, denoted \(\mathcal {J} \equiv _{\mathsf {CQ}_{n}} \mathcal {J}^{\prime }\), if it holds that \(cert(q,\mathcal {J}) = cert(q,{\mathcal {J}^{\prime }})\) for all conjunctive queries q with at most n variables (i.e., for all q in CQ_{n}).
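Definition 3
Let \(\mathcal {J}\) and \(\mathcal {J}^{\prime }\) be two members of \(\mathcal {P}(\text {Inst}(\mathbf {T}))\). We define:

\(sim(\mathcal {J},\mathcal {J}^{\prime }) = \max \{k \mid \mathcal {J} \equiv _{\mathsf {CQ}_{k}} \mathcal {J}^{\prime }\}\), with \(sim(\mathcal {J},\mathcal {J}^{\prime }) = \infty \) if \(\mathcal {J}\) and \(\mathcal {J}^{\prime }\) are CQ-equivalent;

\(dist(\mathcal {J},\mathcal {J}^{\prime }) = 2^{-sim(\mathcal {J},\mathcal {J}^{\prime })}\), with the convention that \(2^{-\infty } = 0\).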
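The distance is taken here to be 2 raised to the power minus sim, so that a larger similarity means a smaller distance and CQ-equivalent sets are at distance zero. Under that reading, being within 𝜖 of a set is the same as being CQ_{k}-equivalent to it for a suitable k. The following sketch makes the correspondence explicit; the function names are hypothetical and `math.inf` stands for the similarity of CQ-equivalent sets.

```python
import math

def dist_from_sim(sim_value):
    """Distance 2^(-sim), with the convention that infinite similarity gives distance 0."""
    return 0.0 if sim_value == math.inf else 2.0 ** (-sim_value)

def min_sim_for(eps):
    """Smallest integer k >= 0 such that 2^(-k) < eps, for eps > 0.

    Two sets are at distance < eps exactly when their similarity is at least this k.
    """
    k = 0
    while 2.0 ** (-k) >= eps:
        k += 1
    return k

quarter = dist_from_sim(2)          # agree on CQ_2 but not on CQ_3
zero = dist_from_sim(math.inf)      # CQ-equivalent sets
needed = min_sim_for(0.25)          # similarity required for distance < 1/4
```

Since CQ_{0} contains only a trivially true query, every two sets have similarity at least 0, so the distance never exceeds 1.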
Definition 4
Let T be a schema. If J is a T-instance, then we write v(J) to denote the member of \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) consisting of all isomorphic copies of J via isomorphisms that rename nulls. In other words, v(J) consists of all T-instances J ^{′} such that J ^{′} is isomorphic to J via an isomorphism h that maps each constant to itself and maps each null to a (possibly different) null.
The next lemma will be used repeatedly in the sequel.
Lemma 1
1. If J is a T-instance whose active domain consists entirely of nulls and q is a non-boolean conjunctive query, then cert(q, v(J)) = ∅.
2. If J is a T-instance whose active domain consists entirely of nulls and q is a boolean conjunctive query, then cert(q, v(J)) = q(J).
3. If J and J ^{′} are T-instances whose active domains consist entirely of nulls, then, for every k ≥ 1, the following statements are equivalent:
(a) \(v(J) \equiv _{\mathsf {CQ}_{k}} v(J^{\prime })\).
(b) J and J ^{′} satisfy the same boolean conjunctive queries in CQ_{k}.
Proof 2
For the first two parts of the lemma, let J be a T-instance whose active domain consists entirely of nulls. For every non-boolean conjunctive query q, we have that cert(q, v(J)) = ∅, because v(J) contains instances with disjoint active domains. For every boolean query q, we have cert(q, v(J)) = q(J) for the following reason: first, J is a member of v(J), so if cert(q, v(J)) = true, then q(J) = true as well; second, since every member of v(J) is isomorphic to J and since boolean conjunctive queries are preserved under isomorphisms, we have that if q(J) = true, then cert(q, v(J)) = true.
For the third part of the lemma, let J and J ^{′} be T-instances whose active domains consist entirely of nulls and let k be a positive integer. If \(v(J) \equiv _{\mathsf {CQ}_{k}} v(J^{\prime })\), then, by the second part of the lemma, J and J ^{′} must satisfy the same boolean conjunctive queries in CQ_{k}. For the converse, assume that J and J ^{′} satisfy the same boolean conjunctive queries in CQ_{k}. We have to show that cert(q, v(J)) = cert(q, v(J ^{′})), for every conjunctive query q in CQ_{k}. If q is a non-boolean conjunctive query in CQ_{k}, then, by the first part of the lemma, we have that cert(q, v(J)) = ∅ = cert(q, v(J ^{′})). If q is a boolean query in CQ_{k}, then, by the second part of the lemma and the hypothesis about J and J ^{′}, we have that cert(q, v(J)) = q(J) = q(J ^{′}) = cert(q, v(J ^{′})). □
The preceding lemma will be used in the next example, which presents a sequence from \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) that has a limit in \(\mathcal {P}(\text {Inst}(\mathbf {T}))\).
Example 1
Let T be a schema consisting of a single binary relation E and let C _{ m } be the undirected cycle of length m, m ≥ 1, where the vertices of the cycle are pairwise distinct labeled nulls. Consider the sequence (v(C _{2n+1}))_{ n ≥ 1} arising from the cycles of odd size. Then, for every m ≥ 1, we have that \(\lim \limits _{n\to \infty }v(C_{2n+1}) = v(C_{2m})\). In particular, \(\lim \limits _{n\to \infty }v(C_{2n+1}) = v(C_{2})\).
We first show that \(v(C_{2m}) \equiv _{\mathsf {CQ}} v(C_{2})\), for every m ≥ 1. By Lemma 1, it suffices to show that C _{2m } and C _{2} satisfy the same boolean conjunctive queries. This is true because C _{2m } and C _{2} are homomorphically equivalent (and boolean conjunctive queries are preserved under homomorphisms). Indeed, there is a homomorphism from C _{2} to C _{2m } because C _{2} is a subgraph of C _{2m }, and there is a homomorphism from C _{2m } to C _{2} because C _{2m } is 2-colorable.
We will show that \(\lim \limits _{n\to \infty }v(C_{2n+1}) = v(C_{2})\) by showing that for every k, there exists n _{0} such that for all n ≥ n _{0}, we have that \(v(C_{2n+1}) \equiv _{\mathsf {CQ}_{k}} v(C_{2})\). For this, we take n _{0} = k and show that if n ≥ k, then \(v(C_{2n+1}) \equiv _{\mathsf {CQ}_{k}} v(C_{2})\). By the third part of Lemma 1, it suffices to show that if q is a boolean conjunctive query in CQ_{k}, then q(C _{2n+1}) = q(C _{2}). Since C _{2} is a subgraph of C _{2n+1}, we have that if q(C _{2}) = true, then also q(C _{2n+1}) = true. Conversely, assume that q(C _{2n+1}) = true. Since q ∈ CQ_{k}, there is a subgraph H of C _{2n+1} with at most k distinct nodes such that q(H) = true. Since 2n + 1 > n ≥ k, we have that H is a proper subgraph of C _{2n+1}. Consequently, H is 2-colorable and so there is a homomorphism from H to C _{2}, which, in turn, implies that q(C _{2}) = true.
In contrast to what we have just seen, there are Cauchy sequences of elements of \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) that have no limit in \(\mathcal {P}(\text {Inst}(\mathbf {T}))\). Thus, the pseudometric space \((\mathcal {P}(\text {Inst}(\mathbf {T})),dist)\) is incomplete.
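The argument of Example 1 can be checked by brute force for small parameters. By the third part of Lemma 1, for instances consisting entirely of nulls it is enough to compare boolean conjunctive queries, so the sketch below enumerates every boolean conjunctive query over E with at most k variables and compares its truth value on two graphs. Nodes are encoded as integers standing in for nulls, and all function names are hypothetical; the enumeration is exponential in k squared, so only very small k are feasible.

```python
from itertools import combinations, product

def cycle(m):
    """Edge set of the undirected cycle C_m on nodes 0..m-1 (standing in for nulls)."""
    edges = set()
    for i in range(m):
        j = (i + 1) % m
        edges.add((i, j))
        edges.add((j, i))
    return edges

def satisfies(edges, atoms, num_vars):
    """True iff some assignment of nodes to variables 0..num_vars-1 satisfies all atoms."""
    nodes = sorted({v for edge in edges for v in edge})
    return any(all((h[i], h[j]) in edges for i, j in atoms)
               for h in product(nodes, repeat=num_vars))

def same_boolean_cqs(edges1, edges2, k):
    """Do the two graphs satisfy the same boolean CQs with at most k variables?"""
    for n in range(1, k + 1):
        candidates = list(product(range(n), repeat=2))  # all atoms E(x_i, x_j)
        for size in range(1, len(candidates) + 1):
            for atoms in combinations(candidates, size):
                if satisfies(edges1, atoms, n) != satisfies(edges2, atoms, n):
                    return False
    return True

def sim_up_to(edges1, edges2, max_k):
    """Largest k <= max_k such that the graphs agree on all boolean CQs in CQ_k."""
    best = 0
    for k in range(1, max_k + 1):
        if not same_boolean_cqs(edges1, edges2, k):
            break
        best = k
    return best

sim_c3_c2 = sim_up_to(cycle(3), cycle(2), 3)   # the triangle query separates C_3 from C_2
sim_c5_c2 = sim_up_to(cycle(5), cycle(2), 3)   # no query with at most 3 variables does
dist_c3_c2 = 2.0 ** (-sim_c3_c2)
```

The value returned by `sim_up_to` is only a lower bound on the true similarity when it equals `max_k`, because larger queries are not examined.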
Proposition 2
Let T be a schema consisting of a single binary relation E and let K _{ n } be the clique of size n, for n ≥ 1, where the vertices are pairwise distinct labeled nulls. The sequence (v(K _{ n }))_{ n ≥ 1} is Cauchy, but has no limit in \(\mathcal {P}(\text {Inst}(\mathbf {T}))\).
Proof 3
The sequence (v(K _{ n }))_{ n ≥ 1} is Cauchy, because if m ≥ n, then v(K _{ m }) and v(K _{ n }) are CQ_{n}-equivalent. To show this, by the third part of Lemma 1, it suffices to show that if m ≥ n, then K _{ m } and K _{ n } satisfy the same boolean conjunctive queries in CQ_{n}. Let q be a boolean conjunctive query in CQ_{n}. Since K _{ n } is a subgraph of K _{ m }, if q(K _{ n }) = true, then q(K _{ m }) = true. Conversely, if q(K _{ m }) = true, then there is a subgraph H of K _{ m } with at most n distinct nodes such that q(H) = true. But then H is isomorphic to a subgraph of K _{ n }, hence q(K _{ n }) = true.
It remains to show that the sequence (v(K _{ n }))_{ n ≥ 1} has no limit in \(\mathcal {P}(\text {Inst}(\mathbf {T}))\). Assume to the contrary that there does exist a set \(\mathcal {J}\) of finite instances over T such that \(\lim \limits _{n\to \infty } v(K_{n})= \mathcal {J}\). We distinguish three cases.
First, if \(\mathcal {J}=\emptyset \), then \(cert(q,\mathcal {J})= {\mathit {true}}\), for every boolean conjunctive query q. In particular, this holds for the query q = ∃x E(x, x), which asserts the existence of a self-loop. In contrast, for this conjunctive query, we have that cert(q, v(K _{ n })) = false, for every n ≥ 1, since K _{ n } ∈ v(K _{ n }) and none of the graphs K _{ n }, n ≥ 1, contains a self-loop.
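Second, if \(\mathcal {J} \neq \emptyset \) and if every member J of \(\mathcal {J}\) contains a self-loop, then we again consider the query q = ∃x E(x, x). We thus have \(cert(q, \mathcal {J}) = {\mathit {true}}\), whereas cert(q, v(K _{ n })) = false, for every n ≥ 1.
Third, if some member J of \(\mathcal {J}\) contains no self-loop, let m ≥ 1 be an integer such that adom(J) has at most m elements, and let q _{ m+1} be the boolean conjunctive query with variables x _{1},…, x _{ m+1} that is the conjunction of all atoms E(x _{ i }, x _{ j }) with i ≠ j. Since J has no self-loop, every assignment that satisfies q _{ m+1} in J must be one-to-one, which is impossible because adom(J) has fewer than m + 1 elements; hence \(cert(q_{m+1}, \mathcal {J}) = {\mathit {false}}\). On the other hand, cert(q _{ m+1}, v(K _{ n })) = true, for every n ≥ m + 1. It follows that \(dist(v(K_{n}),\mathcal {J}) \geq 2^{-m}\), for every n ≥ m + 1. In each of the three cases, \(\mathcal {J}\) is not a limit of the sequence, which is a contradiction. □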
Since (v(K _{ n }))_{ n ≥ 1} is a Cauchy sequence, it has a limit in the completion of \((\mathcal {P}(\text {Inst}(\mathbf {T})),dist)\). As we will see in Section 6, a concrete representation of this limit is the set consisting of all disjoint unions of cliques of all finite sizes in which every node is a null.
The following definitions are perfectly meaningful for every pseudometric space (X, d) and for every sequence of functions taking values in X. For concreteness, we give the definitions for sequences of functions taking values in \(\mathcal {P}(\text {Inst}(\mathbf {T}))\).
Definition 5
Let A be a set, let \((f_{n})_{n\geq 1}\) be a sequence of functions from A to \(\mathcal {P}(\text {Inst}(\mathbf {T}))\), and let f be a function from A to \(\mathcal {P}(\text {Inst}(\mathbf {T}))\).

We say that (f _{ n })_{ n ≥ 1} converges pointwise to f , denoted as \(\lim \limits _{n\to \infty }^{p} f_{n} = f\), if for every element x ∈ A, we have that \(\lim \limits _{n\to \infty } f_{n}(x) = f(x)\).

We say that (f _{ n })_{ n ≥ 1} converges uniformly to f , denoted as \(\lim \limits _{n\to \infty }^{u} f_{n} = f\), if for every 𝜖 > 0, there exists an integer n _{0} ≥ 1 such that for every integer n ≥ n _{0} and for every element x ∈ A, we have \(dist(f_{n}(x), f(x)) < \epsilon \).

We say that (f _{ n })_{ n ≥ 1} is pointwise Cauchy, if for every element x ∈ A, the sequence (f _{ n }(x))_{ n ≥ 1} is Cauchy.

We say that (f _{ n })_{ n ≥ 1} is uniformly Cauchy, if for every 𝜖 > 0, there exists an integer n _{0} ≥ 1 such that for all integers n, n ^{′}≥ n _{0} and for every element x ∈ A, we have \(dist(f_{n}(x), f_{n^{\prime }}(x)) < \epsilon \).
Clearly, if (f _{ n })_{ n ≥ 1} converges pointwise (resp., uniformly), then (f _{ n })_{ n ≥ 1} is pointwise (resp., uniformly) Cauchy. The converse is not in general true for arbitrary (pseudo)metric spaces; in particular, it is not true for the pseudometric space \((\mathcal {P}(\text {Inst}(\mathbf {T})),dist)\), as we shall see later on.
We now bring schema mappings into the picture. Every schema mapping \(\mathcal {M}\) over a source schema S and a target schema T can be identified with a function \(f \colon \text {Inst}(\mathbf {S}) \longrightarrow \mathcal {P}(\text {Inst}(\mathbf {T}))\), where \(f(I) = \text {Sol}(I,\mathcal {M})\) (recall that \(\text {Sol}(I,\mathcal {M})\) is the set of all solutions of I w.r.t. \(\mathcal {M}\), i.e., the set of all finite T instances J such that \((I,J)\in \mathcal {M}\)). Thus, a sequence \((\mathcal {M}_{n})_{n\geq 1}\) of schema mappings over a source schema S and target schema T can be viewed as a sequence of functions from Inst(S) to \(\mathcal {P}(\text {Inst}(\mathbf {T}))\). Therefore, we can talk about a sequence of schema mappings being pointwise Cauchy and uniformly Cauchy if the sequence of the associated functions has these properties. Similarly, we say that a sequence of schema mappings has a pointwise limit (resp., a uniform limit) if the sequence of the associated functions converges pointwise (resp., converges uniformly) to a schema mapping.
The preceding notion of convergence of a sequence of schema mappings allows us to draw a connection to earlier work on schema mapping optimization [5, 7]. Here, we are considering CQ-equivalence and \(\mathsf{CQ}_{n}\)-equivalence of sets of instances. In previous works, these notions of equivalence have been mainly applied to schema mappings (see, e.g., [5, 7, 14]). Specifically, two schema mappings \(\mathcal {M}, \mathcal {M}^{\prime }\) are CQ-equivalent (resp., \(\mathsf{CQ}_{n}\)-equivalent) if for every target conjunctive query q (resp., every target conjunctive query q in \(\mathsf{CQ}_{n}\)) and every source instance I, we have that \(cert(q, I, \mathcal {M}) = cert(q, I, \mathcal {M}^{\prime })\). In this case, we write \(\mathcal {M} \equiv _{\mathsf {CQ}} \mathcal {M}^{\prime }\) (resp., \(\mathcal {M} \equiv _{\mathsf {CQ}_{n}} \mathcal {M}^{\prime }\)). The notion of \(\mathsf{CQ}_{n}\)-equivalence has been studied in the context of schema mapping optimization [5, 7]. Below we discuss its relationship to the convergence of schema mappings.
Proposition 3
Consider a sequence \((\mathcal {M}_{n})_{n\geq 1}\) of schema mappings and a schema mapping \(\mathcal {M}\). Then \(\lim \limits _{n\to \infty }^{u} \mathcal {M}_{n} = \mathcal {M}\) if and only if for every integer k ≥ 1, there is an integer n _{0} ≥ 1 such that for all integers n ≥ n _{0}, we have that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}_{k}} \mathcal {M}\).
Proof 4
The result follows by unfolding and comparing the definitions. Specifically, \(\lim \limits _{n\to \infty }^{u} \mathcal {M}_{n} = \mathcal {M}\) means that for every 𝜖 > 0, there is an integer n _{0} ≥ 1 such that for every integer n ≥ n _{0} and for every source instance I we have that \(dist(\text {Sol}(I,\mathcal {M}_{n}), \text {Sol}(I,\mathcal {M}))< \epsilon \). In turn, this means that for every integer k ≥ 1, there is an integer n _{0} ≥ 1 such that for every integer n ≥ n _{0} and for every source instance I we have that \(\text {Sol}(I,\mathcal {M}_{n}) \equiv _{\mathsf {CQ}_{k}} \text {Sol}(I,\mathcal {M})\). Thus, for every integer k ≥ 1, there is an integer n _{0} ≥ 1 such that for every integer n ≥ n _{0}, we have that \(\mathcal {M}_{n}\equiv _{\mathsf {CQ}_{k}}\mathcal {M}\). □
Intuitively, the preceding proposition states that it takes bigger and bigger conjunctive queries to distinguish the members of a sequence \((\mathcal {M}_{n})_{n\geq 1}\) from its uniform limit.
Although never explicitly introduced, the notion of uniform convergence was implicit in [5], where it was shown that for every SO tgd σ and for every n ≥ 1, there is a GLAV mapping \(\mathcal {M}_{n}\) such that \(\sigma \equiv _{\mathsf {CQ}_{n}} \mathcal {M}_{n}\). From this, it is easy to see that \(\lim \limits _{n\to \infty }^{u} \mathcal {M}_{n} = \sigma \). Thus, we have the following result.
Theorem 1
(implicit in [5]) Every SO tgd is a uniform limit of a sequence of GLAV mappings.
As stated earlier, \((\mathcal {P}(\text {Inst}(\mathbf {T})), dist)\) is a pseudometric space since it cannot distinguish CQ-equivalent sets of instances. Consequently, the limit of a sequence of sets of instances and the (uniform or pointwise) limit of a sequence of mappings need not be unique. However, the limit is unique up to CQ-equivalence and, as described in Section 2, there is an associated metric space \((\widehat {\mathcal {P}(\text {Inst}(\mathbf {T}))}, \widehat {dist})\) obtained by considering the equivalence classes of \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) modulo the equivalence relation \(R_{dist}\), where \((\mathcal {J},\mathcal {J}^{\prime }) \in R_{dist}\) if and only if \(dist(\mathcal {J},\mathcal {J}^{\prime }) = 0\) (i.e., if and only if \(\mathcal {J} \equiv _{\mathsf {CQ}} \mathcal {J}^{\prime }\)).
In subsequent sections, we will work with the metric space \((\widehat {\mathcal {P}(\text {Inst}(\mathbf {T}))}, \widehat {dist})\). Moreover, we will be interested in schema mappings modulo CQ-equivalence, which means that from now on we will view schema mappings as functions from source instances to equivalence classes of sets of target instances modulo CQ-equivalence. However, for notational simplicity, we will work each time with representatives of the equivalence classes. By a slight abuse of notation, we will write \((\mathcal {P}(\text {Inst}(\mathbf {T})),dist)\), instead of \((\widehat {\mathcal {P}(\text {Inst}(\mathbf {T}))}, \widehat {dist})\). Likewise, we will not explicitly distinguish between a schema mapping \(\mathcal {M}\) and the equivalence class of the schema mappings that are CQ-equivalent to \(\mathcal {M}\).
4 Limits of Sequences of GAV Mappings
Our goal in this section is to analyze sequences of GAV mappings. To this effect, we first investigate the existence of limits of such sequences and then examine the definability of limits. As discussed in Section 3, if a sequence \((\mathcal {M}_{n})_{n\geq 1}\) of schema mappings has a pointwise (resp., uniform) limit, then the sequence is pointwise (resp., uniformly) Cauchy. The next result asserts that the converse holds for sequences of GAV mappings.
Theorem 2
Let \((\mathcal {M}_{n})_{n\geq 1}\) be a sequence of GAV mappings.

If \((\mathcal {M}_{n})_{n\geq 1}\) is pointwise Cauchy, then it has a pointwise limit.

If \((\mathcal {M}_{n})_{n\geq 1}\) is uniformly Cauchy, then it is eventually constant and thus has a GAV schema mapping as a uniform limit.
Proof 5
We consider GAV mappings over a source schema S and a target schema T. Let r denote the maximum arity of the relation symbols in T. For showing the first claim, assume that \((\mathcal {M}_{n})_{n\geq 1}\) is a pointwise Cauchy sequence of GAV mappings and let I be a source instance. For each n ≥ 1, consider the universal solution \(chase(I,\mathcal {M}_{n})\) for I w.r.t. \(\mathcal {M}_{n}\) obtained by using the oblivious chase procedure. Since each \(\mathcal {M}_{n}\) is a GAV schema mapping, we have that \(chase(I,\mathcal {M}_{n})\) contains constants from the active domain of I and no nulls. We claim that there exists some n _{0} such that for all n ≥ n _{0}, we have that \(chase(I,\mathcal {M}_{n}) = chase(I, \mathcal {M}_{n_{0}})\). In other words, we claim that the sequence \((chase(I,\mathcal {M}_{n}))_{n\geq 1}\) is eventually constant (does not oscillate). Since every instance in the sequence \((chase(I,\mathcal {M}_{n}))_{n\geq 1}\) has no nulls, it can be identified by evaluating on that instance the atomic queries R(x _{1},…, x _{ k }), where R ranges over the relation symbols of T and k (with k ≤ r) denotes the arity of R. The assumption that the sequence \((\mathcal {M}_{n})_{n \geq 1}\) is pointwise Cauchy implies that there exists a positive integer n _{0} (that depends on I and r) such that for every integer n ≥ n _{0} and every conjunctive query \(q \in \mathsf{CQ}_{r}\), we have that \(cert(q,I,\mathcal {M}_{n}) = cert(q,I,\mathcal {M}_{n_{0}})\). Since the universal solutions involved contain no nulls, the certain answers of q coincide with the answers of q on the chase; hence \(q(chase(I,\mathcal {M}_{n})) = q(chase(I,\mathcal {M}_{n_{0}}))\) and, consequently, for every n ≥ n _{0}, we have that \(chase(I,\mathcal {M}_{n}) = chase(I,\mathcal {M}_{n_{0}})\).
We have just shown that if \((\mathcal {M}_{n})_{n\geq 1}\) is a pointwise Cauchy sequence of GAV mappings, then for every I, there exists a positive integer m _{ I } such that \(chase(I,\mathcal {M}_{m_{I}}) = chase(I,\mathcal {M}_{n})\), for all n ≥ m _{ I }. It follows that the schema mapping \(\mathcal {M} = \{(I, chase(I,\mathcal {M}_{m_{I}}))\mid I \text { is a source instance}\}\) is a pointwise limit of the sequence \((\mathcal {M}_{n})_{n\geq 1}\). Note that \(\mathcal {M}\) is indeed a schema mapping because \(chase(I,\mathcal {M}_{m_{I}})\) contains no nulls.
For showing the second claim, assume that \((\mathcal {M}_{n})_{n\geq 1}\) is a uniformly Cauchy sequence of GAV mappings. We claim that \((\mathcal {M}_{n})_{n\geq 1}\) is eventually constant, i.e., there is some n _{0} such that for all n ≥ n _{0}, \(\mathcal {M}_{n}\equiv _{\mathsf {CQ}} \mathcal {M}_{n_{0}}\) holds. For this, we repeat the previous argument, but also note that, since the sequence \((\mathcal {M}_{n})_{n\geq 1}\) is uniformly Cauchy, there exists a positive integer n _{0} that depends only on r such that for every source instance I, for every integer n ≥ n _{0} and every conjunctive query \(q \in \mathsf{CQ}_{r}\), we have that \(cert(q,I,\mathcal {M}_{n}) = cert(q,I,\mathcal {M}_{n_{0}})\). This implies that for every source instance I and every n ≥ n _{0}, we have that \(q(chase(I,\mathcal {M}_{n})) = q(chase(I,\mathcal {M}_{n_{0}}))\); consequently, for every source instance I and every n ≥ n _{0}, we have that \(chase(I,\mathcal {M}_{n}) = chase(I,\mathcal {M}_{n_{0}})\). □
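The proof relies on the fact that chasing with a GAV mapping produces only facts over constants of the source instance. The following is a minimal sketch of such a chase, under the assumption that a GAV constraint is encoded as a pair (body atoms, single head atom) and that an instance is a set of (relation, tuple) facts; the names `matches` and `gav_chase` are illustrative and do not come from the paper.

```python
def matches(body, instance, binding=None):
    """Yield every assignment of body variables to constants mapping all body atoms into instance."""
    binding = binding or {}
    if not body:
        yield dict(binding)
        return
    (rel, args), rest = body[0], body[1:]
    for fact_rel, fact_args in instance:
        if fact_rel != rel or len(fact_args) != len(args):
            continue
        extended = dict(binding)
        # setdefault binds a fresh variable or returns the value it is already bound to.
        if all(extended.setdefault(v, c) == c for v, c in zip(args, fact_args)):
            yield from matches(rest, instance, extended)

def gav_chase(instance, rules):
    """Target instance obtained by firing every GAV rule on every match of its body."""
    target = set()
    for body, (head_rel, head_args) in rules:
        for h in matches(body, instance):
            # Every head variable occurs in the body, so no nulls are ever created.
            target.add((head_rel, tuple(h[v] for v in head_args)))
    return target

# Hypothetical example: E(x, y) AND E(y, z) -> T(x, z).
rules = [([("E", ("x", "y")), ("E", ("y", "z"))], ("T", ("x", "z")))]
source = {("E", ("a", "b")), ("E", ("b", "c"))}
result = gav_chase(source, rules)
print(result)
```

Because the result contains no nulls, it is determined by the answers to the atomic queries over the target schema, which is the property the proof exploits.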
Next, we point out that, for sequences of GAV mappings, the notions of pointwise convergence and uniform convergence are genuinely different.
Proposition 4
There exists a sequence of GAV mappings that has a GAV mapping as a pointwise limit, but has no uniform limit.
Proof 6
For every n ≥ 2, let \(q_{n} = \bigwedge _{1 \leq i < j \leq n}(E(x_{i},x_{j}) \wedge E(x_{j},x_{i})).\) Intuitively, if E is interpreted as an edge relation, then \(q_{n}\) yields a nonempty answer over any graph that contains a self-loop or a clique of size n. Let S be a source schema consisting of a binary relation symbol E and a unary relation symbol P, and let T be a target schema consisting of a unary relation symbol P ^{′}. Let \((\mathcal {M}_{n})_{n\geq 1}\) be the sequence of GAV mappings, where \(\mathcal {M}_{n}\) is specified by the constraint \(\forall x \forall x_{1},\ldots ,x_{n+1}(P(x) \wedge q_{n+1} \rightarrow P^{\prime }(x))\). Intuitively, \(\mathcal {M}_{n}\) is a “copy” schema mapping, but the copying action is triggered only if the source instance contains a self-loop or a clique of size n + 1. We will show that the GAV schema mapping \(\mathcal {M} = \{\forall x\forall y(P(x) \wedge E(y,y) \rightarrow P^{\prime }(x))\}\) is a pointwise limit of \((\mathcal {M}_{n})_{n\geq 1}\), but that this pointwise limit is not a uniform limit of \((\mathcal {M}_{n})_{n\geq 1}\) and thus no uniform limit of \((\mathcal {M}_{n})_{n\geq 1}\) exists.
To show that \(\mathcal {M}\) is a pointwise limit, let I be a source instance and let q be a target conjunctive query. We distinguish two cases.

If I contains a self-loop, then \(J = \{P^{\prime }(x) \mid P(x) \in I\}\) is a universal solution for I w.r.t. \(\mathcal {M}\) and w.r.t. \(\mathcal {M}_{n}\), for all n. Thus, \(cert(q,I,\mathcal {M}) = cert(q,I,\mathcal {M}_{n})\), for all n.

If I is self-loop free, let n _{0} be such that no clique larger than n _{0} exists in I. Then, J = ∅ is a universal solution for I w.r.t. \(\mathcal {M}\) and w.r.t. \(\mathcal {M}_{n}\), for all n ≥ n _{0}. Thus, \(cert(q,I,\mathcal {M}) = cert(q,I,\mathcal {M}_{n})\), for all n ≥ n _{0}.
Next, we show that \((\mathcal {M}_{n})_{n\geq 1}\) has no uniform limit. Towards a contradiction, suppose that such a uniform limit exists. Every uniform limit is also a pointwise limit; moreover, pointwise and uniform limits are unique up to CQ-equivalence. Hence, since the schema mapping \(\mathcal {M}\) defined above is a pointwise limit of \((\mathcal {M}_{n})_{n\geq 1}\), it follows that \(\mathcal {M}\) is also a uniform limit of \((\mathcal {M}_{n})_{n\geq 1}\). Let m = 1. Then there exists an n _{0} such that for all n ≥ n _{0} we have that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}_{1}} \mathcal {M}\). Take n = n _{0}. Let I be the source instance \(K_{n+1} \cup \{P(c)\}\) and let q be the target conjunctive query ∃x P ^{′}(x). We now claim that \(cert(q,I,\mathcal {M}_{n}) \neq cert(q,I,\mathcal {M})\), which contradicts the previously derived fact that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}_{1}} \mathcal {M}\). Indeed, since I contains a clique of size n + 1, we have that \(\{P^{\prime }(c)\}\) is a universal solution for I w.r.t. \(\mathcal {M}_{n}\), hence \(cert(q,I,\mathcal {M}_{n}) = \mathit {true}\). However, since I contains no self-loop, we have that ∅ is a universal solution for I w.r.t. \(\mathcal {M}\), hence \(cert(q,I,\mathcal {M}) = \mathit {false}\). □
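The gap between pointwise and uniform convergence in this proof can be observed on small instances. The sketch below is an illustration under stated assumptions: a source instance is given as a set of E-edges together with a set of P-elements, the universal solution is represented by the set of elements copied into P′, and the helper names are hypothetical.

```python
from itertools import combinations

def clique(n):
    """Edges of K_n on nodes 0..n-1, without self-loops."""
    return {(a, b) for a in range(n) for b in range(n) if a != b}

def has_selfloop(edges):
    return any(a == b for a, b in edges)

def has_clique(edges, size):
    """True if some set of `size` distinct nodes is pairwise connected in both directions."""
    nodes = sorted({x for e in edges for x in e})
    return any(
        all((a, b) in edges and (b, a) in edges for a, b in combinations(group, 2))
        for group in combinations(nodes, size)
    )

def chase_Mn(edges, p_elems, n):
    """Elements copied to P' by M_n: copying fires iff q_{n+1} holds in the source."""
    triggered = has_selfloop(edges) or has_clique(edges, n + 1)
    return set(p_elems) if triggered else set()

def chase_M(edges, p_elems):
    """Elements copied to P' by the candidate limit M: copying fires iff a self-loop exists."""
    return set(p_elems) if has_selfloop(edges) else set()

# Pointwise: for the fixed instance K_3 + {P(c)}, the chase stabilizes once n >= 3.
fixed = [chase_Mn(clique(3), {"c"}, n) for n in range(1, 6)]
# Not uniform: for every n, the instance K_{n+1} + {P(c)} still separates M_n from M.
witnesses = [chase_Mn(clique(n + 1), {"c"}, n) != chase_M(clique(n + 1), {"c"}) for n in range(1, 6)]
print(fixed, witnesses)
```

The stabilization index depends on the size of the largest clique in the chosen instance, which is exactly why no single index works for all source instances.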
Proposition 4 and Theorem 2 imply that the sequence of GAV mappings in the proof of Proposition 4 is an example of a pointwise Cauchy sequence that is not uniformly Cauchy. Theorem 2 also implies that if a sequence of GAV mappings has a uniform limit, then it must have a GAV mapping as such a limit. In turn, this gives rise to the following natural question concerning the definability of pointwise limits: if a sequence of GAV mappings has a pointwise limit, does it have a GAV mapping as such a limit? We answer this question in the negative by showing that even the much richer language of SO tgds cannot express pointwise limits of sequences of GAV mappings.
Proposition 5
There is a pointwise Cauchy sequence of GAV schema mappings such that no SO tgd is a pointwise limit of that sequence.
Proof 7
We have just seen that there are sequences of GAV mappings that have a pointwise limit, but no such limit is definable by a GAV mapping. This raises the question of finding necessary and sufficient conditions guaranteeing that a sequence of GAV mappings has a GAV mapping as a pointwise limit. The next result provides an answer to this question.
Theorem 3
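Let \((\mathcal {M}_{n})_{n\geq 1}\) be a pointwise Cauchy sequence of GAV mappings. Then the following statements are equivalent.
1. \((\mathcal {M}_{n})_{n\geq 1}\) has a GAV mapping as a pointwise limit.
2. \((\mathcal {M}_{n})_{n\geq 1}\) has a pointwise limit that allows for CQ-rewriting.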
Proof 8
(a) \((\mathcal {M}_{n})_{n\geq 1}\) has a GAV mapping as a pointwise limit.
(b) \((\mathcal {M}_n)_{n\geq 1}\) has a pointwise limit that allows for CQ-rewriting.
(c) \(\mathcal {M}^{\star }\) allows for CQ-rewriting.
(d) \(\mathcal {M}^{\star }\) is logically equivalent to a GAV mapping.

(a) ⇒ (b) This is true because every GAV mapping allows for CQ-rewriting.

(b) ⇒ (c) This is true because if \(\mathcal {M}^{\prime }\) is a pointwise limit of \((\mathcal {M}_{n})_{n\geq 1}\) that allows for CQ-rewriting, then \(\mathcal {M}^{\star }\) also allows for CQ-rewriting, since \({\mathcal {M}^{\prime }} \equiv _{\mathsf {CQ}} \mathcal {M}^{\star }\).

(c) ⇒ (d) This is the most involved part of the proof. Let us examine the structural properties that the schema mapping \(\mathcal {M}^{\star }\) possesses. By hypothesis, \(\mathcal {M}^{\star }\) allows for CQ-rewriting. By construction, \(\mathcal {M}^{\star }\) admits universal solutions, since \(chase(I,\mathcal {M}_{m_{I}})\) is a universal solution for I w.r.t. \(\mathcal {M}^{\star }\), for every source instance I. Moreover, it is clear from its definition that \(\mathcal {M}^{\star }\) is closed under target homomorphisms. Finally, we claim that \(\mathcal {M}^{\star }\) is closed under target intersections. Indeed, assume that both (I, J _{1}) and (I, J _{2}) are in \(\mathcal {M}^{\star }\). Then \(chase(I,\mathcal {M}_{m_{I}})\) is contained in both J _{1} and J _{2}, hence \(chase(I,\mathcal {M}_{m_{I}})\) is contained in J _{1} ∩ J _{2}, hence J _{1} ∩ J _{2} is a solution for I w.r.t. \(\mathcal {M}^{\star }\).

(d) ⇒ (a) This is obvious since \(\mathcal {M}^{\star }\) is a pointwise limit of \((\mathcal {M}_n)_{n\geq 1}\).
Observe that Theorem 3 (and its proof) provide necessary and sufficient conditions for a pointwise Cauchy sequence of GAV mappings to have a GAV mapping as a pointwise limit, but these conditions are on the pointwise limit and not on the sequence itself. By analyzing the proof of Theorem 3, however, it is possible to extract a necessary and sufficient condition on the sequence itself. For this, we need to introduce the following concept.
Definition 6
Let \((\mathcal {M}_{n})_{n\geq 1}\) be a sequence of schema mappings. We say that \((\mathcal {M}_{n})_{n\geq 1}\) allows for CQ-rewriting if for every target conjunctive query q, there is a union q ^{′} of source conjunctive queries having the following property: for every source instance I, there is a positive integer n _{ I } such that \(cert(q,I,\mathcal {M}_{n}) = q^{\prime }(I)\), for every n ≥ n _{ I }.
Let \(\mathcal {M}\) be a pointwise limit of a sequence \((\mathcal {M}_{n})_{n\geq 1}\) of schema mappings. It is easy to show that \(\mathcal {M}\) allows for CQ-rewriting if and only if \((\mathcal {M}_{n})_{n\geq 1}\) allows for CQ-rewriting. Indeed, assume first that \(\mathcal {M}\) allows for CQ-rewriting. To show that \((\mathcal {M}_{n})_{n\geq 1}\) allows for CQ-rewriting, let q be a conjunctive query and let q ^{′} be a union of conjunctive queries such that \(cert(q,I,\mathcal {M}) = q^{\prime }(I)\), for every source instance I. Since \(\mathcal {M}\) is a pointwise limit of \((\mathcal {M}_{n})_{n\geq 1}\), for every instance I, there is a positive integer \(n^{\prime }_{I}\) such that \(cert(q,I,\mathcal {M}) = cert(q,I,\mathcal {M}_{n})\), for every \(n\geq n^{\prime }_{I}\). It follows that \(cert(q,I,\mathcal {M}_{n}) = q^{\prime }(I)\), for every \(n\geq n^{\prime }_{I}\), which shows that \((\mathcal {M}_{n})_{n\geq 1}\) allows for CQ-rewriting. In the other direction, assume that \((\mathcal {M}_{n})_{n\geq 1}\) allows for CQ-rewriting. To show that \(\mathcal {M}\) allows for CQ-rewriting, let q be a conjunctive query and let q ^{′} be a union of conjunctive queries such that for every source instance I, there is a positive integer n _{ I } such that \(cert(q,I,\mathcal {M}_{n}) = q^{\prime }(I)\), for every n ≥ n _{ I }. By the pointwise convergence of \((\mathcal {M}_{n})_{n\geq 1}\) to \(\mathcal {M}\), for every source instance I, there is a positive integer \(n^{\prime }_{I}\) such that \(cert(q,I,\mathcal {M}) = cert(q,I,\mathcal {M}_{n})\), for every \(n\geq n^{\prime }_{I}\). Let I be a source instance. By taking any \(n\geq \max \{n_{I},n^{\prime }_{I}\}\), we have that \(cert(q,I,\mathcal {M}) = cert(q,I,\mathcal {M}_{n})\) and \(cert(q,I,\mathcal {M}_{n}) = q^{\prime }(I)\), hence \(cert(q,I,\mathcal {M}) = q^{\prime }(I)\), which shows that \(\mathcal {M}\) allows for CQ-rewriting.
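As a worked instance of Definition 6, consider the sequence \((\mathcal {M}_{n})_{n\geq 1}\) from the proof of Proposition 4 and the single target query q = ∃x P ^{′}(x). The sketch below checks one candidate rewriting q ^{′}; the particular choice of n _{ I } is an assumption made for illustration, and any larger value works equally well.

```latex
q = \exists x\, P'(x), \qquad
q' = \exists x\, \exists y\, \bigl(P(x) \wedge E(y,y)\bigr).

% Case 1: I contains a self-loop. Take n_I = 1. For every n \geq 1, M_n copies P into P', so
cert(q, I, \mathcal{M}_n) = \mathit{true}
  \iff P^{I} \neq \emptyset
  \iff q'(I) = \mathit{true}.

% Case 2: I is self-loop free. Take n_I = \max\{1, \text{size of a largest clique in } I\}.
% For every n \geq n_I, I has no clique of size n+1, so the empty instance is a universal solution and
cert(q, I, \mathcal{M}_n) = \mathit{false} = q'(I).
```

In both cases \(cert(q,I,\mathcal {M}_{n}) = q^{\prime }(I)\) for all n ≥ n _{ I }, which is the property that Definition 6 requires for this particular query.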
By combining the preceding remarks with Theorems 2 and 3, we obtain the following result.
Corollary 1
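Let \((\mathcal {M}_{n})_{n\geq 1}\) be a pointwise Cauchy sequence of GAV mappings. Then the following statements are equivalent.
1. \((\mathcal {M}_{n})_{n\geq 1}\) has a GAV mapping as a pointwise limit.
2. \((\mathcal {M}_{n})_{n\geq 1}\) allows for CQ-rewriting.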
Since every schema mapping specified by an SO tgd allows for CQ-rewriting, Theorem 3 also implies the following result.
Corollary 2
Let \((\mathcal {M}_{n})_{n\geq 1}\) be a pointwise Cauchy sequence of GAV mappings. Then the following statements are equivalent.
1. \((\mathcal {M}_{n})_{n\geq 1}\) has a GAV mapping as a pointwise limit.
2. \((\mathcal {M}_{n})_{n\geq 1}\) has an SO tgd as a pointwise limit.
Consequently, for every pointwise Cauchy sequence \((\mathcal {M}_{n})_{n\geq 1}\) of GAV mappings, exactly one of the following two statements holds.
(1) No pointwise limit allows for CQ-rewriting and no GAV mapping is a pointwise limit.
(2) Every pointwise limit allows for CQ-rewriting and there is a GAV mapping that is a pointwise limit. Moreover, this happens precisely when the schema mapping \(\mathcal {M}^{\star }\) in the proof of Theorem 3 allows for CQ-rewriting or, equivalently, when \(\mathcal {M}^{\star }\) is logically equivalent to a GAV mapping.
5 Limits of Sequences of LAV Mappings
In this section, we investigate the existence and definability of limits of sequences of LAV mappings. In fact, we will consider a much broader class of GLAV mappings, namely k-premise-bounded GLAV mappings for arbitrary k ≥ 1. LAV mappings correspond to the special case of k = 1.
Definition 7
Let \(\mathcal {M}\) be a GLAV mapping and k a positive integer. We call \(\mathcal {M}\) a k-premise-bounded GLAV mapping if the premise of every constraint in \(\mathcal {M}\) has at most k atoms.
Let \((\mathcal {M}_{n})_{n\geq 1}\) be a sequence of GLAV mappings. We say that \((\mathcal {M}_{n})_{n\geq 1}\) is premise-bounded if there exists an integer k such that every element \(\mathcal {M}_{n}\) of \((\mathcal {M}_{n})_{n\geq 1}\) is k-premise-bounded.
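The definition amounts to a simple syntactic check. The following minimal sketch assumes a hypothetical encoding of a GLAV mapping as a list of (premise atoms, conclusion atoms) pairs; since a sequence is infinite, code can only inspect a finite prefix of it.

```python
def premise_bound(mapping):
    """Largest number of premise atoms over all constraints of a GLAV mapping."""
    return max((len(premise) for premise, _conclusion in mapping), default=0)

def is_k_premise_bounded(mapping, k):
    """True if every constraint of the mapping has at most k premise atoms."""
    return premise_bound(mapping) <= k

def prefix_is_premise_bounded(mappings, k):
    """Check k-premise-boundedness for a finite prefix of a sequence of mappings."""
    return all(is_k_premise_bounded(m, k) for m in mappings)

# Hypothetical constraints, with atoms written as plain strings.
lav = [(["E(x,y)"], ["F(x,z)", "F(z,y)"])]            # one premise atom: a LAV mapping
glav = [(["E(x,y)", "E(y,z)"], ["F(x,z)"])]           # two premise atoms

print(is_k_premise_bounded(lav, 1), is_k_premise_bounded(glav, 1), is_k_premise_bounded(glav, 2))
```

Note that the bound concerns only the premises; the conclusions may contain any number of atoms and existential variables.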
Unlike the case of GAV mappings, the notions of pointwise Cauchy and uniformly Cauchy sequences of premise-bounded GLAV mappings coincide. Moreover, the same holds true for the notions of pointwise limit and uniform limit of sequences of such schema mappings.
Theorem 4
Let \((\mathcal {M}_{n})_{n\geq 1}\) be a premise-bounded sequence of GLAV mappings.
(1) The sequence \((\mathcal {M}_{n})_{n\geq 1}\) is pointwise Cauchy if and only if it is uniformly Cauchy.
(2) The sequence \((\mathcal {M}_{n})_{n\geq 1}\) has a pointwise limit if and only if it has a uniform limit.
Proof 9
We prove the first part and then use it to prove the second part.
Part 1. It is obvious that every uniformly Cauchy sequence of mappings is also pointwise Cauchy. We focus on the reverse direction. Let \((\mathcal {M}_{n})_{n\geq 1}\) be a pointwise Cauchy sequence of premise-bounded GLAV mappings. We have to show that for every m, there is an N _{0} such that for all n, n ^{′} ≥ N _{0}, we have that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}_{m}} \mathcal {M}_{n^{\prime }}\).
Fix an integer m. Since \((\mathcal {M}_{n})_{n\geq 1}\) is pointwise Cauchy, for every source instance I, there is an integer n _{0}(I) such that for all n, n ^{′}≥ n _{0}(I) and for every conjunctive query q in C Q _{ m }, we have that \(cert(q,I,\mathcal {M}_{n}) = cert(q,I,\mathcal {M}_{n^{\prime }})\). Let p be the number of relation symbols in the target schema, let r be their maximum arity, and let k be the bound on the number of atoms in the premises of the members of the sequence \((\mathcal {M}_{n})_{n\geq 1}\). We write \(\mathcal {I}\) to denote the class of all source instances with at most k ⋅ p ⋅ m ^{ r } atoms. Clearly, up to isomorphism, there are only finitely many instances \(I \in \mathcal {I}\). Moreover, if I ^{′}≅I ^{″}, then n _{0}(I ^{′}) = n _{0}(I ^{″}). Consequently, the quantity \(N_{0} = \mathop {max}\{n_{0}(I) \mid I \in \mathcal {I}\}\) is a positive integer. We claim that for all n, n ^{′}≥ N _{0}, we have that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}_{m}} \mathcal {M}_{n^{\prime }}\).
Let I be an arbitrary source instance and let q be an arbitrary conjunctive query in \(\mathsf{CQ}_{m}\). We have to show that \(cert(q,I, \mathcal {M}_{n}) = cert(q, I, \mathcal {M}_{n^{\prime }})\), for all n, n ^{′} ≥ N _{0}. Let a be a tuple of constants such that \(\mathbf {a} \in cert(q, I, \mathcal {M}_{n})\), hence \(\mathbf {a} \in q(chase(I, \mathcal {M}_{n}))_{\downarrow }\). Since the query q has at most m variables, it must consist of at most \(p \cdot m^{r}\) atoms. Let \(h: \mathit {atoms}(q) \to chase(I, \mathcal {M}_{n})\) be a homomorphism establishing that \(\mathbf {a} \in q(chase(I,\mathcal {M}_{n}))\). It follows that there are at most \(p \cdot m^{r}\) facts in \(chase(I,\mathcal {M}_{n})\) witnessing that \(\mathbf {a} \in q(chase(I, \mathcal {M}_{n}))_{\downarrow }\). Each of these facts must be produced in a single step while chasing the source instance I with \(\mathcal {M}_{n}\), which implies that each of these facts is produced using at most k facts from I. Let \(I^{*}\) be the subinstance of I consisting of all the aforementioned facts of I used to produce the facts in \(chase(I,\mathcal {M}_{n})\) witnessing that \(\mathbf {a} \in q(chase(I, \mathcal {M}_{n}))_{\downarrow }\). We then have that \(|I^{*}| \leq k \cdot p \cdot m^{r}\) and \(\mathbf {a} \in q(chase(I^{*}, \mathcal {M}_{n}))_{\downarrow }\). Since \(I^{*} \in \mathcal {I}\) and n, n ^{′} ≥ N _{0}, we have that \(q(chase(I^{*}, \mathcal {M}_{n}))_{\downarrow } = q(chase(I^{*}, \mathcal {M}_{n^{\prime }}))_{\downarrow }\), hence \(\mathbf {a} \in q(chase(I^{*}, \mathcal {M}_{n^{\prime }}))_{\downarrow }\). By the monotonicity of the chase procedure, we have that \(\mathbf {a} \in q(chase(I, \mathcal {M}_{n^{\prime }}))_{\downarrow }\). It follows that \(q(chase(I, \mathcal {M}_{n}))_{\downarrow } \subseteq q(chase(I, \mathcal {M}_{n^{\prime }}))_{\downarrow }\).
A symmetric argument establishes the containment \(q(chase(I, \mathcal {M}_{n^{\prime }}))_{\downarrow } \subseteq q(chase(I, \mathcal {M}_{n}))_{\downarrow }\), hence \(q(chase(I, \mathcal {M}_{n}))_{\downarrow } = q(chase(I, \mathcal {M}_{n^{\prime }}))_{\downarrow }\), which, in turn, implies that \(cert(q,I, \mathcal {M}_{n}) = cert(q, I, \mathcal {M}_{n^{\prime }})\).
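To make the size bound used in Part 1 concrete, here is a small numeric illustration. The parameter values are hypothetical, and the count assumes that the atoms of q contain only variables and that duplicate atoms are dropped.

```latex
% Hypothetical parameters: p = 2 target relation symbols, maximum arity r = 2,
% queries with at most m = 3 variables, and LAV premises (k = 1).
% Each relation symbol admits at most m^r distinct tuples of variables, hence
\#\mathit{atoms}(q) \;\le\; p \cdot m^{r} = 2 \cdot 3^{2} = 18,
\qquad
|I^{*}| \;\le\; k \cdot p \cdot m^{r} = 1 \cdot 18 = 18.
```

With these values, it suffices to inspect source instances with at most 18 facts, of which there are only finitely many up to isomorphism.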
Part 2. It is obvious that if a sequence of schema mappings has a uniform limit, then it has a pointwise limit. We focus on the reverse direction. Let \((\mathcal {M}_{n})_{n\geq 1}\) be a sequence of premise-bounded GLAV mappings that has a pointwise limit \(\mathcal {M}\). We claim that \(\mathcal {M}\) is also a uniform limit of \((\mathcal {M}_{n})_{n\geq 1}\).
Since \((\mathcal {M}_{n})_{n\geq 1}\) has a pointwise limit, we have that \((\mathcal {M}_{n})_{n\geq 1}\) is pointwise Cauchy. The previous part implies that \((\mathcal {M}_{n})_{n\geq 1}\) is uniformly Cauchy as well. Fix an integer m. Since \((\mathcal {M}_{n})_{n\geq 1}\) is uniformly Cauchy, there exists an n _{0} such that for all n, n ^{′}≥ n _{0}, we have that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}_{m}} \mathcal {M}_{n^{\prime }}\). We claim that also \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}_{m}} \mathcal {M}\) holds, for every n ≥ n _{0}. To show this, fix some n ≥ n _{0} and let I be a source instance and q a conjunctive query in C Q _{ m }. We have to show that \(cert(q,I,\mathcal {M}_{n}) = cert(q,I,\mathcal {M})\). Since \(\mathcal {M}\) is a pointwise limit of \((\mathcal {M}_{n})_{n\geq 1}\), there is an \(n^{\prime }_{0}(I)\) such that for all \(n^{\prime } \geq n^{\prime }_{0}(I)\), we have that \(cert(q,I,\mathcal {M}_{n^{\prime }})=cert(q,I,\mathcal {M})\). Take an integer n ^{′} such that \(n^{\prime } \geq \max \{n_{0}, n^{\prime }_{0}(I)\}\). Since n ^{′}≥ n _{0}, we have that \(cert(q,I,\mathcal {M}_{n}) = cert(q,I,\mathcal {M}_{n^{\prime }})\). Since \(n^{\prime }\geq n^{\prime }_{0}(I)\), we have that \(cert(q,I,\mathcal {M}_{n^{\prime }}) = cert(q,I,\mathcal {M})\). Thus, \(cert(q,I,\mathcal {M}_{n}) = cert(q,I,\mathcal {M})\). □
Note that the preceding proof of Part 2 used only the hypothesis that the sequence \((\mathcal {M}_{n})_{n\geq 1}\) has a pointwise limit and the fact that the sequence \((\mathcal {M}_{n})_{n\geq 1}\) is uniformly Cauchy, which follows from Part 1. As a matter of fact, this is an instance of a general result about pseudometric spaces, namely, that if a uniformly Cauchy sequence of functions converges pointwise, then it also converges uniformly.
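For completeness, the general fact can be verified with a single application of the triangle inequality. The following is a standard sketch for an arbitrary pseudometric space (X, d), a set A, and functions \(f_{n}, f \colon A \to X\); the auxiliary index \(n^{\prime }_{x}\) is introduced here only for the argument.

```latex
% Fix \epsilon > 0.
% Uniformly Cauchy: choose n_0 such that for all n, n' \geq n_0 and all x \in A,
d\bigl(f_{n}(x), f_{n'}(x)\bigr) < \tfrac{\epsilon}{2}.
% Pointwise convergence: for each x \in A, choose n'_x \geq n_0 such that
d\bigl(f_{n'_x}(x), f(x)\bigr) < \tfrac{\epsilon}{2}.
% Then, for every n \geq n_0 and every x \in A,
d\bigl(f_{n}(x), f(x)\bigr)
  \;\le\; d\bigl(f_{n}(x), f_{n'_x}(x)\bigr) + d\bigl(f_{n'_x}(x), f(x)\bigr)
  \;<\; \tfrac{\epsilon}{2} + \tfrac{\epsilon}{2} \;=\; \epsilon .
```

The index n _{0} does not depend on x, which is exactly what uniform convergence requires.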
The following two propositions further demarcate the differences between GAV and premise-bounded GLAV mappings. In fact, these differences are already witnessed by sequences of LAV mappings. The first difference concerns the existence of limits of uniformly Cauchy sequences. In contrast to the GAV case, uniformly Cauchy sequences of LAV mappings may have no uniform limit; in fact, they may not even have a pointwise limit.
Proposition 6
There exists a uniformly Cauchy sequence of LAV mappings that has no pointwise limit; in particular, it has no uniform limit either.
Proof 10
We first show that the sequence \((\mathcal {M}_{n})_{n\geq 1}\) is uniformly Cauchy. Let k ≥ 1. We claim that if we take n _{0} = k, then for every source instance I, for every n, m ≥ n _{0}, and every \(q \in \mathsf{CQ}_{k}\), we have that \(cert(q, I, \mathcal {M}_{n}) = cert(q, I, \mathcal {M}_{m})\). To see this, note that for every source instance I and for every t ≥ 1, the universal solutions of I w.r.t. \(\mathcal {M}_{t}\) have active domains consisting entirely of labeled nulls. Hence, only boolean queries may return a nonempty result. Moreover, observe that these universal solutions have no self-loops, i.e., they contain no atoms of the form F(v, v) for some labeled null v.
We now distinguish two cases: First, suppose that \(q \in \mathsf{CQ}_{k}\) is a boolean conjunctive query which contains a “self-loop”, i.e., an atom of the form F(z, z) for some variable z. Then we clearly have \(cert(q, I, \mathcal {M}_{n}) = \mathit {false} = cert(q, I, \mathcal {M}_{m})\). It remains to consider the case that \(q \in \mathsf{CQ}_{k}\) is a boolean CQ containing no self-loop. Then we clearly have \(cert(q, I, \mathcal {M}_{n}) = \mathit {true} = cert(q, I, \mathcal {M}_{m})\), since we are assuming that m, n ≥ k holds.
Using an argument similar to the one in the proof of Proposition 2, we now show that the sequence \((\mathcal {M}_{n})_{n\geq 1}\) has no pointwise limit. Towards a contradiction, assume that \((\mathcal {M}_{n})_{n\geq 1}\) does have a pointwise limit \(\mathcal {M}\). Let I be a nonempty source instance. We consider three cases.
First, assume that \(\text {Sol}(I,\mathcal {M})\) is empty. Then, for every boolean conjunctive query q, it holds trivially that \(cert(q,I,\mathcal {M}) = \mathit {true}\). This is, in particular, the case for the query q = ∃z F(z, z), which asks for the existence of a self-loop. However, for this query q, we have that \(cert(q, I, \mathcal {M}_{n}) = \mathit {false}\) for every n ≥ 1.
Second, assume that \(\text {Sol}(I,\mathcal {M})\) is nonempty and that all solutions \(J \in \text {Sol}(I,\mathcal {M})\) contain a self-loop. For the query q = ∃z F(z, z) as above, we again have \(cert(q, I, \mathcal {M}) = {\mathit {true}}\), whereas \(cert(q, I, \mathcal {M}_{n}) = \mathit {false}\), for every n ≥ 1.
The next difference is the definability of uniform limits. In Section 4, we saw that if a sequence of GAV mappings has a uniform limit, then it is eventually constant, hence it has a GAV mapping as a uniform limit. This property need not hold for sequences of LAV mappings (hence, it need not hold for sequences of premise-bounded schema mappings).
Proposition 7
There exists a sequence \((\mathcal {M}_{n})_{n\geq 1}\) of LAV mappings that has a uniform limit, but no uniform limit of \((\mathcal {M}_{n})_{n\geq 1}\) admits universal solutions. In particular, no SO tgd is a uniform limit of the sequence \((\mathcal {M}_{n})_{n\geq 1}\).
Proof 11
Let n _{0} = m. Since each \(\mathcal {M}_{n}\) has solutions consisting entirely of nulls, it suffices to consider boolean CQs only. Let q be a boolean CQ with m variables and assume that \(cert(q, I, \mathcal {M}_{n})= \mathit {true}\), where n ≥ m. This implies that there is a homomorphism from the body of q into \(P_{n}\), where \(P_{n}\) is the simple path with n nodes. In turn, this implies that \(C_{k} \models q\), for every k. Thus, \(cert(q, I, \mathcal {M}) = \mathit {true}\) as well. In the other direction, assume that \(cert(q, I, \mathcal {M}) = \mathit {true}\). Note that q cannot contain a directed cycle, since no directed cycle can be mapped homomorphically into every cycle of length greater than one. Let h be a homomorphism from the body of q into \(C_{m+1}\). Since \(q \in \mathsf{CQ}_{m}\), the variables of q have at most m distinct images among the nodes of \(C_{m+1}\). This means that \(\tilde C_{m+1} \models q\), where \(\tilde C_{m+1}\) is obtained from \(C_{m+1}\) by removing the facts that contain at least one element that is not the image of one of the variables of q under h. Note that \(\tilde C_{m+1}\) has at least one fact fewer than \(C_{m+1}\), and so it is a collection of simple paths of length at most m; therefore, there is a homomorphism from \(\tilde C_{m+1}\) to \(P_{n}\), hence \(P_{n} \models q\).
Part 2. For the second part of the claim and towards a contradiction, assume that \(\mathcal {M}^{\prime }\) is a uniform limit of \((\mathcal {M}_{n})_{n\geq 1}\) such that there exists a nonempty source instance I and a finite universal solution J for I w.r.t. \(\mathcal {M}^{\prime }\). Note that for every i, we have that \(cert(\exists P_{i}, I, \mathcal {M}^{\prime }) = \mathit {true}\), because \(\mathcal {M}^{\prime }\) is a (uniform and, hence also pointwise) limit of the sequence \((\mathcal {M}_{n})_{n\geq 1}\). Then we also have that J⊧∃P _{ i }, since J is universal. Since J is finite, this is possible only if J contains a directed cycle.
We can now derive a contradiction as follows. For each positive integer l, let ∃C _{ l } be the boolean conjunctive query asserting the existence of a cycle of length l. Then there is no n such that \(cert(\exists C_{l}, I, \mathcal {M}_{n}) = \mathit {true}\). Thus, \(cert(\exists C_{l}, I, \mathcal {M}^{\prime }) = \mathit {false}\) must hold for every l, since \(\mathcal {M}^{\prime }\) is a limit of \((\mathcal {M}_{n})_{n\geq 1}\). Hence, J cannot contain cycles.
Since every SO tgd admits universal solutions, it follows that no SO tgd is a (uniform or pointwise) limit of \((\mathcal {M}_{n})_{n\geq 1}\). □
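The homomorphism facts used in the proof of Proposition 7 can be checked by brute force on small graphs. The following is a minimal sketch, not part of the original development: the encoding of \(P_{n}\) and \(C_{k}\) as sets of directed edges over integer nodes and the helper names `path`, `cycle`, and `has_homomorphism` are assumptions made for illustration only.

```python
from itertools import product

def path(n):
    """Edges of the simple directed path with n nodes 0..n-1 (assumed encoding of P_n)."""
    return {(i, i + 1) for i in range(n - 1)}

def cycle(k):
    """Edges of the directed cycle with k nodes 0..k-1 (assumed encoding of C_k)."""
    return {(i, (i + 1) % k) for i in range(k)}

def nodes(edges):
    """Sorted list of all nodes that occur in some edge."""
    return sorted({v for edge in edges for v in edge})

def has_homomorphism(src, dst):
    """Brute-force test for a node map that sends every edge of src to an edge of dst."""
    src_nodes, dst_nodes = nodes(src), nodes(dst)
    for image in product(dst_nodes, repeat=len(src_nodes)):
        h = dict(zip(src_nodes, image))
        if all((h[a], h[b]) in dst for a, b in src):
            return True
    return False

# A path maps into every directed cycle (it simply winds around the cycle).
print(all(has_homomorphism(path(5), cycle(k)) for k in range(1, 6)))
# A path does not map into a strictly shorter path.
print(has_homomorphism(path(5), path(4)))
# A directed cycle never maps into a path.
print(has_homomorphism(cycle(3), path(6)))
```

The second and third checks mirror the two halves of Part 2: a finite instance that satisfies every \(\exists P_{i}\) cannot be acyclic, while no \(\exists C_{l}\) holds on a path.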
By Theorem 1, every SO tgd is the uniform limit of a sequence of GLAV mappings. Proposition 7 implies that the converse is false, even for sequences of LAV mappings.
In the previous section, we showed that a sequence of GAV mappings has a GAV mapping as a pointwise limit if and only if it has a pointwise limit that allows for CQ-rewriting. Is there some structural property that characterizes when a sequence of premise-bounded GLAV mappings has a GLAV mapping as a pointwise limit (which, for premise-bounded mappings, is the same as a uniform limit)? We will show that the property of admitting universal solutions is the key to this question. Specifically, we have the following result.
Theorem 5
Let \((\mathcal {M}_{n})_{n\geq 1}\) be a sequence of premise-bounded GLAV mappings. Then the following statements are equivalent:
(1) \((\mathcal {M}_{n})_{n\geq 1}\) has a GLAV mapping \(\mathcal {M}\) as a uniform limit.
(2) \((\mathcal {M}_{n})_{n\geq 1}\) has a uniform limit that admits universal solutions.
Moreover, if \((\mathcal {M}_{n})_{n\geq 1}\) is a sequence of LAV mappings, then \((\mathcal {M}_{n})_{n\geq 1}\) has a LAV mapping as a uniform limit if and only if \((\mathcal {M}_{n})_{n\geq 1}\) has a uniform limit that admits universal solutions.
We now give two lemmas which will be used in the proof of Theorem 5, but are also of interest in their own right.
Lemma 2
If \(\mathcal {M}\) is the uniform limit of a sequence \((\mathcal {M}_{n})_{n\geq 1}\) of schema mappings each of which allows for CQ-rewriting, then \(\mathcal {M}\) also allows for CQ-rewriting.
Proof 12
Let q be a target conjunctive query with m variables. Since \(\mathcal {M}\) is a uniform limit of \((\mathcal {M}_{n})_{n\geq 1}\), there exists an integer \(n_{0}\) such that for every \(n \geq n_{0}\) and every source instance I, we have that \(cert(q,I,\mathcal {M}) = cert(q,I,\mathcal {M}_{n})\). In particular, \(cert(q,I,\mathcal {M}) = cert(q,I,\mathcal {M}_{n_{0}})\). Since \(\mathcal {M}_{n_{0}}\) allows for CQ-rewriting, there is a source conjunctive query \(q^{\prime }\) such that \(cert(q,I,\mathcal {M}_{n_{0}}) = q^{\prime }(I)\), for every source instance I. Hence, \(cert(q,I,\mathcal {M}) = q^{\prime }(I)\) holds, for every source instance I. □
It should be noted that the conclusion of Lemma 2 does not hold, in general, if \(\mathcal {M}\) is a pointwise limit of a sequence \((\mathcal {M}_{n})_{n\geq 1}\) of schema mappings each of which allows for CQ-rewriting. Indeed, if \((\mathcal {M}_{n})_{n\geq 1}\) is the sequence of GAV mappings in the proof of Proposition 5, then Theorem 3 and Proposition 5 imply that no pointwise limit of \((\mathcal {M}_{n})_{n\geq 1}\) allows for CQ-rewriting.
Lemma 3
Let \(\mathcal {M}\) be a uniform limit of a sequence \((\mathcal {M}_{n})_{n\geq 1}\) of LAV mappings. If \(\mathcal {M}\) admits universal solutions, then it is closed under unions.
Proof 13
The proof proceeds through several stages and involves four claims, each of which builds on preceding ones. We first state the claims without proof and then use the last claim to show the desired conclusion. After this, we complete the proof of the lemma by proving each claim.

For \(\ell \geq 1\), we define \(\mathsf {CQ}^{\prime }_{\ell } = \{q \in \mathsf {CQ} \mid length(q) \leq \ell \}\), where \(length(q)\) denotes the number of atoms in q.

We say that two schema mappings \(\mathcal {M}_{1}\) and \(\mathcal {M}_{2}\) are \(\mathsf {CQ}^{\prime }_{\ell }\)-equivalent, denoted by \(\mathcal {M}_{1} \equiv _{\mathsf {CQ}^{\prime }_{\ell }} \mathcal {M}_{2}\), if for every source instance I and for every \(q \in \mathsf {CQ}^{\prime }_{\ell }\), we have that \(cert(q,I,\mathcal {M}_{1}) = cert(q,I,\mathcal {M}_{2})\).

We say that \(\mathcal {M}\) is the \(u^{\prime }\)-limit of a sequence \((\mathcal {M}_{n})_{n\geq 1}\), denoted by \(\mathcal {M}_{n} \stackrel {u^{\prime }}{\longrightarrow } \mathcal {M}\), if for every ℓ, there exists \(n_{0}\) such that for all \(n \geq n_{0}\), it holds that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}^{\prime }_{\ell }} \mathcal {M}\).
Claim A.
The notions of \(u^{\prime }\)-limit and uniform limit coincide. Formally, for every sequence \((\mathcal {M}_{n})_{n\geq 1}\) of schema mappings and every schema mapping \(\mathcal {M}\), we have that \(\mathcal {M}_{n} \stackrel {u}{\longrightarrow } \mathcal {M}\) if and only if \(\mathcal {M}_{n} \stackrel {u^{\prime }}{\longrightarrow } \mathcal {M}\).
Next, we use the given sequence \((\mathcal {M}_{n})_{n\geq 1}\) to construct another sequence \((\mathcal {M}^{\prime }_{n})_{n\geq 1}\) of LAV mappings that possesses some desirable properties. To define the sequence \((\mathcal {M}^{\prime }_{n})_{n\geq 1}\), we need another claim.
Claim B.
Assume that \(\mathcal {M}_{n} \stackrel {u}{\longrightarrow } \mathcal {M}\). Then, there exists a strictly increasing sequence (n _{ i })_{ i ≥ 1} of positive integers, such that for every ℓ ≥ 1 and for every n ≥ n _{ ℓ }, we have that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}^{\prime }_{\ell }} \mathcal {M}\).
Claim C.
Let (n _{ i })_{ i ≥ 1} be the strictly increasing sequence of positive integers according to Claim B and let \((\mathcal {M}^{\prime }_{n})_{n\geq 1}\) be the sequence of LAV mappings constructed above. Then, for every ℓ ≥ 1, the following properties hold: (i) for every n ≥ n _{ ℓ }, we have that \(\mathcal {M}^{\prime }_{n} \equiv _{\mathsf {CQ}^{\prime }_{\ell }} \mathcal {M}\); (ii) the conclusion of every LAV constraint in \(\mathcal {M}^{\prime }_{n_{\ell }}\) is of length at most ℓ.
We now make the following claim about the sequence \((\mathcal {M}^{\prime }_{n})_{n\geq 1}\).
Claim D.
For every source instance I, there exists an integer n _{0} ≥ 1 such that for every I ^{′}⊆ I, we have that \(\text {Sol}(I^{\prime },\mathcal {M}^{\prime }_{n_{0}}) = \text {Sol}(I^{\prime },\mathcal {M})\).
Next, we use Claim D to show that \(\mathcal {M}\) is closed under unions, i.e., given \((I_{1},J_{1}) \in \mathcal {M}\) and \((I_{2},J_{2}) \in \mathcal {M}\), we must show that \((I,J) \in \mathcal {M}\) with I = I _{1} ∪ I _{2} and J = J _{1} ∪ J _{2}. From Claim D, we know that there exists n _{0} such that \(\text {Sol}(I^{\prime },\mathcal {M}^{\prime }_{n_{0}}) = \text {Sol}(I^{\prime },\mathcal {M})\), for every I ^{′}⊆ I. In particular, I _{1}, I _{2} ⊆ I. Hence, for each i ∈{1,2}, we have \(J_{i} \in \text {Sol}(I_{i}, \mathcal {M}^{\prime }_{n_{0}})\), that is, \((I_{1}, J_{1})\in \mathcal {M}^{\prime }_{n_{0}}\) and \((I_{2}, J_{2})\in \mathcal {M}^{\prime }_{n_{0}}\). Since \(\mathcal {M}^{\prime }_{n_{0}}\) is a LAV mapping, it is closed under unions. Hence, \((I,J) \in \mathcal {M}^{\prime }_{n_{0}}\), and, since \(\text {Sol}(I,\mathcal {M}^{\prime }_{n_{0}}) = \text {Sol}(I,\mathcal {M})\), we conclude that \(J \in \text {Sol}(I,\mathcal {M})\), i.e., \((I,J)\in \mathcal {M}\).
To complete the proof of the lemma, it remains to prove Claims A through D.
Claim A.
The notions of \(u^{\prime }\)-limit and uniform limit coincide. Formally, for every sequence \((\mathcal {M}_{n})_{n\geq 1}\) of schema mappings and every schema mapping \(\mathcal {M}\), we have that \(\mathcal {M}_{n} \stackrel {u}{\longrightarrow } \mathcal {M}\) if and only if \(\mathcal {M}_{n} \stackrel {u^{\prime }}{\longrightarrow } \mathcal {M}\).
(⇒) Assume \(\mathcal {M}_{n} \stackrel {u}{\longrightarrow } \mathcal {M}\). We have to show that also \(\mathcal {M}_{n} \stackrel {u^{\prime }}{\longrightarrow } \mathcal {M}\) holds. Consider an arbitrary \(\ell \geq 1\) and let r be the maximal arity of the target schema of \(\mathcal {M}\). Any conjunctive query with at most ℓ atoms can have at most \(m = \ell \cdot r\) variables. Hence, the inclusion \(\mathsf {CQ}^{\prime }_{\ell } \subseteq \mathsf {CQ}_{m}\) holds.
We are assuming \(\mathcal {M}_{n} \stackrel {u}{\longrightarrow } \mathcal {M}\). Hence, there exists \(n_{0}(m)\) such that for all \(n \geq n_{0}(m)\), we have that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}_{m}} \mathcal {M}\). That is, for all \(q \in \mathsf {CQ}_{m}\) and for all I, it holds that \(cert(q, I, \mathcal {M}_{n}) = cert(q, I, \mathcal {M})\). Since \(\mathsf {CQ}^{\prime }_{\ell } \subseteq \mathsf {CQ}_{m}\), we may conclude that for all \(q \in \mathsf {CQ}^{\prime }_{\ell }\) and for all I, it holds that \(cert(q, I, \mathcal {M}_{n}) = cert(q, I, \mathcal {M})\). Hence, \(\mathcal {M}_{n} \stackrel {u^{\prime }}{\longrightarrow } \mathcal {M}\) indeed holds.
(⇐) Assume \(\mathcal {M}_{n} \stackrel {u^{\prime }}{\longrightarrow } \mathcal {M}\). We have to show that also \(\mathcal {M}_{n} \stackrel {u}{\longrightarrow } \mathcal {M}\) holds. Consider an arbitrary \(m \geq 1\). As above, let r be the maximal arity of the target schema of \(\mathcal {M}\). Moreover, let p be the number of target relation symbols. Any conjunctive query with at most m variables can have at most \(\ell = p \cdot m^{r}\) atoms. Hence, the inclusion \(\mathsf {CQ}_{m} \subseteq \mathsf {CQ}^{\prime }_{\ell }\) holds.
We are assuming \(\mathcal {M}_{n} \stackrel {u^{\prime }}{\longrightarrow } \mathcal {M}\). Hence, there exists \(n_{0}(\ell )\) such that for all \(n \geq n_{0}(\ell )\), we have that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}^{\prime }_{\ell }} \mathcal {M}\). That is, for all \(q \in \mathsf {CQ}^{\prime }_{\ell }\) and for all I, it holds that \(cert(q, I, \mathcal {M}_{n}) = cert(q, I, \mathcal {M})\). Since \(\mathsf {CQ}_{m} \subseteq \mathsf {CQ}^{\prime }_{\ell }\), we may conclude that for all \(q \in \mathsf {CQ}_{m}\) and for all I, it holds that \(cert(q, I, \mathcal {M}_{n}) = cert(q, I, \mathcal {M})\). Hence, \(\mathcal {M}_{n} \stackrel {u}{\longrightarrow } \mathcal {M}\) indeed holds.
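The two counting bounds behind Claim A (at most \(\ell \cdot r\) variables in a query with ℓ atoms, and at most \(p \cdot m^{r}\) distinct atoms over m variables) can be sanity-checked by enumeration. The sketch below is only illustrative: the dictionary encoding of a target schema (relation name to arity) and the function names are assumptions, and atoms are counted without repetition.

```python
from itertools import product

def variable_bound(num_atoms, schema):
    """Upper bound l * r on the number of variables of a query with num_atoms atoms."""
    return num_atoms * max(schema.values())

def atom_bound(num_vars, schema):
    """Upper bound p * m^r on the number of distinct atoms over num_vars variables."""
    return len(schema) * num_vars ** max(schema.values())

def all_atoms(num_vars, schema):
    """Enumerate every atom R(x_i1, ..., x_ia) over the variables x0, ..., x{num_vars-1}."""
    variables = [f"x{i}" for i in range(num_vars)]
    return [(rel, args)
            for rel, arity in schema.items()
            for args in product(variables, repeat=arity)]

# Hypothetical target schema: one binary and one unary relation symbol.
schema = {"E": 2, "U": 1}
print(len(all_atoms(3, schema)), "<=", atom_bound(3, schema))
print(variable_bound(4, schema))
```

The exact number of atoms is \(\sum _{R} m^{arity(R)}\), which never exceeds \(p \cdot m^{r}\) for \(m \geq 1\); the bound in the proof is deliberately coarse, since only its finiteness matters.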
Claim B.
Assume that \(\mathcal {M}_{n} \stackrel {u}{\longrightarrow } \mathcal {M}\). Then, there exists a strictly increasing sequence \((n_{i})_{i \geq 1}\) of positive integers, such that for every \(\ell \geq 1\) and for every \(n \geq n_{\ell }\), we have that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}^{\prime }_{\ell }} \mathcal {M}\). By Claim A, we have \(\mathcal {M}_{n} \stackrel {u^{\prime }}{\longrightarrow } \mathcal {M}\). Hence, for each \(\ell \geq 1\) there exists an integer \(n^{\prime }_{\ell }\) such that for all \(n \geq n^{\prime }_{\ell } \), we have that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}^{\prime }_{\ell }} \mathcal {M}\). We may choose \(n_{\ell }\) as follows to ensure strict monotonicity:$$n_{1} := n^{\prime }_{1}, \qquad n_{\ell } := \max (n_{\ell - 1} + 1, n^{\prime }_{\ell }) \quad \text {for } \ell \geq 2.$$
Then the sequence \((n_{i})_{i \geq 1}\) is strictly increasing and for all \(\ell \geq 1\) and for all \(n \geq n_{\ell }\), we have that \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}^{\prime }_{\ell }} \mathcal {M}\).
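The recursive choice of the indices in Claim B is easy to compute once the thresholds \(n^{\prime }_{\ell }\) are known. The following sketch assumes the thresholds are given as a finite Python list (in the proof they form an infinite sequence); the function name is hypothetical.

```python
def strictly_increasing_thresholds(raw_thresholds):
    """Given n'_1, n'_2, ..., return n_1, n_2, ... with
    n_1 = n'_1 and n_l = max(n_{l-1} + 1, n'_l) for l >= 2."""
    result = []
    for n_prime in raw_thresholds:
        if not result:
            result.append(n_prime)
        else:
            result.append(max(result[-1] + 1, n_prime))
    return result

print(strictly_increasing_thresholds([3, 3, 2, 10, 4]))  # -> [3, 4, 5, 10, 11]
```

Each \(n_{\ell }\) is at least \(n^{\prime }_{\ell }\), so the equivalence guaranteed from \(n^{\prime }_{\ell }\) onwards still holds from \(n_{\ell }\) onwards.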
Claim C.
Let (n _{ i })_{ i ≥ 1} be the strictly increasing sequence of positive integers according to Claim B and let \((\mathcal {M}^{\prime }_{n})_{n\geq 1}\) be the sequence of LAV mappings constructed above. Then, for every ℓ ≥ 1, the following properties hold: (i) for every n ≥ n _{ ℓ }, we have that \(\mathcal {M}^{\prime }_{n} \equiv _{\mathsf {CQ}^{\prime }_{\ell }} \mathcal {M}\); (ii) the conclusion of every LAV constraint in \(\mathcal {M}^{\prime }_{n_{\ell }}\) is of length at most ℓ. Consider an arbitrary ℓ ≥ 1. By the construction of the sequence \((\mathcal {M}^{\prime }_{n})_{n\geq 1}\), every LAV constraint in \(\mathcal {M}^{\prime }_{n}\) has a conclusion of length at most ℓ. Hence, property (ii) clearly holds.
To prove property (i), consider an arbitrary \(n \geq n_{\ell }\). We have to show that \(\mathcal {M}^{\prime }_{n} \equiv _{\mathsf {CQ}^{\prime }_{\ell }} \mathcal {M}\), i.e., for arbitrary source instance I and arbitrary conjunctive query \(q \in \mathsf {CQ}^{\prime }_{\ell }\), we have to show that \(cert(q, I, \mathcal {M}^{\prime }_{n}) = cert(q, I, \mathcal {M})\). By Claim B, we have \(\mathcal {M}_{n} \equiv _{\mathsf {CQ}^{\prime }_{\ell }} \mathcal {M}\). Hence, it suffices to show that \(cert(q, I, \mathcal {M}^{\prime }_{n}) = cert(q, I, \mathcal {M}_{n})\) holds. We prove the two inclusions separately.
By the construction of \(\mathcal {M}^{\prime }_{n}\), we clearly have \(chase(I,\mathcal {M}^{\prime }_{n}) \to chase(I,\mathcal {M}_{n})\). From this, it follows immediately that \(cert(q,I,\mathcal {M}^{\prime }_{n}) \subseteq cert(q,I,\mathcal {M}_{n})\).
For the reverse inclusion, consider an arbitrary tuple \(\mathbf {a} \in cert(q, I,\mathcal {M}_{n})\). Then, there exists a homomorphism \(h_{n} \colon q \to chase(I,\mathcal {M}_{n})\) with \(h_{n}(\mathbf {z}) = \mathbf {a}\), where \(\mathbf {z}\) denotes the free variables of q. Let \(h_{n}(q) = \{A_{1}, \ldots , A_{k}\} \subseteq chase(I,\mathcal {M}_{n})\) with \(k \leq \ell \). By construction, \(\mathcal {M}^{\prime }_{n}\) is obtained by restricting the conclusions of the LAV constraints \(\tau \in \mathcal {M}_{n}\) in all possible ways to at most ℓ atoms. Hence, since \(k \leq \ell \), we have that also \(chase(I,\mathcal {M}^{\prime }_{n})\) contains the set \(\{A_{1}, \ldots , A_{k}\}\) of atoms (up to renaming of labeled nulls). Thus, there exists a homomorphism \(h \colon \{A_{1}, \ldots , A_{k}\} \to chase(I,\mathcal {M}^{\prime }_{n})\), and \(h(h_{n}(\cdot ))\) is a homomorphism \(q \to chase(I,\mathcal {M}^{\prime }_{n})\) with \(h(h_{n}(\mathbf {z})) = \mathbf {a}\). Therefore, \(\mathbf {a} \in cert(q, I,\mathcal {M}^{\prime }_{n})\) holds.
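The phrase "restricting the conclusions in all possible ways to at most ℓ atoms" can be read as replacing each LAV constraint by all constraints with the same premise and a nonempty sub-conclusion of at most ℓ atoms. The sketch below illustrates this reading only; the tuple encoding of atoms and the function name are assumptions, and the exact construction used in the proof may differ in details.

```python
from itertools import combinations

def restrict_conclusion(premise, conclusion, max_len):
    """Return all constraints (premise, sub) where sub is a nonempty selection
    of at most max_len atoms from the conclusion. Existential variables keep
    their names, so atoms that shared a variable still share it."""
    restricted = []
    for size in range(1, min(max_len, len(conclusion)) + 1):
        for sub in combinations(conclusion, size):
            restricted.append((premise, sub))
    return restricted

# Hypothetical LAV constraint: S(x) -> exists y1,y2,y3 F(x,y1), F(y1,y2), F(y2,y3)
premise = ("S", ("x",))
conclusion = [("F", ("x", "y1")), ("F", ("y1", "y2")), ("F", ("y2", "y3"))]
for tgd in restrict_conclusion(premise, conclusion, 2):
    print(tgd)
```

With three conclusion atoms and ℓ = 2, this yields the three single-atom and the three two-atom restrictions, which is what makes every set of at most ℓ chase facts of the original constraint reappear in the chase of the restricted mapping.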
Before presenting the proof of Claim D, we need to bring the notion of fact block size into the picture; this notion was introduced in [7].
Fact Blocks. Let J be an instance. The Gaifman graph of facts \(G_{J}\) of J is the graph whose nodes are the facts of J and in which there is an edge between two facts if they have a null in common. The fact blocks (or f-blocks) of J are the sets of nodes of the connected components of \(G_{J}\). The block size of an undirected graph G is the size of the largest connected component of G, where the size of a component is its number of nodes. The fact block size (f-block size) of an instance J is the block size of the Gaifman graph of facts of J.
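Fact blocks can be computed with a standard union-find pass over the facts. The sketch below is an illustration under stated assumptions: facts are encoded as pairs (relation name, tuple of terms), and a term counts as a labeled null exactly when it is a string starting with an underscore; neither convention comes from the paper.

```python
def is_null(term):
    """Assumed convention: labeled nulls are strings that start with '_'."""
    return isinstance(term, str) and term.startswith("_")

def fact_blocks(instance):
    """Return the f-blocks of an instance: the connected components of the
    graph on facts in which two facts are adjacent if they share a null."""
    facts = list(instance)
    parent = list(range(len(facts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_fact_with = {}  # null -> index of the first fact that contains it
    for i, (_, args) in enumerate(facts):
        for term in args:
            if is_null(term):
                if term in first_fact_with:
                    parent[find(i)] = find(first_fact_with[term])
                else:
                    first_fact_with[term] = i

    blocks = {}
    for i, fact in enumerate(facts):
        blocks.setdefault(find(i), set()).add(fact)
    return list(blocks.values())

def fblock_size(instance):
    """Size of the largest f-block (0 for the empty instance)."""
    return max((len(block) for block in fact_blocks(instance)), default=0)

J = {("F", ("a", "_n1")), ("F", ("_n1", "_n2")), ("G", ("b", "c")), ("F", ("_n3", "a"))}
print(len(fact_blocks(J)), fblock_size(J))
```

Note that the facts F(a, _n1) and F(_n3, a) land in different blocks: sharing the constant a does not create an edge in the Gaifman graph of facts.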
Claim D.

(i) Let \(\mathbf {u} = (u_{1}, \ldots , u_{i})\) denote the labeled nulls in J and let \(\mathbf {y} = (y_{1}, \ldots , y_{i})\) denote a vector of pairwise distinct variables. Consider the boolean conjunctive query \(\exists \mathbf {y}\, q_{J}\) whose atoms are the atoms in J, where we replace the labeled nulls \(\mathbf {u}\) by the variables \(\mathbf {y}\). Clearly, \(q_{J} \to J\) holds and, therefore, also \(cert(\exists \mathbf {y}\, q_{J}, I^{\prime },\mathcal {M}) = true\).

(ii) For every f-block \(F^{\prime }\) of \(J^{\prime }\), we consider the boolean conjunctive query \(\exists \mathbf {z}\, q_{F^{\prime }}\) whose atoms are the atoms in \(F^{\prime }\), where \(\mathbf {z} = (z_{1}, \ldots , z_{i})\) replaces the labeled nulls \(\mathbf {v} = (v_{1}, \ldots , v_{i})\) in \(F^{\prime }\) with pairwise distinct variables. Clearly, for every \(F^{\prime }\), we have \(q_{F^{\prime }} \to J^{\prime }\) and, therefore, also \(cert(\exists \mathbf {z}\, q_{F^{\prime }}, I^{\prime },\mathcal {M}) = true\).

(iii) Finally, we show that \(\text {Sol}(I^{\prime },\mathcal {M}^{\prime }_{n_{0}}) = \text {Sol}(I^{\prime },\mathcal {M})\) holds.
“⊆”: Let \(K \in \text {Sol}(I^{\prime },\mathcal {M}^{\prime }_{n_{0}})\). Since \(J^{\prime }\) is a universal solution for \(I^{\prime }\) w.r.t. \(\mathcal {M}^{\prime }_{n_{0}}\), there exists a homomorphism \(g^{\prime } \colon J^{\prime } \to K\). By composing \(g^{\prime }\) with the homomorphism \(h \colon J \to J^{\prime }\), we obtain a homomorphism from J to K. By the closure of \(\mathcal {M}\) under target homomorphisms, we conclude that \(K \in \text {Sol}(I^{\prime },\mathcal {M})\).
“⊇”: Now let \(K \in \text {Sol}(I^{\prime },\mathcal {M})\). Since J is a universal solution for \(I^{\prime }\) w.r.t. \(\mathcal {M}\), there exists a homomorphism \(g \colon J \to K\). By composing g with the homomorphism \(h^{\prime } \colon J^{\prime } \to J\), we obtain a homomorphism from \(J^{\prime }\) to K. Since the LAV mapping \(\mathcal {M}^{\prime }_{n_{0}}\) is closed under target homomorphisms, we conclude that \(K \in \text {Sol}(I^{\prime },\mathcal {M}^{\prime }_{n_{0}})\).
The proof of Lemma 3 is now complete.
□
We now have all the tools needed to present the proof of Theorem 5. Before doing so and for the sake of readability, we reproduce its statement.
Let \((\mathcal {M}_{n})_{n\geq 1}\) be a sequence of premise-bounded GLAV mappings. Then the following statements are equivalent:
(1) \((\mathcal {M}_{n})_{n\geq 1}\) has a GLAV mapping \(\mathcal {M}\) as a uniform limit.
(2) \((\mathcal {M}_{n})_{n\geq 1}\) has a uniform limit that admits universal solutions.
Moreover, if \((\mathcal {M}_{n})_{n\geq 1}\) is a sequence of LAV mappings, then \((\mathcal {M}_{n})_{n\geq 1}\) has a LAV mapping as a uniform limit if and only if \((\mathcal {M}_{n})_{n\geq 1}\) has a uniform limit that admits universal solutions.
Proof 14 (Proof of Theorem 5)
The direction (1) ⇒ (2) is obvious. For the direction (2) ⇒ (1), we start with the case when \((\mathcal {M}_{n})_{n\geq 1}\) is a sequence of LAV mappings.
1. \(\mathcal {M}\) allows for CQ-rewriting (by Lemma 2);
2. \(\mathcal {M}\) admits universal solutions (by hypothesis);
3. \(\mathcal {M}\) is closed under target homomorphisms (by hypothesis);
4. \(\mathcal {M}\) is closed under unions (by Lemma 3).
Theorem 3.1 in [19] asserts that if a schema mapping admits universal solutions, allows for query rewriting, and is closed under both target homomorphisms and unions, then it is logically equivalent to a LAV mapping. Consequently, we have that \(\mathcal {M}\) is logically equivalent to a LAV mapping.
For the case when \((\mathcal {M}_{n})_{n\geq 1}\) is a sequence of premise-bounded GLAV mappings (but not necessarily LAV mappings), we apply yet another structural characterization of GLAV mappings from [19], namely, Theorem 3.9, which asserts that if a schema mapping allows for CQ-rewriting, admits universal solutions, is closed under target homomorphisms, and is n-modular, for some fixed n, then it is logically equivalent to a GLAV mapping.
Let k be the constant bounding the length of premises in \((\mathcal {M}_{n})_{n\geq 1}\). We proceed exactly as in the proof of Lemma 3 and construct a sequence \((\mathcal {M}^{\prime }_{n})_{n\geq 1}\), in which the premises of the tgds are the same as those of the tgds in \((\mathcal {M}_n)_{n\geq 1}\); hence, each tgd in \((\mathcal {M}^{\prime }_{n})_{n\geq 1}\) has at most k atoms in its premise. Following the same argument, we establish the following analog of Claim D.
Claim D (in the proof of Lemma 3). For every source instance I, there exists an integer \(n_{0} \geq 1\) such that for every \(I^{\prime } \subseteq I\), we have that \(\text {Sol}(I^{\prime },\mathcal {M}^{\prime }_{n_{0}}) = \text {Sol}(I^{\prime },\mathcal {M})\).
Now, since each tgd in every element of \((\mathcal {M}^{\prime }_{n})_{n\geq 1}\) has at most k atoms in its premise, it follows that there is a positive integer \(N_{k}\) so that each mapping \(\mathcal {M}^{\prime }_{n}\) in \((\mathcal {M}^{\prime }_{n})_{n\geq 1}\) is \(N_{k}\)-modular. It is easy to see that \(N_{k} \leq k \cdot r\) holds, where r is the maximum relation arity in the source schema.
We now prove that \(\mathcal {M}\) is \(N_{k}\)-modular. Assume that J is not a solution for I w.r.t. \(\mathcal {M}\). Take an integer \(n_{0}\) as in Claim D and consider \(\mathcal {M}^{\prime }_{n_{0}}\). It follows that J is not a solution for I w.r.t. \(\mathcal {M}^{\prime }_{n_{0}}\). Since \(\mathcal {M}^{\prime }_{n_{0}}\) is \(N_{k}\)-modular, there is a subinstance \(I^{\prime }\) of I such that J is not a solution for \(I^{\prime }\) w.r.t. \(\mathcal {M}^{\prime }_{n_{0}}\) and \(|dom(I^{\prime })| \leq N_{k}\). Again by Claim D, we have that J is not a solution for \(I^{\prime }\) w.r.t. \(\mathcal {M}\), hence \(\mathcal {M}\) is \(N_{k}\)-modular.
Thus, \(\mathcal {M}\) has the following properties: it admits CQ-rewriting (since it is the uniform limit of GLAV mappings that admit CQ-rewriting), it admits universal solutions, is closed under target homomorphisms (if it is not, we take its closure before we begin the construction), and, as just shown, it is \(N_{k}\)-modular. Consequently, by Theorem 3.9 in [19], we have that \(\mathcal {M}\) is logically equivalent to a GLAV schema mapping, which completes the proof. □
We conclude this section with a conjecture concerning uniform limits of arbitrary sequences of GLAV mappings.
Conjecture 1
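Let \((\mathcal {M}_{n})_{n\geq 1}\) be a sequence of GLAV mappings. Then the following statements are equivalent:
1. \((\mathcal {M}_{n})_{n\geq 1}\) has an SO tgd as a uniform limit.
2. \((\mathcal {M}_{n})_{n\geq 1}\) has a uniform limit that admits universal solutions.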
It is not hard to show that the preceding conjecture is implied by a conjecture in [2] to the effect that the language of plain SO tgds^{2} can be characterized by the following three properties: allowing for CQ-rewriting, admitting universal solutions, and closure under target homomorphisms.
6 Metric Space Completion and Generalized Schema Mappings
Let T be a schema containing a binary relation symbol. By Proposition 2, the metric space \((\mathcal {P}(\text {Inst}(\mathbf {T})),dist)\) is not complete, i.e., there are Cauchy sequences of elements of \(\mathcal {P}(\text {Inst}(\mathbf {T}))\) that have no limit in \(\mathcal {P}(\text {Inst}(\mathbf {T}))\). Let \((\mathcal {P}(\text {Inst}(\mathbf {T}))^{*},dist^{*})\) be the completion of \((\mathcal {P}(\text {Inst}(\mathbf {T})),dist)\). As described in Section 2, the elements of \(\mathcal {P}(\text {Inst}(\mathbf {T}))^{*}\) are the equivalence classes of Cauchy sequences of elements of \(\mathcal {P}(\text {Inst}(\mathbf {T}))\), where two Cauchy sequences \(\mathcal {I}_{1},\mathcal {I}_{2},\ldots \) and \(\mathcal {J}_{1},\mathcal {J}_{2},\ldots \) are equivalent if \(\lim \limits _{n\to \infty } dist(\mathcal {I}_{n},\mathcal {J}_{n}) = 0\). Clearly, this is a rather abstract description of \(\mathcal {P}(\text {Inst}(\mathbf {T}))^{*}\). In this section we show that, in many cases, the elements of \(\mathcal {P}(\text {Inst}(\mathbf {T}))^{*}\) can be represented by suitably constructed infinite T-instances. In turn, this result and basic results about complete metric spaces imply that the (pointwise or uniform) limits of a Cauchy sequence of schema mappings can be represented by a generalized schema mapping, that is, a schema mapping in which infinite solutions are allowed. We also establish a tight connection between these results and the representation of structural limits in the monograph by Nešetřil and Ossona de Mendez [15].
6.1 Representing Limits of Cauchy Sequences in the Metric Completion
Let T be a schema. Recall that, by definition, a T-instance is a finite set of facts. In what follows, we will also consider infinite T-instances, where, by definition, an infinite T-instance is an infinite set I of facts \(R_{i}(t_{1}, \ldots , t_{m})\). The term T-instance will continue to denote a finite T-instance, but, at times and for emphasis or disambiguation, we will also use the term finite T-instance, especially in contexts in which infinite T-instances are also considered. According to Definitions 2 and 3, the notion of the distance between two sets of finite instances has been defined using the notion of \(\mathsf {CQ}_{n}\)-equivalence, where two sets \(\mathcal {J}\) and \(\mathcal {J}^{\prime }\) of finite T-instances are \(\mathsf {CQ}_{n}\)-equivalent, denoted \(\mathcal {J} \equiv _{\mathsf {CQ}_{n}} \mathcal {J}^{\prime }\), if it holds that \(cert(q,\mathcal {J}) = cert(q,{\mathcal {J}^{\prime }})\), for all \(q \in \mathsf {CQ}_{n}\). The notion of \(\mathsf {CQ}_{n}\)-equivalence naturally extends to arbitrary (i.e., finite or infinite) T-instances. Hence, also the notions of similarity and distance, both of which were defined via \(\mathsf {CQ}_{n}\)-equivalence, immediately carry over to sets of arbitrary T-instances. Furthermore, the set of sets of arbitrary T-instances forms a pseudometric space, in which we can speak about Cauchy sequences and limits.
Definition 8
Let \(\mathcal {X}\) and \(\mathcal {Y}\) be two sets of finite T-instances. We say that \(\mathcal {Y}\) is an isomorphic copy of \(\mathcal {X}\) with nulls named apart if
1. For every member J of \(\mathcal {X}\), there is a member \(J^{\prime }\) of \(\mathcal {Y}\) that is an isomorphic copy of J via an isomorphism that renames nulls.
2. Every member \(J^{\prime }\) of \(\mathcal {Y}\) is an isomorphic copy of some member J of \(\mathcal {X}\) via an isomorphism that renames nulls.
3. No two members of \(\mathcal {Y}\) have nulls in common.
If \(\mathcal {Y}\) is a set of finite T-instances, then \(\bigcup {\mathcal {Y}}\) denotes the union of all members of \(\mathcal {Y}\) (where each member of \(\mathcal {Y}\) is viewed as a set of facts).
If \(\mathcal {X}\) is a set of finite T-instances, then \(\bigoplus {\mathcal {X}}\) denotes the set consisting of the unions of isomorphic copies of \(\mathcal {X}\) with nulls named apart, i.e.,$$\bigoplus {\mathcal{X}} = \left\{ \bigcup {\mathcal{Y}}\mid \mathcal{Y}\text{ is an isomorphic copy of }\mathcal{X}\text{ with nulls named apart}\right\}.$$

Let \(\mathcal {Y}\) be a set of finite T-instances. Clearly, if \(\mathcal {Y}\) is finite, then \(\bigcup {\mathcal {Y}}\) is a finite T-instance, while if \(\mathcal {Y}\) is infinite, then \(\bigcup {\mathcal {Y}}\) is an infinite T-instance. Note also that if \(\mathcal {X}\) is a set of finite T-instances such that at least one instance in \(\mathcal {X}\) contains nulls, then \(\bigoplus {\mathcal {X}}\) is infinite (even if \(\mathcal {X}\) is a finite set).
According to Definition 4, if J is a T-instance whose active domain contains nulls only, then v(J) is the set of all T-instances that are isomorphic copies of J via an isomorphism that renames nulls. This notation makes sense also for infinite T-instances J whose active domains contain nulls only. With this in mind, observe that if \(\mathcal {X}\) is a finite set of T-instances and if \(\mathcal {Y}\) is an isomorphic copy of \(\mathcal {X}\) with nulls named apart, then$$\bigoplus {\mathcal{X}}= v(\bigcup {\mathcal{Y}}).$$

As a concrete example, if \({\mathcal {K}} = \{K_{n} \mid n\geq 1\}\), where K _{ n } is a clique of size n in which every node is a null, then the members of \(\bigoplus {\mathcal {K}}\) are precisely the disjoint unions of cliques of all finite sizes in which every node is a null.
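For a finite family of instances, one member of \(\bigoplus {\mathcal {X}}\) can be built by renaming nulls apart and taking the union. The sketch below does this for two small cliques; it is only a finite illustration of the (infinite) clique example, and the fact encoding, the underscore convention for nulls, and all function names are assumptions.

```python
from itertools import count

def is_null(term):
    """Assumed convention: labeled nulls are strings that start with '_'."""
    return isinstance(term, str) and term.startswith("_")

def clique(n):
    """Clique on n null nodes as a set of E-facts (all ordered pairs of distinct nodes)."""
    return {("E", (f"_u{i}", f"_u{j}")) for i in range(n) for j in range(n) if i != j}

def name_apart(instances):
    """Return isomorphic copies of the given instances such that no two copies share a null."""
    fresh = count()
    copies = []
    for instance in instances:
        renaming = {}  # null of this instance -> globally fresh null
        copy = set()
        for rel, args in sorted(instance):
            new_args = []
            for term in args:
                if is_null(term) and term not in renaming:
                    renaming[term] = f"_v{next(fresh)}"
                new_args.append(renaming.get(term, term))
            copy.add((rel, tuple(new_args)))
        copies.append(copy)
    return copies

def union_named_apart(instances):
    """Union of the renamed copies: one representative member of the oplus construction."""
    return set().union(*name_apart(instances))

plain = clique(2) | clique(3)                    # nulls clash, K_2 is absorbed by K_3
apart = union_named_apart([clique(2), clique(3)])
print(len(plain), len(apart))
```

The plain union collapses to a single 3-clique because both cliques reuse the same null names, whereas the named-apart union is the disjoint union of a 2-clique and a 3-clique.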
Definition 9
Let q be a conjunctive query over the schema T with k free variables, \(k \geq 0\), and let \(\mathbf {a}\) be a k-tuple of constants (if k = 0, then \(\mathbf {a} = ()\), i.e., \(\mathbf {a}\) is the empty tuple).
We write \(q(\mathbf {a})\) to denote the T-instance J obtained from q and \(\mathbf {a}\) by (i) substituting the free variables of q by the respective elements of \(\mathbf {a}\); (ii) replacing the existential variables of q by fresh distinct labeled nulls; and (iii) treating the resulting body atoms of q as facts of the T-instance J.
Note that if q is a boolean query (in which case \(\mathbf {a} = ()\)), then q(()) is the canonical database of q, i.e., the T-instance whose active domain is the set of variables of q viewed as distinct nulls and whose facts are the atoms of q. Conversely, every T-instance J whose active domain consists entirely of nulls is the canonical database of a boolean conjunctive query.
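The three steps of Definition 9 translate directly into a small procedure. The following sketch assumes that a query is given as a list of atoms over variable names only (no constants in the query body), that free variables are listed explicitly, and that fresh nulls are written with a leading underscore; these conventions and the function name are illustrative assumptions.

```python
from itertools import count

def canonical_instance(atoms, free_vars, answer):
    """Build q(a): free variables are replaced by the constants in `answer`,
    every other variable by a fresh labeled null, and atoms become facts."""
    if len(free_vars) != len(answer):
        raise ValueError("answer tuple must match the free variables")
    assignment = dict(zip(free_vars, answer))
    fresh = count()
    facts = set()
    for rel, args in atoms:
        new_args = []
        for var in args:
            if var not in assignment:        # existential variable seen for the first time
                assignment[var] = f"_n{next(fresh)}"
            new_args.append(assignment[var])
        facts.add((rel, tuple(new_args)))
    return facts

# q(x) = exists y, z : F(x, y) and F(y, z), with a = (a)
q_atoms = [("F", ("x", "y")), ("F", ("y", "z"))]
print(canonical_instance(q_atoms, ["x"], ("a",)))

# Boolean query exists z : F(z, z); q(()) is its canonical database
print(canonical_instance([("F", ("z", "z"))], [], ()))
```

For a boolean query the result contains nulls only, matching the remark that q(()) is the canonical database of q.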
Before stating the main result of this section, we need to introduce one more concept. Let \(\mathcal {J}\) be a set of finite or infinite T-instances. We say that \(\mathcal {J}\) is closed under isomorphisms that rename nulls if for every (finite or infinite) T-instance J in \(\mathcal {J}\) and for every (finite or infinite) T-instance \(J^{\prime }\) that is an isomorphic copy of J via an isomorphism that renames nulls, we have that \(J^{\prime }\) is also in \(\mathcal {J}\). Note that if \(\mathcal {X}\) is a set of finite T-instances, then \(\bigoplus {\mathcal {X}}\) is closed under isomorphisms that rename nulls. Moreover, if \({\mathcal {M}}\) is a schema mapping between S and T, then, for every source instance I, the set \(\text {Sol}(I,{\mathcal {M}})\) of the solutions of I w.r.t. \(\mathcal {M}\) is closed under isomorphisms that rename nulls (see Definition 1).
Theorem 6
Proof 15

Step 1: We will show that, for every \(m \geq 1\), there is some \(n_{1}\) such that \(cert(q,\mathcal {J}_{n}) \subseteq cert(q,\bigoplus \mathcal {J}^{*})\), for every \(q \in \mathsf {CQ}_{m}\) and every \(n \geq n_{1}\).

Step 2: We will show that, for every \(m \geq 1\), there is some \(n_{2}\) such that \( cert(q,\bigoplus \mathcal {J}^{*})\subseteq cert(q,\mathcal {J}_{n})\), for every \(q \in \mathsf {CQ}_{m}\) and every \(n \geq n_{2}\).
We start by pointing out that for every \(n \geq 1\) and every \(q \in \mathsf {CQ}\), the certain answers \(cert(q,\mathcal {J}_{n})\) consist entirely of null-free tuples. This follows from the assumption that \(\mathcal {J}_{n}\) is closed under isomorphisms that rename nulls (the proof is essentially the same as the proof of Proposition 1 in Section 2). Moreover, for every \(q \in \mathsf {CQ}\), the certain answers \(cert(q,\bigoplus \mathcal {J}^{*})\) also consist entirely of null-free tuples. This is so because \(\bigoplus \mathcal {J}^{*}\) contains isomorphic copies of \(\mathcal {J}^{*}\) having no nulls in common (e.g., if \(v_{1}, \ldots , v_{n}, \ldots \) is a list of all nulls, then \(\bigoplus \mathcal {J}^{*}\) contains an isomorphic copy of \(\mathcal {J}^{*}\) in which all nulls have even index and an isomorphic copy of \(\mathcal {J}^{*}\) in which all nulls have odd index). Thus, we only need to focus on tuples of constants as possible certain answers.
To prove Step 1, since the sequence \((\mathcal {J}_{n})_{n\geq 1}\) is Cauchy, for every \(m \geq 1\), there is some \(n_{1}\) such that if \(s \geq n_{1}\) and \(t \geq n_{1}\), then \(\mathcal {J}_{s} \equiv _{\mathsf {CQ}_{m}} \mathcal {J}_{t}\). We now claim that \(cert(q,\mathcal {J}_{n}) \subseteq cert(q,\bigoplus \mathcal {J}^{*})\), for every \(q \in \mathsf {CQ}_{m}\) and every \(n \geq n_{1}\). Indeed, assume that \(q \in \mathsf {CQ}_{m}\) and let \(\mathbf {a}\) be a (possibly empty) tuple of constants in \(cert(q,\mathcal {J}_{n})\), where \(n \geq n_{1}\). It follows that \(\mathbf {a} \in cert(q,\mathcal {J}_{j})\), for every \(j \geq n_{1}\), hence the finite T-instance \(q(\mathbf {a})\) is in the set \(\mathcal {J}^{*}\). Consequently, \(\mathbf {a} \in q(\bigcup {\mathcal {Y}})\), for every isomorphic copy \(\mathcal {Y}\) of \(\mathcal {J}^{*}\) with nulls named apart, which implies that \(\mathbf {a} \in cert(q,\bigoplus \mathcal {J}^{*})\).
To prove Step 2, we will first show that the set D of constants occurring in \( \mathcal {J}^{*}\) is finite (note that D is also the set of constants occurring in \(\bigoplus \mathcal {J}^{*}\)). As a stepping stone, we will show the finiteness of a set D ^{′} that is defined next.
A singleatom conjunctive query is a query of the form ∃y R(x, y), where R is a relation symbol in the schema T. Let D ^{′} be the set of all constants b for which there is a singleatom query q and an index p, such that b occurs in \(cert(q,\mathcal {J}_{i})\), for all i ≥ p. We claim that the set D ^{′} is finite. To see this, observe first that every singleatom query has at most r variables, where r is the maximum arity of the relation symbols in T. Since the sequence \((\mathcal {J}_{n})_{n\geq 1}\) is Cauchy, there exists an integer p _{ r } such that \(\mathcal {J}_{i} \equiv _{\mathsf {CQ}_{r}} \mathcal {J}_{p_{r}}\), for all i ≥ p _{ r }. This implies that the certain answers to singleatom conjunctive queries become fixed in \((\mathcal {J}_{n})_{n\geq 1}\) starting from the index p _{ r }, which depends only on the schema T. By definition, the certain answers hold in every instance in \(\mathcal {J}_{p_{r}}\). Since \(\mathcal {J}_{p_{r}}\) consists entirely of finite instances, the set D ^{′} must be finite as well.
To complete the proof of the finiteness of D, we will show that \(D \subseteq D^{\prime}\). Let \(\mathbf{a}\) be a tuple of constants for which there is a conjunctive query q and an index p, such that \(\mathbf{a} \in cert(q,\mathcal{J}_{i})\), for all i ≥ p. Let s be the number of atoms of q and consider the single-atom queries \(q^{\prime}_{1}(\mathbf{y}_{1}), \dots, q^{\prime}_{s}(\mathbf{y}_{s})\) that cover q in the following sense: for every j with 1 ≤ j ≤ s, the atom of \(q^{\prime}_{j}\) is the j-th atom of q, and \(\mathbf{y}_{j}\) contains exactly the free variables of q that occur in this atom. Let \(\mathbf{a}_{j}\) be the tuple of elements from \(\mathbf{a}\) assigned to the variables \(\mathbf{y}_{j}\). Clearly, every element of \(\mathbf{a}\) is an element of some \(\mathbf{a}_{j}\), 1 ≤ j ≤ s. Observe that \(\mathbf{a} \in cert(q, \mathcal{J}_{i})\) implies that \(\mathbf{a}_{j} \in cert(q^{\prime}_{j}, \mathcal{J}_{i})\), hence we have that \(\mathbf{a}_{j} \in cert(q^{\prime}_{j},\mathcal{J}_{i})\), for every i ≥ p. Thus, each element of \(\mathbf{a}_{j}\), 1 ≤ j ≤ s, is an element of \(D^{\prime}\). This shows that \(D \subseteq D^{\prime}\) holds, hence D is a finite set.
We now return to the proof of Step 2. We will show that for every m ≥ 1, there is some \(n_{2}\) such that \(cert(q,\bigoplus \mathcal{J}^{*}) \subseteq cert(q,\mathcal{J}_{n})\), for every \(q \in \mathsf{CQ}_{m}\) and every \(n \geq n_{2}\). Assume that \(q \in \mathsf{CQ}_{m}\) and let \(\mathbf{a}\) be a tuple of constants such that \(\mathbf{a} \in cert(q,\bigoplus \mathcal{J}^{*})\). Then, for every instance \(J \in \bigoplus \mathcal{J}^{*}\), we have that \(\mathbf{a} \in q(J)\), hence there is a homomorphism h from the variables of q to the active domain of J such that the tuple of the free variables of q is mapped to \(\mathbf{a}\) and the atoms of q are mapped to facts of J. Let s be the number of atoms of q and let \(f_{1}, \dots, f_{s}\) be the facts of J that are the images of the atoms of q under the homomorphism h. Up to renaming nulls, each fact \(f_{j}\) is a fact of some finite T-instance of the form \(q_{j}(\mathbf{b}_{j})\), where \(q_{j}\) is a conjunctive query and \(\mathbf{b}_{j}\) is a tuple of constants such that \(\mathbf{b}_{j} \in cert(q_{j},\mathcal{J}_{i})\), for all sufficiently large i. Let \(n_{q(\mathbf{a})}\) be an index such that for every \(i \geq n_{q(\mathbf{a})}\), we have that \(\mathbf{b}_{j} \in cert(q_{j},\mathcal{J}_{i})\) holds, for 1 ≤ j ≤ s. Furthermore, let \(n_{2}\) be the maximum such index \(n_{q(\mathbf{a})}\), for all q in \(\mathsf{CQ}_{m}\) and for all tuples \(\mathbf{a}\) of elements of D. Such an index exists (i.e., it is a finite number) because both the set \(\mathsf{CQ}_{m}\) and the set of tuples of elements of D of length at most m are finite.
Observe that \(n_{2}\) has been chosen so that for every tuple \(\mathbf{a}\) and for every \(q \in \mathsf{CQ}_{m}\) with a homomorphism h mapping \(q(\mathbf{a})\) to some instance in \(\bigoplus \mathcal{J}^{*}\) (and thus to every instance in \(\bigoplus \mathcal{J}^{*}\), by renaming the nulls in the codomain of h accordingly), every fact \(f_{j}\) in \(h(q(\mathbf{a}))\) can be mapped further to every instance in \(\mathcal{J}_{n}\), \(n \geq n_{2}\), via a homomorphism \(h_{j}\) defined on the entire f-block of \(f_{j}\). (Recall that, by the definition of \(\bigoplus \mathcal{J}^{*}\), each fact \(f_{j}\) instantiates an atom of some conjunctive query \(q_{j}\) whose certain answers persist in the sequence \((\mathcal{J}_{n})_{n\geq 1}\); the bodies of these queries are mapped into instances of \(\bigoplus \mathcal{J}^{*}\) after renaming apart the nulls in them, thus ensuring that no two distinct queries end up in the same f-block of an instance of \(\bigoplus \mathcal{J}^{*}\).)
The union of two homomorphisms \(h_{1}, h_{2}\) defined on two distinct f-blocks \(B_{1}, B_{2}\) is unambiguously defined, and it is a homomorphism on the instance \(B_{1} \cup B_{2}\), since homomorphisms are the identity on constants and f-blocks do not share nulls. Thus, for an instance \(J \in \bigoplus \mathcal{J}^{*}\) and for the image \(\{f_{1}, \dots, f_{s}\}\) of \(q(\mathbf{a})\) under some homomorphism h, we also have a homomorphism from \(q(\mathbf{a})\) to an arbitrary instance \(J_{n}\) of \(\mathcal{J}_{n}\), \(n \geq n_{2}\), obtained by composing h with a union \(h_{1} \cup \dots \cup h_{s}\) of homomorphisms from the f-blocks of the facts \(f_{1}, \dots, f_{s}\) to \(J_{n}\). It follows that \(\mathbf{a} \in cert(q,\mathcal{J}_{n})\), for every \(n \geq n_{2}\). This establishes the inclusion \(cert(q,\bigoplus \mathcal{J}^{*}) \subseteq cert(q,\mathcal{J}_{n})\), for \(n \geq n_{2}\), and completes the proof of the theorem. □
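The gluing argument at the end of the proof can be illustrated on a toy instance. The Python sketch below is only an illustration under stated assumptions: facts are encoded as pairs (relation name, tuple of terms), nulls are strings starting with an underscore, and the helper names `f_blocks`, `union_of_homs`, and `is_homomorphism` are hypothetical and do not come from the paper.

```python
def is_null(term):
    # Assumed encoding: nulls are strings starting with "_"; all else is a constant.
    return isinstance(term, str) and term.startswith("_")

def f_blocks(instance):
    """Partition a set of facts into f-blocks, i.e., connected components of
    the graph that links two facts whenever they share a null."""
    blocks = []  # list of pairs (set of facts, set of nulls occurring in them)
    for fact in instance:
        nulls = {t for t in fact[1] if is_null(t)}
        merged_facts, merged_nulls = {fact}, set(nulls)
        rest = []
        for facts, ns in blocks:
            if ns & nulls:  # this block shares a null with the new fact
                merged_facts |= facts
                merged_nulls |= ns
            else:
                rest.append((facts, ns))
        blocks = rest + [(merged_facts, merged_nulls)]
    return [facts for facts, _ in blocks]

def apply_hom(h, fact):
    # h is defined on nulls only; constants are mapped to themselves.
    rel, args = fact
    return (rel, tuple(h.get(t, t) for t in args))

def is_homomorphism(h, source, target):
    return all(apply_hom(h, fact) in target for fact in source)

def union_of_homs(h1, h2):
    """Union of two homomorphisms defined on f-blocks that share no nulls."""
    if h1.keys() & h2.keys():
        raise ValueError("the two blocks share a null")
    return {**h1, **h2}

# Two f-blocks that share no nulls, and a target instance.
B1 = {("E", ("a", "_x")), ("E", ("_x", "_y"))}
B2 = {("E", ("_z", "b"))}
target = {("E", ("a", "c")), ("E", ("c", "c")), ("E", ("a", "b"))}
h1 = {"_x": "c", "_y": "c"}   # homomorphism from B1 to target
h2 = {"_z": "a"}              # homomorphism from B2 to target

h = union_of_homs(h1, h2)
print(len(f_blocks(B1 | B2)), is_homomorphism(h, B1 | B2, target))
```

The union is well defined precisely because the two dictionaries have disjoint key sets (no shared nulls) and constants are never remapped.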
Recall the sequence \((v(K_{n}))_{n \geq 1}\) in Proposition 2, where \(K_{n}\) is the clique of size n whose vertices are pairwise distinct labeled nulls. By Proposition 2, this sequence is Cauchy, but has no limit in \(\mathcal{P}(\text{Inst}(\mathbf{T}))\). Theorem 6 tells us how to find the limit in the complete metric space via the conjunctive queries with nonempty certain answers over all but finitely many members of the sequence. Since the instances \(K_{n}\), n ≥ 1, have active domains consisting entirely of nulls, Lemma 1 tells us that we only need to consider boolean conjunctive queries and, moreover, it suffices to evaluate them on each \(K_{n}\). These queries can only use the edge relation E, thus they can be considered as graphs, with the variables representing the vertices. If a query contains a self-loop (i.e., an atom of the form E(z, z) for some variable z), then the query evaluates to false over every \(K_{n}\). On the other hand, if a query contains no self-loop, then it evaluates to true over all but finitely many instances \(K_{n}\). Indeed, let q be a conjunctive query without self-loops and suppose that q contains m variables. It is easy to verify that q evaluates to true over all instances \(K_{n}\) with n ≥ m. Hence, by Theorem 6, the limit of \((v(K_{n}))_{n \geq 1}\) is \(\bigoplus \mathcal{G}\), where \(\mathcal{G}\) is a set of graphs with the following properties: (i) every member of \(\mathcal{G}\) is a graph with no self-loops and with labeled nulls as vertices; (ii) every graph with no self-loops is isomorphic to a graph in \(\mathcal{G}\). Clearly, \(\bigoplus \mathcal{K}\) is also the limit of \((v(K_{n}))_{n \geq 1}\), where \(\mathcal{K}\) is a set of graphs with the following properties: (i) every member of \(\mathcal{K}\) is a clique with labeled nulls as vertices; (ii) every clique is isomorphic to a graph in \(\mathcal{K}\).
Thus, the limit of \((v(K_{n}))_{n \geq 1}\) is the set consisting of all disjoint unions of cliques of all finite sizes in which every node is a null. At any rate, it is clear that infinite instances have to be used to represent the limit of \((v(K_{n}))_{n \geq 1}\).
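The two claims about self-loops can be checked by brute force on small cliques. The following Python sketch is an illustration only: encoding a boolean conjunctive query as a list of atoms E(x, y) over variable names, and the helper names `clique` and `holds`, are assumptions made here. Since all domain elements are nulls, a homomorphism is simply any edge-preserving map from variables to vertices.

```python
from itertools import product

def clique(n):
    """Edge relation of K_n: all ordered pairs of distinct vertices 0..n-1."""
    return {(u, v) for u in range(n) for v in range(n) if u != v}

def holds(query_atoms, edges, n):
    """True if the boolean conjunctive query, given as a list of atoms E(x, y)
    written as pairs of variable names, has a homomorphism into the graph
    with vertices 0..n-1 and the given edge set."""
    variables = sorted({x for atom in query_atoms for x in atom})
    for image in product(range(n), repeat=len(variables)):
        h = dict(zip(variables, image))
        if all((h[x], h[y]) in edges for x, y in query_atoms):
            return True
    return False

# A query without self-loops on m = 3 variables (a directed triangle).
triangle = [("x", "y"), ("y", "z"), ("z", "x")]
# A query containing the self-loop E(z, z).
with_loop = [("x", "z"), ("z", "z")]

for n in range(1, 6):
    print(n, holds(triangle, clique(n), n), holds(with_loop, clique(n), n))
```

In this run, the triangle query fails on the cliques with fewer than three vertices and holds from n = 3 onward, while the query with a self-loop fails on every clique tried.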
Next, we extend our results about limits of Cauchy sequences of instances to limits of Cauchy sequences of mappings. To this end, we first recall two basic results about complete metric spaces.
Proposition 8
Let X be a set, let (Y, d) be a complete metric space, and let \((f_{n})_{n \geq 1}\) be a sequence of functions from X to Y.
1. If \((f_{n})_{n \geq 1}\) is a pointwise Cauchy sequence, then \((f_{n})_{n \geq 1}\) has a pointwise limit f : X → Y, where \(f(x) = \lim\limits_{n\to \infty} f_{n}(x)\), for every x ∈ X.
2. If \((f_{n})_{n \geq 1}\) is a uniformly Cauchy sequence, then \((f_{n})_{n \geq 1}\) has a uniform limit. Moreover, the pointwise limit f : X → Y of \((f_{n})_{n \geq 1}\) is also the uniform limit of \((f_{n})_{n \geq 1}\).
The proof of the first part of Proposition 8 is immediate from the definitions; the proof of the second part can be found in any standard book on metric spaces (see, e.g., Proposition 3.6.6 in [18]). In fact, the argument is essentially the same as the one given in the proof of Part 2 of Theorem 4. Note that the second part of Proposition 8 is known as the Cauchy criterion.
We are now ready to obtain concrete representations of the (pointwise or uniform) limits of Cauchy sequences of schema mappings.
Definition 10
Let S, T be two schemas. A generalized schema mapping is a set \(\mathcal{M}\) of pairs (I, J) such that I is a finite S-instance, J is a finite or infinite T-instance, and \(\mathcal{M}\) has the following closure property: if \((I,J)\in \mathcal{M}\) and if \(J^{\prime}\) is an isomorphic copy of J via an isomorphism that renames nulls, then \((I,J^{\prime}) \in \mathcal{M}\).
Corollary 3

1. If \((\mathcal{M}_{n})_{n\geq 1}\) is a pointwise Cauchy sequence, then the schema mapping \(\mathcal{M}\) is the pointwise limit of \((\mathcal{M}_{n})_{n\geq 1}\).
2. If \((\mathcal{M}_{n})_{n\geq 1}\) is a uniformly Cauchy sequence, then the schema mapping \(\mathcal{M}\) is the uniform limit of \((\mathcal{M}_{n})_{n\geq 1}\).
Proof 16
The first part follows from Theorem 6 and the definitions. The second part follows from the first part and Proposition 8. □
Finally, we consider (pointwise or uniformly) Cauchy sequences of schema mappings admitting universal solutions and obtain a different representation of their limits.
Corollary 4
1. For every I ∈ Inst(S), the sequence \((\text{UnivSol}(I, \mathcal{M}_{n}))_{n\geq 1}\) is Cauchy, and hence it has a limit \(\lim\limits_{n\to \infty}(\text{UnivSol}(I, \mathcal{M}_{n}))\) in the complete metric space \((\mathcal{P}(\text{Inst}(\mathbf{T}))^{*},dist^{*})\).
2. The generalized schema mapping
$$\mathcal{M}^{*} = \{(I, J) \mid I \in \text{Inst}(\mathbf{S}), J \in \lim\limits_{n\to \infty}(\text{UnivSol}(I, \mathcal{M}_{n}))\} $$
is a pointwise limit of \((\mathcal{M}_{n})_{n\geq 1}\). Moreover, if \((\mathcal{M}_{n})_{n\geq 1}\) is a uniformly Cauchy sequence, then \(\mathcal{M}^{*}\) is its uniform limit.
6.2 Connections with Representations of Structural Limits

The first main difference is that Nešetřil and Ossona de Mendez [15] did not distinguish two classes of domain elements (namely, constants and nulls), as we did here. As a result, in the definition of homomorphism in [15], no special treatment of constants is needed, while, in our setting, constants must always be mapped to themselves. Their notion of homomorphism coincides with ours on instances whose active domains consist of labeled nulls only. Note that this is exactly the scenario we had in Example 1 and Proposition 2, which are both inspired by results in [15].

The second main difference is that the notion of distance in [15] is between two instances, while our notion of distance is between two sets of instances. This, of course, raises the question of how the two notions compare if, in our setting, both sets are singletons. We will address this question soon.

The third main difference is that, when cast in terms of the certain answers of conjunctive queries, the notion of distance in [15] involves boolean conjunctive queries only, while ours involves all conjunctive queries (boolean and non-boolean ones).
In what follows, we recall the definition of the similarity measure and the metric from [15] and briefly sketch the approach that Nešetřil and Ossona de Mendez took in representing limits of Cauchy sequences of instances via infinite instances.
Let T be a schema and let J and \(J^{\prime}\) be two T-instances. By a slight abuse of notation, we write \(J \to J^{\prime}\) to denote the existence of a homomorphism from J to \(J^{\prime}\) in the sense of Nešetřil and Ossona de Mendez (i.e., not distinguishing two types of domain elements). As mentioned before, if the active domains of J and \(J^{\prime}\) contain nulls only, then this notion of homomorphism coincides with the one considered in the context of schema mappings and data exchange (which is the one we used here).
Definition 11

The similarity \(sim_{h}(J, J^{\prime})\) between J and \(J^{\prime}\) is the size of the active domain of a smallest instance B such that one of the following two conditions holds: (a) \(B \to J\) and \(B \nrightarrow J^{\prime}\); (b) \(B \nrightarrow J\) and \(B \to J^{\prime}\). If no such finite instance B exists, we let \(sim_{h}(J, J^{\prime}) = \infty\).

The distance \(dist_{h}(J, J^{\prime})\) between J and \(J^{\prime}\) is the quantity \(dist_{h}(J,J^{\prime}) = 2^{-sim_{h}(J,J^{\prime})}\).

For a positive integer m, the following two statements are equivalent:
1. \(sim_{h}(J, J^{\prime}) = m\).
2. m is the largest number such that J and \(J^{\prime}\) satisfy the same boolean conjunctive queries with at most m − 1 variables.
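For very small graphs, the similarity of Definition 11 can be computed by exhaustive search. The Python sketch below is an illustration under simplifying assumptions: the schema has a single binary relation E, both instances are graphs on vertices 0..n-1 consisting of nulls only, and the search stops at a cutoff `max_size`. The function names `has_hom` and `sim_h` are hypothetical.

```python
from itertools import combinations, product

def clique(n):
    """Edge relation of K_n: all ordered pairs of distinct vertices 0..n-1."""
    return {(u, v) for u in range(n) for v in range(n) if u != v}

def has_hom(b_edges, k, target_edges, n):
    """True if the graph with vertices 0..k-1 and edges b_edges maps
    homomorphically into the graph with vertices 0..n-1 and target_edges."""
    for image in product(range(n), repeat=k):
        if all((image[u], image[v]) in target_edges for u, v in b_edges):
            return True
    return False

def sim_h(edges1, n1, edges2, n2, max_size=3):
    """Smallest k such that some nonempty graph B on at most k vertices maps
    into exactly one of the two graphs; None if no such B exists up to
    max_size (the cutoff is an assumption of this sketch)."""
    for k in range(1, max_size + 1):
        pairs = list(product(range(k), repeat=2))
        for r in range(1, len(pairs) + 1):
            for b_edges in combinations(pairs, r):
                into_first = has_hom(b_edges, k, edges1, n1)
                into_second = has_hom(b_edges, k, edges2, n2)
                if into_first != into_second:
                    return k
    return None

print(sim_h(clique(2), 2, clique(3), 3))
```

Because the sizes are tried in increasing order, the first size at which a separating graph is found equals the size of the active domain of a smallest separating instance. For K_2 and K_3 the search returns 3, witnessed by a triangle, which maps into K_3 but not into K_2.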
How do the notions \(sim_{h}\) of similarity and \(dist_{h}\) of distance compare with our notions sim of similarity and dist of distance? Clearly, this comparison is meaningful only when, in our setting, we consider singletons of instances and, moreover, the active domains of these instances contain nulls only. Recall that, according to the notation introduced in Definition 4, if J is a T-instance whose active domain contains nulls only, then v(J) is the set of all T-instances that are isomorphic copies of J via an isomorphism that renames nulls. The next observation is a direct consequence of Definitions 3 and 11, Lemma 1, and the preceding remarks.
Proposition 9

1. \(sim(v(J), v(J^{\prime})) = sim_{h}(J, J^{\prime}) - 1\).
2. \(dist(v(J), v(J^{\prime})) = 2 \cdot dist_{h}(J, J^{\prime})\).
In what follows, we will write NInst(T) to denote the set of all T-instances whose active domain consists entirely of nulls. The pair \((\text{NInst}(\mathbf{T}), dist_{h})\) is a pseudometric space, so a metric space can be obtained from it by passing to the equivalence classes [J] of target instances J, where [J] consists of all target instances that are homomorphically equivalent to J. As we did for the distance dist and the pseudometric space \((\mathcal{P}(\text{Inst}(\mathbf{T})),dist)\), we will identify each equivalence class with one of its members.
Cauchy sequences and limits arising from \(dist_{h}\) are called left Cauchy sequences and left limits in [15]. Proposition 9 implies that if \((J_{n})_{n \geq 1}\) is a sequence of elements of NInst(T), then \((J_{n})_{n \geq 1}\) is Cauchy with respect to the distance \(dist_{h}\) if and only if the sequence \((v(J_{n}))_{n \geq 1}\) is Cauchy with respect to the distance dist. If \((J_{n})_{n \geq 1}\) is a sequence of elements of NInst(T), then we will write \(\lim\limits^{h}_{n\to \infty} J_{n}\) for the limit of the sequence \((J_{n})_{n \geq 1}\) in the metric completion \((\text{NInst}(\mathbf{T})^{*},dist^{*}_{h})\) of the space \((\text{NInst}(\mathbf{T}), dist_{h})\). Nešetřil and Ossona de Mendez obtained representations of the left limits of Cauchy sequences of instances by an approach that is based on the homomorphism preorder on instances and on ideals of partial orders.

If \((J_{n})_{n \geq 1}\) and \((L_{n})_{n \geq 1}\) are two Cauchy sequences from NInst(T), then \(\lim\limits_{n\to \infty}^{h} L_{n} \leq_{h}^{*} \lim\limits_{n\to \infty}^{h} J_{n}\) if for every m, there is a positive integer p such that for every i ≥ p, we have that \(\min\{|B| : B\to L_{i}~\text{and}~B \nrightarrow J_{i}\} \geq m\), where |B| denotes the size of the active domain of B.

As a special case, it is easy to see that if L is an element of NInst(T) and \((J_{n})_{n \geq 1}\) is a Cauchy sequence from NInst(T), then \(L \leq_{h}^{*} \lim\limits_{n\to \infty}^{h} J_{n}\) holds if and only if there is a positive integer p such that for every i ≥ p, we have that \(L \to J_{i}\) (this is the special case of \(\lim\limits_{n\to \infty}^{h} L_{n} \leq_{h}^{*} \lim\limits_{n\to \infty}^{h} J_{n}\) in which \(L_{n} = L\), for all n).

Let (X, ≤) be a partial order. A downset is a subset F of X with the property that for all x ∈ F and y ≤ x, also y ∈ F holds.

An ideal is a downset F with the additional property that for all x and y in F, there exists z in F such that both x ≤ z and y ≤ z hold.
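The two definitions can be tested directly on a finite partial order. The Python sketch below is a small illustration; the function names `is_downset` and `is_ideal` and the example (divisors of 12 ordered by divisibility) are choices made here, not taken from [15].

```python
def is_downset(F, X, leq):
    """F is a downset of (X, leq) if it is closed under smaller elements."""
    return all(y in F for x in F for y in X if leq(y, x))

def is_ideal(F, X, leq):
    """An ideal is a downset in which any two elements have an upper bound
    that itself belongs to F."""
    return is_downset(F, X, leq) and all(
        any(leq(x, z) and leq(y, z) for z in F) for x in F for y in F
    )

# Example partial order: the divisors of 12 ordered by divisibility.
X = {1, 2, 3, 4, 6, 12}

def divides(a, b):
    return b % a == 0

print(is_ideal({1, 2, 4}, X, divides))    # a chain closed under divisors
print(is_downset({1, 2, 3}, X, divides))  # closed under divisors ...
print(is_ideal({1, 2, 3}, X, divides))    # ... but 2 and 3 have no upper bound inside
```

The set {1, 2, 3} shows why the second condition matters: it is a downset, yet 2 and 3 have no common upper bound within it, so it is not an ideal.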
In [15], it is shown that there is a correspondence between left limits of Cauchy sequences from NInst(T) and ideals in the partial order \((\text{NInst}(\mathbf{T}), \leq_{h})\). Before presenting this correspondence, we need to introduce a piece of notation.
If \(\mathcal{X}\) is a set of T-instances, then the disjoint union \(\biguplus \mathcal{X}\) is the set \(\bigcup \mathcal{Y}\), where \(\mathcal{Y}\) is an isomorphic copy of \(\mathcal{X}\) with nulls named apart. In other words, \(\biguplus \mathcal{X}\) is the union of copies of all elements of \(\mathcal{X}\) (one copy of each element of \(\mathcal{X}\)) so that no two members in the union have nulls in common. Clearly, \(\biguplus \mathcal{X}\) is unique up to isomorphisms that rename nulls.
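For a finite collection of finite instances, naming nulls apart amounts to a simple renaming. The Python sketch below is a minimal illustration under assumptions made here: facts are pairs (relation name, tuple of terms), nulls are strings starting with an underscore, no null name already contains the character "#", and the function name `disjoint_union` is hypothetical. The construction in the text also applies to infinite sets of instances, which this sketch does not cover.

```python
def is_null(term):
    # Assumed encoding: nulls are strings starting with "_".
    return isinstance(term, str) and term.startswith("_")

def disjoint_union(instances):
    """Union of one copy of each instance, with nulls named apart by tagging
    every null with the index of the instance it comes from."""
    result = set()
    for index, instance in enumerate(instances):
        def rename(term, index=index):
            # Constants are kept; nulls receive an instance-specific suffix.
            return f"{term}#{index}" if is_null(term) else term
        result |= {(rel, tuple(rename(t) for t in args)) for rel, args in instance}
    return result

# Both instances use the null "_x"; after renaming they no longer share it.
first = {("E", ("_x", "_y"))}
second = {("E", ("_x", "a"))}
union = disjoint_union([first, second])
print(sorted(union))
```

Constants such as `a` are left untouched, so facts from different instances may still share constants; only nulls are kept apart.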
Proposition 10 ([15])
1. There is a bijection \(\mathcal{F}\) between \(\text{NInst}(\mathbf{T})^{*}\) and the set of ideals of \((\text{NInst}(\mathbf{T}), \leq_{h})\) given by \(\mathcal{F}(\lim\limits_{n\to \infty}^{h}J_{n}) = \{L \in \text{NInst}(\mathbf{T})\mid L \leq_{h}^{*} \lim\limits_{n\to \infty}^{h}J_{n}\}.\)
2. If \(\lim\limits_{n\to \infty}^{h}J_{n}\) is the left limit of a Cauchy sequence \((J_{n})_{n \geq 1}\) from NInst(T), then \(\lim\limits_{n\to \infty}^{h}J_{n}\) can be represented as the disjoint union of the associated ideal \(\mathcal{F}(\lim\limits_{n\to \infty}^{h}J_{n})\), namely,
$$\lim\limits_{n\to \infty}^{h}J_{n} = \biguplus \{L \in \text{NInst}(\mathbf{T})\mid L \leq_{h}^{*} \lim\limits_{n\to \infty}^{h}J_{n}\}. $$
We now have all the conceptual and technical apparatus needed to establish a tight connection between the representations of limits given in Theorem 6 and the representation of limits given in Proposition 10.
Let \((J_{n})_{n \geq 1}\) be a Cauchy sequence (w.r.t. the distance function \(dist_{h}\)) such that each \(J_{n}\) is a member of NInst(T), i.e., each \(J_{n}\) is a T-instance whose active domain consists entirely of nulls. Let \(\lim\limits_{n\to \infty}^{h}J_{n}\) be its left limit in the metric completion of \((\text{NInst}(\mathbf{T}), dist_{h})\). As discussed earlier, the sequence \((v(J_{n}))_{n \geq 1}\) is Cauchy (w.r.t. the distance function dist), so it has a limit \(\lim\limits_{n\to \infty}v(J_{n})\) in the metric completion of \((\mathcal{P}(\text{Inst}(\mathbf{T})),dist)\). The following proposition establishes the close relationship between these two limits.
Proposition 11
Proof 17
7 Concluding Remarks
In words, we have shown that, for GAV mappings, a pointwise Cauchy sequence need not be uniformly Cauchy; moreover, the existence of a pointwise limit does not imply the existence of a uniform limit. This cannot happen for LAV mappings. On the other hand, a uniformly Cauchy sequence of LAV mappings need not even have a pointwise limit, which cannot happen for GAV mappings. We have also shown that structural properties of schema mappings can be used to characterize when the limit of a pointwise Cauchy sequence of GAV (or of LAV) mappings is equivalent to a GAV (or to a LAV) mapping. Finally, we have shown that infinite target instances and generalized mappings (i.e., schema mappings where target instances may be infinite) can be used to represent limits of Cauchy sequences of sets of target instances and limits of Cauchy sequences of arbitrary schema mappings.
We believe that the work reported here has laid the foundation for several interesting lines of subsequent investigations. We have seen that our results about sequences of LAV mappings extend in a natural way to sequences of premise-bounded GLAV mappings; an analogous extension of our results about sequences of GAV mappings to sequences of conclusion-bounded GLAV mappings is left for future work. We have also seen that there are sequences of LAV mappings for which no SO tgd is a uniform limit. Are there structural properties that characterize when a sequence of GLAV mappings has an SO tgd as a pointwise limit? In this vein, we have offered Conjecture 1. A related interesting open problem is whether schema mappings with target constraints are powerful enough to express pointwise limits or uniform limits of sequences of arbitrary GLAV schema mappings. We have some preliminary evidence that this is plausible, but much more work remains to be done.
We believe that the work reported in this paper provides a new perspective on the study of schema mappings by examining them from a dynamic viewpoint. As stated earlier, our original motivation came from schema-mapping optimization and, in particular, from the idea that “complex” schema mappings can be “approximated” by “simpler” ones. It remains to be seen whether the work reported here will lead to applications to schema-mapping optimization. We believe, however, that the study of the limiting behavior of schema mappings via metric spaces is interesting in its own right.
We also note that there are several areas in theoretical computer science where the study of limiting behavior of objects has produced results that were significant in their own right and also had fruitful consequences. For example, starting with the work of Fagin [4], there has been an extensive investigation of the asymptotic probabilities of logical properties and of 0-1 laws for various logics of interest in computer science. More recently, there has been a study of profinite words, which has found applications to automata theory and to the satisfiability problem for variants of monadic second-order logic (see, e.g., [17, 20]). Note that the profinite words form the completion of a metric space on words in which the distance is based on the size of the largest deterministic finite automaton needed to separate two words. Finally, the connection between graph limits in the monograph [15] by Nešetřil and Ossona de Mendez and the completion of the metric space \((\mathcal{P}(\text{Inst}(\mathbf{T})),dist)\), which was mentioned in the previous section, may merit further exploration. It should also be pointed out that, motivated by the study of large-scale networks, there has been an extensive body of work on a notion of graph limits arising from converging sequences of homomorphism densities; a detailed account of this work is given in the monograph [13] by Lovász. In addition, Nešetřil and Ossona de Mendez [16] developed a general framework for limits of graphs and relational structures; in that framework, different fragments of first-order logic are used to define different notions of limits arising from converging sequences of the frequencies with which first-order formulas in the fragment at hand are satisfied by an assignment (homomorphism densities correspond to the fragment consisting of all quantifier-free conjunctive queries). Homomorphisms, metric completions, and representations of limits of finite structures play a central role in [13, 16].
Footnotes
 1.
Allowing for CQ-rewriting means that the certain answers of every conjunctive query over the target schema are definable by a union of conjunctive queries over the source schema; see [19].
 2.
A plain SO tgd is an SO tgd that contains no nested terms and no equalities. Every SO tgd is known to be CQ-equivalent to a plain one [2].
Notes
Acknowledgements
The research of Reinhard Pichler, Emanuel Sallinger, and Vadim Savenkov was supported by the Austrian Science Fund, projects (FWF):P25207N23 and (FWF):Y698, and the Vienna Science and Technology Fund, project ICT12015. The research of Phokion Kolaitis on this paper was partially supported by NSF Grant IIS1217869. The full version was completed while Kolaitis was visiting the Simons Institute for the Theory of Computing during the fall of 2016. The research of Emanuel Sallinger was supported by the EPSRC programme grant EP/M025268/1.
References
 1. Arenas, M., Barceló, P., Libkin, L., Murlak, F.: Foundations of data exchange. Cambridge University Press, Cambridge (2014)
 2. Arenas, M., Pérez, J., Reutter, J., Riveros, C.: The language of plain SO-tgds: composition, inversion and structural properties. J. Comput. Syst. Sci. 79(6), 763–784 (2013)
 3. Bernstein, P.A.: Applying model management to classical meta data problems. In: CIDR (2003)
 4. Fagin, R.: Probabilities on finite models. J. Symb. Log. 41(1), 50–58 (1976)
 5. Fagin, R., Kolaitis, P.G.: Local transformations and conjunctive-query equivalence. In: PODS, pp 179–190 (2012)
 6. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)
 7. Fagin, R., Kolaitis, P.G., Nash, A., Popa, L.: Towards a theory of schema-mapping optimization. In: PODS, pp 33–42 (2008)
 8. Fagin, R., Kolaitis, P.G., Popa, L., Tan, W.C.: Composing schema mappings: second-order dependencies to the rescue. ACM Trans. Database Syst. 30(4), 994–1055 (2005)
 9. Fagin, R., Kolaitis, P.G., Popa, L., Tan, W.C.: Schema mapping evolution through composition and inversion. In: Schema Matching and Mapping, pp. 191–222. Springer (2011)
 10. Feinerer, I., Pichler, R., Sallinger, E., Savenkov, V.: On the undecidability of the equivalence of second-order tuple generating dependencies. Inf. Syst. 48, 113–129 (2015)
 11. Kolaitis, P.G.: Schema mappings, data exchange, and metadata management. In: PODS, pp 61–75 (2005)
 12. Lenzerini, M.: Data integration: a theoretical perspective. In: PODS, pp 233–246 (2002)
 13. Lovász, L.: Large networks and graph limits, volume 60 of colloquium publications. American Mathematical Society (2012)
 14. Madhavan, J., Halevy, A.Y.: Composing mappings among data sources. In: VLDB, pp 572–583 (2003)
 15. Nešetřil, J., de Mendez, P.O.: Sparsity - graphs, structures, and algorithms, volume 28 of algorithms and combinatorics. Springer, Berlin (2012)
 16. Nešetřil, J., de Mendez, P.O.: A unified approach to structural limits, and limits of graphs with bounded tree-depth. arXiv:1303.6471 (2013)
 17. Pin, J.: Profinite methods in automata theory. In: STACS, pp 31–50 (2009)
 18. Shirali, S., Vasudeva, H.: Metric spaces. Springer, Berlin (2006)
 19. ten Cate, B., Kolaitis, P.G.: Structural characterizations of schema-mapping languages. In: ICDT, pp 63–72 (2009)
 20. Torunczyk, S.: Languages of profinite words and the limitedness problem. In: ICALP, pp 377–389 (2012)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.