Abstract
If H is a Hilbert space (or a closed convex subspace thereof) and \(x_1,\dots ,x_N\in H\), then the empirical mean \(\overline x_N=N^{-1}\sum x_i\) is the unique element of H that minimises the sum of squared distances from the \(x_i\)’s.
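If H is a Hilbert space (or a closed convex subspace thereof) and \(x_1,\dots ,x_N\in H\), then the empirical mean \(\overline x_N=N^{-1}\sum x_i\) is the unique element of H that minimises the sum of squared distances from the \(x_i\)’s (see Note 1). That is, if we define
$$\displaystyle \begin{aligned} F(\theta )=\sum _{i=1}^N\|\theta -x_i\|^2,\qquad \theta \in H, \end{aligned}$$
then \(\theta =\overline x_N\) is the unique minimiser of F. This is easily seen by “opening the squares” and writing
$$\displaystyle \begin{aligned} F(\theta )=F(\overline x_N)+N\|\theta -\overline x_N\|^2. \end{aligned}$$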
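The decomposition \(F(\theta )=F(\overline x_N)+N\|\theta -\overline x_N\|^2\) obtained by opening the squares can be checked numerically. The following is a minimal sketch in \(\mathbb {R}^2\) with the Euclidean norm; the sample points and the helper names (`mean`, `sum_sq`, `decomposition_gap`) are illustrative assumptions, not part of the text.

```python
def mean(points):
    """Coordinate-wise empirical mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sum_sq(theta, points):
    """F(theta) = sum_i ||x_i - theta||^2 for the Euclidean norm."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, theta)) for p in points)

# Hypothetical sample points in R^2 (illustration only).
xs = [(0.0, 0.0), (2.0, 1.0), (-1.0, 4.0), (3.0, -1.0)]
xbar = mean(xs)

def decomposition_gap(theta):
    """F(theta) minus [F(xbar) + N * ||theta - xbar||^2]; should vanish."""
    n = len(xs)
    dist_sq = sum((a - b) ** 2 for a, b in zip(theta, xbar))
    return sum_sq(theta, xs) - (sum_sq(xbar, xs) + n * dist_sq)

for theta in [(0.5, 0.5), (1.0, 1.0), (-2.0, 3.0)]:
    # The identity holds up to rounding, so F(theta) >= F(xbar) for every theta.
    assert abs(decomposition_gap(theta)) < 1e-9
    assert sum_sq(theta, xs) >= sum_sq(xbar, xs)
```

Since the remainder term \(N\|\theta -\overline x_N\|^2\) vanishes only at \(\theta =\overline x_N\), the same identity also gives uniqueness of the minimiser.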
The concept of a Fréchet mean (Fréchet [55]) generalises the notion of mean to a more general metric space by replacing the usual “sum of squares” with a “sum of squared distances”, giving rise to the so-called Fréchet functional. A closely related notion is that of a Karcher mean (Karcher [78]), a term that describes stationary points of the sum of squares functional, when the latter is differentiable (see Sect. 3.1.6). Population versions of Fréchet means, assuming the space is endowed with a probability law, can also be defined, replacing summation by expectation with respect to that law.
Fréchet means are perhaps the most basic object of statistical interest, and this chapter studies such means when the underlying space is the Wasserstein space \(\mathcal W_2\). In general, existence and uniqueness of a Fréchet mean can be subtle, but we will see that the nature of optimal transport allows for rather clean statements in the case of Wasserstein space.
3.1 Empirical Fréchet Means in \(\mathcal W_2\)
3.1.1 The Fréchet Functional
As foretold in the preceding paragraph, the definition of a Fréchet mean requires the definition of an appropriate sum-of-squares functional, the Fréchet functional:
Definition 3.1.1 (Empirical Fréchet Functional and Mean)
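The Fréchet functional associated with measures \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\) is
$$\displaystyle \begin{aligned} F(\gamma )=\frac 1{2N}\sum _{i=1}^NW_2^2(\gamma ,\mu ^i),\qquad \gamma \in \mathcal W_2(\mathcal X). \end{aligned} \tag{3.1}$$
A Fréchet mean of \((\mu ^1,\dots ,\mu ^N)\) is a minimiser of F in \(\mathcal W_2(\mathcal X)\) (if it exists).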
In analysis, a Fréchet mean is often called a barycentre. We shall use the term “Fréchet mean”, which is arguably more popular in statistics (see Note 2).
The factor 1∕(2N) is irrelevant for the definition of Fréchet mean. It is introduced in order to have simpler expressions for the derivatives (Theorems 3.1.14 and 3.2.13) and to be compatible with a population version \(\mathbb {E} W_2^2(\gamma ,\varLambda )/2\) (see (3.3)).
The first reference that deals with empirical Fréchet means in \(\mathcal W_2(\mathbb {R}^d)\) is the seminal paper of Agueh and Carlier [2]. They treat the more general weighted Fréchet functional
$$\displaystyle \begin{aligned} F(\gamma )=\frac 12\sum _{i=1}^Nw_iW_2^2(\gamma ,\mu ^i),\qquad w_i\ge 0,\quad \sum _{i=1}^Nw_i=1, \end{aligned}$$
but, for simplicity, we shall focus on the case of equal weights. (If all the \(w_i\)’s are rational, then the weighted functional can be encompassed in (3.1) by taking some of the \(\mu ^i\)’s to be the same. The case of irrational \(w_i\)’s is then treated with continuity arguments. Moreover, (3.3) encapsulates (3.1) as well as the weighted version when Λ can take finitely many values.)
3.1.2 Multimarginal Formulation, Existence, and Continuity
In [60], Gangbo and Święch consider the following multimarginal Monge–Kantorovich problem. Let \(\mu ^1,\dots ,\mu ^N\) be N measures in \(\mathcal W_2(\mathcal X)\) and let \(\Pi (\mu ^1,\dots ,\mu ^N)\) be the set of probability measures in \(\mathcal X^N\) having \(\{\mu ^i\}_{i=1}^N\) as marginals. The problem is to minimise
$$\displaystyle \begin{aligned} G(\pi )=\frac 1{2N^2}\int _{\mathcal X^N}\sum _{i<j}\|x_i-x_j\|^2\,\mathrm {d}\pi (x_1,\dots ,x_N)\qquad \text{over }\pi \in \Pi (\mu ^1,\dots ,\mu ^N). \end{aligned}$$
The factor \(1/(2N^2)\) is of course irrelevant for the minimisation and its purpose will be clarified shortly. If N = 2, we obtain the Kantorovich problem with quadratic cost. The probabilistic interpretation (as in Sect. 1.2) is that one is given random variables \(X_1,\dots ,X_N\) with marginal probability laws \(\mu ^1,\dots ,\mu ^N\) and seeks to construct a random vector \(Y=(Y_1,\dots ,Y_N)\) on \(\mathcal X^N\) such that \(X_i\stackrel {d}{=}Y_i\) and
$$\displaystyle \begin{aligned} \mathbb {E}\sum _{i<j}\|Y_i-Y_j\|^2\le \mathbb {E}\sum _{i<j}\|Z_i-Z_j\|^2 \end{aligned}$$
for any other random vector \(Z=(Z_1,\dots ,Z_N)\) such that \(X_i\stackrel {d}{=}Z_i\). Intuitively, we seek a random vector with prescribed marginals but maximally correlated entries.
We refer to elements of \(\Pi (\mu ^1,\dots ,\mu ^N)\) (equivalently, joint laws of \(X_1,\dots ,X_N\)) as multicouplings (of \(\mu ^1,\dots ,\mu ^N\)). Just like in the Kantorovich problem, there always exists an optimal multicoupling π.
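Let us now show how the multimarginal problem is equivalent to the problem of finding the Fréchet mean of \(\mu ^1,\dots ,\mu ^N\). The first thing to observe is that the objective function can be written as
$$\displaystyle \begin{aligned} G(\pi )=\int _{\mathcal X^N}\frac 1{2N}\sum _{i=1}^N\|x_i-M(x)\|^2\,\mathrm {d}\pi (x),\qquad M(x)=M(x_1,\dots ,x_N)=\frac 1N\sum _{i=1}^Nx_i, \end{aligned}$$
because \(\sum _{i<j}\|x_i-x_j\|^2=N\sum _{i=1}^N\|x_i-M(x)\|^2\); this explains the choice of the factor \(1/(2N^2)\).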
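The rewriting of the multimarginal objective in terms of the mean function \(M(x)=N^{-1}\sum x_i\) rests on the elementary identity \(\sum _{i<j}\|x_i-x_j\|^2=N\sum _{i}\|x_i-M(x)\|^2\). A quick numerical sanity check, as a sketch with hypothetical points in \(\mathbb {R}^2\) (the function names are illustrative):

```python
from itertools import combinations

def sq_norm(u):
    """Squared Euclidean norm of a vector."""
    return sum(c * c for c in u)

def pairwise_sum(points):
    """Sum over pairs i < j of ||x_i - x_j||^2."""
    return sum(sq_norm([a - b for a, b in zip(p, q)])
               for p, q in combinations(points, 2))

def centred_sum(points):
    """Sum over i of ||x_i - M(x)||^2, where M(x) is the average of the points."""
    n = len(points)
    m = [sum(p[k] for p in points) / n for k in range(len(points[0]))]
    return sum(sq_norm([a - b for a, b in zip(p, m)]) for p in points)

# Hypothetical configuration (x_1, x_2, x_3) in R^2.
pts = [(0.0, 1.0), (2.0, -1.0), (4.0, 3.0)]
lhs = pairwise_sum(pts)
rhs = len(pts) * centred_sum(pts)
assert abs(lhs - rhs) < 1e-9
```

Dividing both sides by \(2N^2\) turns the pairwise cost into \((2N)^{-1}\sum _i\|x_i-M(x)\|^2\), which has the same normalisation as the Fréchet functional.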
The next result shows that the Fréchet mean and the multicoupling problems are essentially the same.
Proposition 3.1.2 (Fréchet Means and Multicouplings)
Let \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\). Then μ is a Fréchet mean of \((\mu ^1,\dots ,\mu ^N)\) if and only if there exists an optimal multicoupling \(\pi \in \mathcal W_2(\mathcal X^N)\) of \((\mu ^1,\dots ,\mu ^N)\) such that μ = M#π, and furthermore F(μ) = G(π).
Proof
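Let π be an arbitrary multicoupling of \((\mu ^1,\dots ,\mu ^N)\) and set μ = M#π. Then \((x\mapsto x_i,M)\#\pi \) is a coupling of \(\mu ^i\) and μ, and therefore
$$\displaystyle \begin{aligned} W_2^2(\mu ,\mu ^i)\le \int _{\mathcal X^N}\|x_i-M(x)\|^2\,\mathrm {d}\pi (x). \end{aligned}$$
Summation over i (and division by 2N) gives F(μ) ≤ G(π) and so \(\inf F\le \inf G\).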
For the other inequality, let \(\mu \in \mathcal W_2(\mathcal X)\) be arbitrary. For each i, let \(\pi ^i\) be an optimal coupling between μ and \(\mu ^i\). Invoking the gluing lemma (Ambrosio and Gigli [10, Lemma 2.1]), we may glue all the \(\pi ^i\)’s using their common marginal μ. This procedure constructs a measure η on \(\mathcal X^{N+1}\) with marginals \(\mu ^1,\dots ,\mu ^N,\mu \), and its relevant projection π is then a multicoupling of \(\mu ^1,\dots ,\mu ^N\).
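Since \(\mathcal X\) is a Hilbert space, the minimiser of \(y\mapsto \sum \|x_i - y\|^2\) is y = M(x). Thus
$$\displaystyle \begin{aligned} F(\mu )=\frac 1{2N}\sum _{i=1}^NW_2^2(\mu ,\mu ^i)=\frac 1{2N}\int _{\mathcal X^{N+1}}\sum _{i=1}^N\|x_i-y\|^2\,\mathrm {d}\eta (x,y)\ge \frac 1{2N}\int _{\mathcal X^{N+1}}\sum _{i=1}^N\|x_i-M(x)\|^2\,\mathrm {d}\eta (x,y)=G(\pi ). \end{aligned}$$
In particular, \(\inf F\ge \inf G\) and combining this with the established converse inequality we see that \(\inf F=\inf G\). Observe also that the last displayed inequality holds as equality if and only if y = M(x) η-almost surely, in which case μ = M#π. Therefore, if μ does not equal M#π, then F(μ) > G(π) ≥ F(M#π), and μ cannot be optimal. Finally, if π is optimal, then
$$\displaystyle \begin{aligned} F(M\#\pi )\le G(\pi )=\inf G=\inf F, \end{aligned}$$
establishing optimality of μ = M#π and completing the proof.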
Since optimal couplings exist, we deduce that so do Fréchet means.
Corollary 3.1.3 (Fréchet Means and Moments)
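Any finite collection of measures \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\) admits a Fréchet mean μ; for all p ≥ 1,
$$\displaystyle \begin{aligned} \int _{\mathcal X}\|x\|^p\,\mathrm {d}\mu (x)\le \frac 1N\sum _{i=1}^N\int _{\mathcal X}\|x\|^p\,\mathrm {d}\mu ^i(x), \end{aligned}$$
and when p > 1 equality holds if and only if \(\mu ^1=\dots =\mu ^N\).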
Proof
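Let π be a multicoupling of \(\mu ^1,\dots ,\mu ^N\) such that μ = M#π (Proposition 3.1.2). Then, by convexity of \(x\mapsto \|x\|^p\),
$$\displaystyle \begin{aligned} \int _{\mathcal X}\|x\|^p\,\mathrm {d}\mu (x)=\int _{\mathcal X^N}\left \|\frac 1N\sum _{i=1}^Nx_i\right \|^p\,\mathrm {d}\pi (x)\le \frac 1N\sum _{i=1}^N\int _{\mathcal X^N}\|x_i\|^p\,\mathrm {d}\pi (x)=\frac 1N\sum _{i=1}^N\int _{\mathcal X}\|x\|^p\,\mathrm {d}\mu ^i(x). \end{aligned}$$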
The statement about equality follows from strict convexity of x↦∥x∥p if p > 1.
A further corollary of Proposition 3.1.2 is a bound on the support:
Corollary 3.1.4
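The support of any Fréchet mean is included in the closure of the set
$$\displaystyle \begin{aligned} \left \{\frac {x_1+\dots +x_N}N:x_i\in \mathrm {supp}(\mu ^i),\ i=1,\dots ,N\right \}. \end{aligned}$$
In particular, if all the \(\mu ^i\)’s are supported on a common convex set K, then so is any of their Fréchet means.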
The multimarginal formulation also yields a continuity property for the empirical Fréchet mean. Conditions for uniqueness will be given in the next subsection.
Theorem 3.1.5 (Continuity of Fréchet Means)
Suppose that \(W_2(\mu _k^i,\mu ^i)\to 0\) for i = 1, …, N and let \(\overline \mu _k\) denote any Fréchet mean of \((\mu _k^1,\dots ,\mu _k^N)\). Then \((\overline \mu _k)\) stays in a compact set of \(\mathcal W_2(\mathcal X)\), and any limit point is a Fréchet mean of \((\mu ^1,\dots ,\mu ^N)\).
In particular, if \(\mu ^1,\dots ,\mu ^N\) have a unique Fréchet mean \(\overline \mu \), then \(\overline \mu _k\to \overline \mu \) in \(\mathcal W_2(\mathcal X)\).
Proof
We sketch the steps of the proof here, with the full details given on page 70 of the supplement.
Step 1: tightness of \((\overline \mu _k)\). This is true because the collection of multicouplings is tight, and the mean function M is continuous.
Step 2: weak limits are limits in \(\mathcal W_2(\mathcal X)\). This holds because the mean function has linear growth.
Step 3: the limit is a Fréchet mean of \((\mu ^1,\dots ,\mu ^N)\). From Corollary 3.1.3, it follows that \(\overline \mu _k\) must be sought on some fixed bounded set in \(\mathcal W_2(\mathcal X)\). On such sets, the Fréchet functionals are uniformly Lipschitz, so their minimisers converge as well.
3.1.3 Uniqueness and Regularity
A general situation in which Fréchet means are unique is when the Fréchet functional is strictly convex. In the Wasserstein space, this requires some regularity, but weak convexity holds in general. Absolutely continuous measures on infinite-dimensional \(\mathcal X\) are defined in Definition 1.6.4.
Proposition 3.1.6 (Convexity of the Fréchet Functional)
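Let \(\varLambda ,\gamma _1,\gamma _2\in \mathcal W_2(\mathcal X)\) and t ∈ [0, 1]. Then
$$\displaystyle \begin{aligned} W_2^2(t\gamma _1+(1-t)\gamma _2,\varLambda )\le tW_2^2(\gamma _1,\varLambda )+(1-t)W_2^2(\gamma _2,\varLambda ). \end{aligned} \tag{3.2}$$
When Λ is absolutely continuous, the inequality is strict unless t ∈{0, 1} or \(\gamma _1=\gamma _2\).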
Remark 3.1.7
The Wasserstein distance is not convex along geodesics. That is, if we replace the linear interpolant \(t\gamma _1+(1-t)\gamma _2\) by McCann’s interpolant \(\gamma _t\), then \(t\mapsto W_2^2(\gamma _t,\varLambda )\) is not necessarily convex (Ambrosio et al. [12, Example 9.1.5]).
Proof
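Let \(\pi _i\in \Pi (\gamma _i,\varLambda )\) be optimal and notice that the linear interpolant \(t\pi _1+(1-t)\pi _2\in \Pi (t\gamma _1+(1-t)\gamma _2,\varLambda )\), so that
$$\displaystyle \begin{aligned} W_2^2(t\gamma _1+(1-t)\gamma _2,\varLambda )\le \int _{\mathcal X^2}\|x-y\|^2\,\mathrm {d}[t\pi _1+(1-t)\pi _2](x,y)=tW_2^2(\gamma _1,\varLambda )+(1-t)W_2^2(\gamma _2,\varLambda ), \end{aligned}$$
which is (3.2). When Λ is absolutely continuous and t ∈ (0, 1), equality in (3.2) holds if and only if \(\pi _t=t\pi _1+(1-t)\pi _2=({\mathbf t_{\varLambda }^{t\gamma _1+(1-t)\gamma _2}}\times \mathbf i)\#\varLambda \). But \(\pi _t\) is supported on the graphs of two functions: \({\mathbf t_{\varLambda }^{\gamma _1}}\) and \({\mathbf t_{\varLambda }^{\gamma _2}}\). Consequently, equality can hold only if these two maps coincide Λ-almost surely, or, equivalently, if \(\gamma _1=\gamma _2\).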
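The convexity of \(\gamma \mapsto W_2^2(\gamma ,\varLambda )\) along linear interpolants can be illustrated on the real line, where \(W_2\) between discrete measures is computed by the monotone (quantile) coupling. The sketch below is illustrative only: the function names and the example measures \(\gamma _1=\delta _0\), \(\gamma _2=\delta _2\), \(\varLambda =(\delta _0+\delta _2)/2\) are assumptions, and since this Λ is discrete only the weak inequality is being checked.

```python
def w2_sq_1d(atoms_a, weights_a, atoms_b, weights_b):
    """Squared W_2 between two discrete probability measures on the real line,
    computed by filling the monotone coupling of the sorted atoms."""
    a = sorted(zip(atoms_a, weights_a))
    b = sorted(zip(atoms_b, weights_b))
    i = j = 0
    rem_a, rem_b = a[0][1], b[0][1]
    total = 0.0
    while True:
        mass = min(rem_a, rem_b)          # mass moved from a[i] to b[j]
        total += mass * (a[i][0] - b[j][0]) ** 2
        rem_a -= mass
        rem_b -= mass
        if rem_a <= 1e-12:                # atom a[i] is exhausted
            i += 1
            if i == len(a):
                break
            rem_a = a[i][1]
        if rem_b <= 1e-12:                # atom b[j] is exhausted
            j += 1
            if j == len(b):
                break
            rem_b = b[j][1]
    return total

def mixture(t, atoms_1, weights_1, atoms_2, weights_2):
    """Atoms and weights of the linear interpolant t*gamma_1 + (1-t)*gamma_2."""
    atoms = list(atoms_1) + list(atoms_2)
    weights = [t * w for w in weights_1] + [(1 - t) * w for w in weights_2]
    return atoms, weights

# Hypothetical example measures, each given as (atoms, weights).
g1 = ([0.0], [1.0])
g2 = ([2.0], [1.0])
lam = ([0.0, 2.0], [0.5, 0.5])

def convexity_gap(t):
    """Right-hand side minus left-hand side of the convexity inequality."""
    mix = mixture(t, *g1, *g2)
    lhs = w2_sq_1d(*mix, *lam)
    rhs = t * w2_sq_1d(*g1, *lam) + (1 - t) * w2_sq_1d(*g2, *lam)
    return rhs - lhs

gaps = [convexity_gap(k / 10) for k in range(11)]
assert all(g >= -1e-9 for g in gaps)
```

In this example the interpolant at t = 1/2 coincides with Λ, so the left-hand side vanishes while the right-hand side does not; the gap is zero only at the endpoints t ∈ {0, 1}.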
As a corollary, we deduce that the Fréchet mean is unique if one of the measures \(\mu ^i\) is absolutely continuous, and this extends to the population version (see Proposition 3.2.7).
We conclude this subsection by stating an important regularity property in the Euclidean case. See Agueh and Carlier [2, Proposition 5.1] for a proof.
Proposition 3.1.8 (L ∞-Regularity of Fréchet Means)
Let \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathbb {R}^d)\) and suppose that \(\mu ^1\) is absolutely continuous with density bounded by M. Then the Fréchet mean of \(\{\mu ^i\}\) is absolutely continuous with density bounded by \(N^dM\) and is consequently a Karcher mean.
In Theorem 5.5.2, we extend Proposition 3.1.8 to the population level.
3.1.4 The One-Dimensional and the Compatible Case
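When \(\mathcal X=\mathbb {R}\), there is a simple expression for the Fréchet mean because \(\mathcal W_2(\mathbb {R})\) can be imbedded in a Hilbert space. Indeed, recall that
$$\displaystyle \begin{aligned} W_2(\mu ,\nu )=\|F_\mu ^{-1}-F_\nu ^{-1}\|_{L_2(0,1)} \end{aligned}$$
(see Sect. 2.3.2 or 1.5). In view of that, \(\mathcal W_2(\mathbb {R})\) can be seen as the convex closed subset of \(L_2(0,1)\) formed by equivalence classes of left-continuous nondecreasing functions on (0, 1): any quantile function is left-continuous and nondecreasing, and any such function G can be seen to be the quantile function of the distribution function given by the right-continuous inverse of G,
$$\displaystyle \begin{aligned} F(x)=\inf \{t\in (0,1):G(t)>x\}=\sup \{t\in (0,1):G(t)\le x\} \end{aligned}$$
(with the conventions \(\inf \emptyset =1\) and \(\sup \emptyset =0\); see, for example, Bobkov and Ledoux [25, Appendix A]). Therefore, the Fréchet mean of \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathbb {R})\) is the measure μ having quantile function
$$\displaystyle \begin{aligned} F_\mu ^{-1}(t)=\frac 1N\sum _{i=1}^NF_{\mu ^i}^{-1}(t),\qquad t\in (0,1). \end{aligned}$$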
The Fréchet mean is thus unique. This is no longer true in higher dimension, unless some regularity is imposed on the measures (Proposition 3.2.7).
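For empirical measures on the real line with the same number n of equally weighted atoms, the quantile functions are step functions whose values are the order statistics, so averaging quantile functions amounts to averaging order statistics. The following minimal sketch relies on that assumption (equal atom counts, uniform weights); the function names and the sample data are illustrative.

```python
def frechet_mean_1d(samples):
    """Atoms of the Fréchet mean of N empirical measures on the real line,
    each given as a list of n equally weighted atoms (average the order statistics)."""
    n = len(samples[0])
    assert all(len(s) == n for s in samples)
    ordered = [sorted(s) for s in samples]
    return [sum(s[k] for s in ordered) / len(ordered) for k in range(n)]

def w2_sq_equal(a, b):
    """Squared W_2 between two empirical measures with equally many, equally weighted atoms."""
    return sum((x - y) ** 2 for x, y in zip(sorted(a), sorted(b))) / len(a)

def frechet_functional(gamma, samples):
    """F(gamma) = (1 / 2N) * sum_i W_2^2(gamma, mu^i)."""
    return sum(w2_sq_equal(gamma, s) for s in samples) / (2 * len(samples))

# Hypothetical data: three measures with three atoms each.
samples = [[0.0, 1.0, 2.0], [10.0, 12.0, 11.0], [5.0, 5.0, 8.0]]
mean_atoms = frechet_mean_1d(samples)

# The averaged measure does at least as well as each of the input measures.
assert all(frechet_functional(mean_atoms, samples)
           <= frechet_functional(s, samples) + 1e-12 for s in samples)
```

For measures with different numbers of atoms or non-uniform weights, the quantile functions would have to be averaged on a common refinement of (0, 1) instead.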
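Boissard et al. [28] noticed that compatibility of \(\mu ^1,\dots ,\mu ^N\) according to Definition 2.3.1 allows for a simple solution to the Fréchet mean problem, as in the one-dimensional case. Recall from Proposition 3.1.2 that this is equivalent to the multimarginal problem. Returning to the original form of G, we obtain an easy lower bound for any \(\pi \in \Pi (\mu ^1,\dots ,\mu ^N)\):
$$\displaystyle \begin{aligned} G(\pi )=\frac 1{2N^2}\sum _{i<j}\int _{\mathcal X^N}\|x_i-x_j\|^2\,\mathrm {d}\pi (x)\ge \frac 1{2N^2}\sum _{i<j}W_2^2(\mu ^i,\mu ^j), \end{aligned}$$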
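because the (i, j)-th marginal of π is a coupling of \(\mu ^i\) and \(\mu ^j\). Thus, if equality above holds for π, then π is optimal and M#π is the Fréchet mean by Proposition 3.1.2. This is indeed the case for \(\pi =(\mathbf i,\mathbf t_{\mu ^1}^{\mu ^2},\dots ,\mathbf t_{\mu ^1}^{\mu ^N})\#\mu ^1\) because the compatibility gives:
$$\displaystyle \begin{aligned} \int _{\mathcal X^N}\|x_i-x_j\|^2\,\mathrm {d}\pi (x)=\int _{\mathcal X}\left \|\mathbf t_{\mu ^1}^{\mu ^i}-\mathbf t_{\mu ^i}^{\mu ^j}\circ \mathbf t_{\mu ^1}^{\mu ^i}\right \|^2\,\mathrm {d}\mu ^1=\int _{\mathcal X}\left \|\mathbf i-\mathbf t_{\mu ^i}^{\mu ^j}\right \|^2\,\mathrm {d}\mu ^i=W_2^2(\mu ^i,\mu ^j). \end{aligned}$$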
We may thus conclude, in a slightly more general form (γ was μ 1 above):
Theorem 3.1.9 (Fréchet Mean of Compatible Measures)
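Suppose that \(\{\gamma ,\mu ^1,\dots ,\mu ^N\}\) are compatible measures. Then
$$\displaystyle \begin{aligned} \left [\frac 1N\sum _{i=1}^N\mathbf t_{\gamma }^{\mu ^i}\right ]\#\gamma \end{aligned}$$
is the Fréchet mean of \((\mu ^1,\dots ,\mu ^N)\).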
A population version is given in Theorem 5.5.3.
3.1.5 The Agueh–Carlier Characterisation
Agueh and Carlier [2] provide a useful sufficient condition for γ to be the Fréchet mean. When \(\mathcal X=\mathbb {R}^d\), this condition is also necessary [2, Proposition 3.8], hence characterising Fréchet means in \(\mathbb {R}^d\). It will allow us to easily deduce some equivariance results for Fréchet means with respect to independence (Lemma 3.1.11) and rotations (Lemma 3.1.12). More importantly, it provides a sufficient condition under which a local minimum of F is a global minimum (Theorem 3.1.15) and the same idea can be used to relate the population Fréchet mean to the expected value of the optimal maps (Theorem 4.2.4). Recall that \(\phi ^*\) denotes the Legendre transform of ϕ, as defined on page 14.
Proposition 3.1.10 (Fréchet Means and Potentials)
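Let \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\) be absolutely continuous, let \(\gamma \in \mathcal W_2(\mathcal X)\) and denote by \(\phi ^*_i\) the convex potentials of \({\mathbf t_{\mu ^i}^{\gamma }}\). If \(\phi _i=\phi _i^{**}\) are such that
$$\displaystyle \begin{aligned} \frac 1N\sum _{i=1}^N\phi _i(x)\le \frac 12\|x\|^2\quad \text{for all }x\in \mathcal X,\qquad \text{with equality }\gamma \text{-almost surely}, \end{aligned}$$
then γ is the unique Fréchet mean of \(\mu ^1,\dots ,\mu ^N\).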
Proof
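Uniqueness follows from Proposition 3.2.7. If \(\theta \in \mathcal W_2(\mathcal X)\) is any measure, then the Kantorovich duality yields
$$\displaystyle \begin{aligned} \frac 12W_2^2(\theta ,\mu ^i)\ge \int _{\mathcal X}\left (\frac 12\|x\|^2-\phi _i(x)\right )\mathrm {d}\theta (x)+\int _{\mathcal X}\left (\frac 12\|y\|^2-\phi _i^*(y)\right )\mathrm {d}\mu ^i(y), \end{aligned}$$
with equality when θ = γ. Averaging over i and using the hypothesis on \(\sum \phi _i\) gives F(θ) ≥ F(γ), which is the result.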
A population version of this result, based on similar calculations, is given in Theorem 4.2.4.
The next two results are formulated in \(\mathbb {R}^d\) because there the converse of Proposition 3.1.10 is known to be true. If one could extend [2, Proposition 3.8] to any separable Hilbert space \(\mathcal X\), then the two lemmata below would hold with \(\mathbb {R}^d\) replaced by \(\mathcal X\). The simple proofs are given on page 74 of the supplement.
Lemma 3.1.11 (Independent Fréchet Means)
Let \(\mu ^1,\dots ,\mu ^N\) and \(\nu ^1,\dots ,\nu ^N\) be absolutely continuous measures in \(\mathcal W_2(\mathbb {R}^{d_1})\) and \(\mathcal W_2(\mathbb {R}^{d_2})\) with Fréchet means μ and ν, respectively. Then the independent coupling μ ⊗ ν is the Fréchet mean of \(\mu ^1\otimes \nu ^1,\dots ,\mu ^N\otimes \nu ^N\).
By induction (or a straightforward modification of the proof), one can show that the Fréchet mean of \((\mu ^i\otimes \nu ^i\otimes \rho ^i)\) is μ ⊗ ν ⊗ ρ, and so on.
Lemma 3.1.12 (Rotated Fréchet Means)
If μ is the Fréchet mean of the absolutely continuous measures \(\mu ^1,\dots ,\mu ^N\) and U is orthogonal, then U#μ is the Fréchet mean of \(U\#\mu ^1,\dots ,U\#\mu ^N\).
3.1.6 Differentiability of the Fréchet Functional and Karcher Means
Since we seek to minimise the Fréchet functional F, it would be helpful if F were differentiable, because we could then find at least local minima by solving the equation F′ = 0. This observation of Karcher [78] leads to the notion of Karcher mean.
Definition 3.1.13 (Karcher Mean)
Let F be a Fréchet functional associated with some random measure Λ in \(\mathcal W_2(\mathcal X)\). Then γ is a Karcher mean for Λ if F is differentiable at γ and F′(γ) = 0.
Of course, if γ is a Fréchet mean for the random measure Λ and F is differentiable at γ, then F′(γ) must vanish. In this subsection, we build upon the work of Ambrosio et al. [12] and determine the derivative of the Fréchet functional. This will not only allow for a simple characterisation of Karcher means in terms of the optimal maps \({\mathbf t_{\gamma }^{\varLambda }}\) (Proposition 3.2.14), but will also be the cornerstone of the construction of a steepest descent algorithm for empirical calculation of Fréchet means. The differentiability holds at the population level too (Theorem 3.2.13).
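It turns out that the tangent bundle structure described in Sect. 2.3 gives rise to a differentiable structure in the Wasserstein space. Fix \(\mu ^0\in \mathcal W_2(\mathcal X)\) and consider the function
$$\displaystyle \begin{aligned} F_0(\gamma )=\frac 12W_2^2(\gamma ,\mu ^0),\qquad \gamma \in \mathcal W_2(\mathcal X). \end{aligned}$$
Ambrosio et al. [12, Corollary 10.2.7] show that when γ is absolutely continuous,
$$\displaystyle \begin{aligned} \lim _{W_2(\nu ,\gamma )\to 0}\frac {F_0(\nu )-F_0(\gamma )+\int _{\mathcal X}{\left \langle {\mathbf t_{\gamma }^{\mu ^0}(x)-x},{\mathbf t_{\gamma }^{\nu }(x)-x}\right \rangle }\,\mathrm {d}\gamma (x)}{W_2(\nu ,\gamma )}=0. \end{aligned}$$
Parts of the proof of this result (the limit superior above is ≤ 0; the limit inferior is bounded below) are reproduced in Proposition 3.2.12. The integral above can be seen as the inner product
$$\displaystyle \begin{aligned} {\left \langle {\mathbf t_{\gamma }^{\mu ^0}-\mathbf i},{\mathbf t_{\gamma }^{\nu }-\mathbf i}\right \rangle } \end{aligned}$$
in the space \(\mathcal {L}_2(\gamma )\) that includes as a (closed) subspace the tangent space \(\mathrm {Tan}_\gamma \). In terms of this inner product and the log map, we can write
$$\displaystyle \begin{aligned} F_0(\nu )-F_0(\gamma )=-{\left \langle {\log _\gamma (\mu ^0)},{\log _\gamma (\nu )}\right \rangle }+o(W_2(\nu ,\gamma ))=-{\left \langle {\log _\gamma (\mu ^0)},{\log _\gamma (\nu )}\right \rangle }+o(\|\log _\gamma (\nu )\|), \end{aligned}$$
so that \(F_0\) is Fréchet-differentiable (see Note 3) at γ with derivative
$$\displaystyle \begin{aligned} F_0^{\prime }(\gamma )=-\log _\gamma (\mu ^0)=-\left (\mathbf t_{\gamma }^{\mu ^0}-\mathbf i\right )\in \mathrm {Tan}_\gamma . \end{aligned}$$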
By linearity, one immediately obtains:
Theorem 3.1.14 (Gradient of the Fréchet Functional)
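Fix a collection of measures \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\). When \(\gamma \in \mathcal W_2(\mathcal X)\) is absolutely continuous, the Fréchet functional
$$\displaystyle \begin{aligned} F(\gamma )=\frac 1{2N}\sum _{i=1}^NW_2^2(\gamma ,\mu ^i) \end{aligned}$$
is Fréchet-differentiable and
$$\displaystyle \begin{aligned} F^{\prime }(\gamma )=-\frac 1N\sum _{i=1}^N\log _\gamma (\mu ^i)=-\frac 1N\sum _{i=1}^N\left (\mathbf t_{\gamma }^{\mu ^i}-\mathbf i\right ). \end{aligned}$$
It follows from this that an absolutely continuous \(\gamma \in \mathcal W_2(\mathcal X)\) is a Karcher mean if and only if the average of the optimal maps \(\mathbf t_{\gamma }^{\mu ^i}\) is the identity. If in addition one of the \(\mu ^i\) is absolutely continuous with bounded density, then the Fréchet mean \(\overline \mu \) is absolutely continuous by Proposition 3.1.8, so it is a Karcher mean. The result extends to the population version; see Proposition 3.2.14.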
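The Karcher condition (the average of the optimal maps is the identity) is easy to test for Gaussian measures on the real line, where the optimal map from \(N(m_0,s_0^2)\) to \(N(m_1,s_1^2)\) is the increasing affine map \(x\mapsto m_1+(s_1/s_0)(x-m_0)\). The sketch below is an illustration under that one-dimensional Gaussian assumption; the function names and the parameter values are hypothetical, and measures are encoded as (mean, standard deviation) pairs.

```python
def optimal_map(gamma, mu):
    """Optimal map pushing N(m0, s0^2) forward to N(m1, s1^2) on the real line."""
    (m0, s0), (m1, s1) = gamma, mu
    return lambda x: m1 + (s1 / s0) * (x - m0)

def averaged_map(gamma, measures):
    """Average of the optimal maps from gamma to each measure."""
    maps = [optimal_map(gamma, mu) for mu in measures]
    return lambda x: sum(t(x) for t in maps) / len(maps)

def is_karcher(gamma, measures, grid=(-3.0, -1.0, 0.0, 0.5, 2.0), tol=1e-12):
    """Check on a few points that the averaged map is the identity."""
    avg = averaged_map(gamma, measures)
    return all(abs(avg(x) - x) <= tol for x in grid)

def karcher_step(gamma, measures):
    """Push gamma forward by the averaged optimal map, which is affine here."""
    m0, s0 = gamma
    n = len(measures)
    slope = sum(s for _, s in measures) / (n * s0)
    centre = sum(m for m, _ in measures) / n
    # x -> centre + slope * (x - m0) sends N(m0, s0^2) to N(centre, (slope * s0)^2).
    return (centre, slope * s0)

# Hypothetical Gaussian measures (mean, standard deviation).
measures = [(0.0, 1.0), (3.0, 2.0), (-3.0, 3.0)]
candidate = (0.0, 2.0)   # average of the means, average of the standard deviations
start = (1.0, 1.0)       # an arbitrary non-stationary starting point

assert is_karcher(candidate, measures)
assert not is_karcher(start, measures)
```

In this one-dimensional example a single push-forward by the averaged map already lands on the stationary point; in higher dimensions one would not expect this, and such a step would have to be iterated.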
It may happen that a collection \(\mu ^1,\dots ,\mu ^N\) of absolutely continuous measures has a Karcher mean that is not a Fréchet mean; see Álvarez-Esteban et al. [9, Example 3.1] for an example in \(\mathbb {R}^2\). But a Karcher mean γ is “almost” a Fréchet mean in the following sense. By Proposition 3.2.14, \(N^{-1}\sum {\mathbf t_{\gamma }^{\mu ^i}}(x)=x\) for γ-almost all x. If, on the other hand, the equality holds for all \(x\in \mathcal X\), then γ is the Fréchet mean by taking integrals and applying Proposition 3.1.10. One can hope that under regularity conditions, the γ-almost sure equality can be upgraded to equality everywhere. Indeed, this is the case:
Theorem 3.1.15 (Optimality Criterion for Karcher Means)
Let \(U\subseteq \mathbb {R}^d\) be an open convex set and let \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathbb {R}^d)\) be probability measures on U with bounded strictly positive densities \(g^1,\dots ,g^N\). Suppose that an absolutely continuous Karcher mean γ is supported on U with bounded strictly positive density f there. Then γ is the Fréchet mean of \(\mu ^1,\dots ,\mu ^N\) if one of the following holds:
1. \(U=\mathbb {R}^d\) and the densities \(f,g^1,\dots ,g^N\) are of class \(C^{0,\alpha }\) for some α > 0;
2. U is bounded and the densities \(f,g^1,\dots ,g^N\) are bounded below on U.
Proof
The result exploits Caffarelli’s regularity theory for Monge–Ampère equations in the form of Theorem 1.6.7. In the first case, there exist \(C^1\) (in fact, \(C^{2,\alpha }\)) convex potentials \(\varphi _i\) on \(\mathbb {R}^d\) with \({\mathbf t_{\gamma }^{\mu ^i}}=\nabla \varphi _i\), so that \({\mathbf t_{\gamma }^{{\mu ^i}}}(x)\) is a singleton for all \(x\in \mathbb {R}^d\). The set \(\{x\in \mathbb {R}^d:\sum {\mathbf t_{\gamma }^{\mu ^i}}(x)/N\ne x\}\) is γ-negligible (and hence Lebesgue negligible, because the density f is strictly positive) and open by continuity. It is therefore empty, so F′(γ) = 0 everywhere, and γ is the Fréchet mean (see the discussion before the theorem).
In the second case, by the same argument we have \(\sum {\mathbf t_{\gamma }^{\mu ^i}}(x)/N=x\) for all x ∈ U. Since U is convex, there must exist a constant C such that \(\sum \varphi _i(x)=C+N\|x\|^2/2\) for all x ∈ U, and we may assume without loss of generality that C = 0. If one repeats the proof of Proposition 3.1.10, then F(γ) ≤ F(θ) for all θ ∈ P(U). By continuity considerations, the inequality holds for all \(\theta \in P(\overline U)\) (Theorem 2.2.7) and since \(\overline U\) is closed and convex, γ is the Fréchet mean by Corollary 3.1.4.
3.2 Population Fréchet Means
In this section, we extend the notion of empirical Fréchet mean to the population level, where Λ is a random element in \(\mathcal W_2(\mathcal X)\) (a measurable mapping from a probability space to \(\mathcal W_2(\mathcal X)\)). This requires a different strategy, since it is not clear how to define the analogue of the multicouplings at that level of abstraction. However, it is important to point out that when there is more structure in Λ, multicouplings can be defined as laws of stochastic processes; see Pass [102] for a detailed account of the problem in this case.
In analogy with (3.1), we define:
Definition 3.2.1 (Population Fréchet Mean)
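Let Λ be a random measure in \(\mathcal W_2(\mathcal X)\). The Fréchet mean of Λ is the minimiser (if it exists and is unique) of the Fréchet functional
$$\displaystyle \begin{aligned} F(\gamma )=\frac 12\mathbb {E}W_2^2(\gamma ,\varLambda ),\qquad \gamma \in \mathcal W_2(\mathcal X). \end{aligned} \tag{3.3}$$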
Since \(W_2\) is continuous and nonnegative, the expectation is well-defined.
3.2.1 Existence, Uniqueness, and Continuity
Existence and uniqueness of Fréchet means on a general metric space M are rather delicate questions. Usually, existence proofs are easier: for example, since the Fréchet functional F is continuous on M (as we show below), one often invokes local compactness of M in order to establish existence of a minimiser. Unfortunately, a different strategy is needed when \(M=\mathcal W_2(\mathcal X)\), because the Wasserstein space is not locally compact (Proposition 2.2.9).
The first thing to notice is that F is indeed continuous (this is clear for the empirical version). This is a consequence of the triangle inequality and holds when \(\mathcal W_2(\mathcal X)\) is replaced by any metric space.
Lemma 3.2.2 (Finiteness of the Fréchet Functional)
If F is not identically infinite, then it is finite and locally Lipschitz everywhere on \(\mathcal W_2(\mathcal X)\).
Proof
Assume that F is finite at γ. If θ is any other measure in \(\mathcal W_2(\mathcal X)\), write
Since x ≤ 1 + x 2 for all x, the triangle inequality in \(\mathcal W_2(\mathcal X)\) yields
Since F(γ) < ∞, this shows that F is finite everywhere and the right-hand side vanishes as θ → γ in \(\mathcal W_2(\mathcal X)\). Now that we know that F is continuous, the same upper bound shows that it is in fact locally Lipschitz.
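Example: let \((a_n)\) be a sequence of positive numbers that sum up to one. Let \(x_n=1/a_n\) and suppose that Λ equals \(\delta \{x_n\}\in \mathcal W_2(\mathbb {R})\) with probability \(a_n\). Then
$$\displaystyle \begin{aligned} F(\delta \{0\})=\frac 12\mathbb {E}W_2^2(\varLambda ,\delta \{0\})=\frac 12\sum _{n}a_nx_n^2=\frac 12\sum _{n}\frac 1{a_n}=\infty , \end{aligned}$$
and by Lemma 3.2.2 F is identically infinite. Henceforth, we say that F is finite when the condition in Lemma 3.2.2 holds.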
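To see the divergence in the example concretely, one may take the particular (assumed) choice \(a_n=2^{-n}\), n ≥ 1, so that \(x_n=2^n\) and each term \(a_nx_n^2\) of \(\mathbb {E}W_2^2(\varLambda ,\delta \{0\})\) equals \(2^n\). A small sketch of the partial sums, with an illustrative function name:

```python
def partial_second_moment(k):
    """Partial sum of a_n * x_n^2 for n = 1..k with a_n = 2^(-n) and x_n = 1/a_n."""
    total = 0.0
    for n in range(1, k + 1):
        a_n = 2.0 ** (-n)
        x_n = 1.0 / a_n
        total += a_n * x_n ** 2   # equals 2^n
    return total

# The partial sums equal 2^(k+1) - 2 and grow without bound.
partial_sums = [partial_second_moment(k) for k in (1, 5, 10, 20)]
assert partial_sums == sorted(partial_sums)
```

Any summable positive sequence behaves the same way, since \(a_n\to 0\) forces \(1/a_n\to \infty \).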
Using the lower semicontinuity (2.5), one can prove existence on \(\mathbb {R}^d\) rather easily. (The empirical means exist even in infinite dimensions by Corollary 3.1.3.)
Proposition 3.2.3 (Existence of Fréchet Means)
The Fréchet functional associated with any random measure Λ in \(\mathcal W_2(\mathbb {R}^d)\) admits a minimiser.
Proof
The assertion is clear if F is identically infinite. Otherwise, let \((\gamma _n)\) be a minimising sequence. We wish to show that the sequence is tight. Define \(L=\sup _nF(\gamma _n)<\infty \) and observe that since \(x\le 1+x^2\) for all \(x\in \mathbb {R}\),
By the triangle inequality
so that for all n
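Since closed and bounded sets in \(\mathbb {R}^d\) are compact, it follows that \((\gamma _n)\) is a tight sequence. We may assume that \(\gamma _n\to \gamma \) weakly, then use (2.5) and Fatou’s lemma to obtain
$$\displaystyle \begin{aligned} 2F(\gamma )=\mathbb {E}W_2^2(\gamma ,\varLambda )\le \mathbb {E}\liminf _{n\to \infty }W_2^2(\gamma _n,\varLambda )\le \liminf _{n\to \infty }\mathbb {E}W_2^2(\gamma _n,\varLambda )=2\inf F. \end{aligned}$$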
Thus, γ is a minimiser of F, and existence is established.
When \(\mathcal X\) is an infinite-dimensional Hilbert space, existence still holds under a compactness assumption. We first prove a result about the support of the Fréchet mean. At the empirical level, one can say more about the support (see Corollary 3.1.4).
Proposition 3.2.4 (Support of Fréchet Mean)
Let Λ be a random measure in \(\mathcal W_2(\mathcal X)\) and let \(K\subseteq \mathcal X\) be a convex closed set such that \(\mathbb {P}[\varLambda (K)=1]=1\). If γ minimises F, then γ(K) = 1.
Remark 3.2.5
For any closed \(K\subseteq \mathcal X\) and any α ∈ [0, 1], the set \(\{\varLambda \in \mathcal W_p(\mathcal X):\varLambda (K)\ge \alpha \}\) is closed in \(\mathcal W_p(\mathcal X)\) because \(\{\varLambda \in P(\mathcal X):\varLambda (K)\ge \alpha \}\) is weakly closed by the portmanteau lemma (Lemma 1.7.1 ).
The proof amounts to a simple projection argument; see page 79 in the supplement.
Corollary 3.2.6
If there exists a compact convex K satisfying the hypothesis of Proposition 3.2.4 , then the Fréchet functional admits a minimiser supported on K.
Proof
Proposition 3.2.4 allows us to restrict the domain of F to \(\mathcal W_2(K)\), the collection of probability measures supported on K. Since this set is compact in \(\mathcal W_2(\mathcal X)\) (Corollary 2.2.5), the result follows from continuity of F.
From the convexity (3.2), one obtains a simple criterion for uniqueness. See Definition 1.6.4 for absolute continuity in infinite dimensions.
Proposition 3.2.7 (Uniqueness of Fréchet Means)
Let Λ be a random measure in \(\mathcal W_2(\mathcal X)\) with finite Fréchet functional. If Λ is absolutely continuous with positive (inner) probability, then the Fréchet mean of Λ is unique (if it exists).
Remark 3.2.8
It is not obvious that the set of absolutely continuous measures is measurable in \(\mathcal W_2(\mathcal X)\) . We assume that there exists a Borel set \(A\subset \mathcal W_2(\mathcal X)\) such that \(\mathbb {P}(\varLambda \in A)>0\) and all measures in A are absolutely continuous.
Proof
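By taking expectations in (3.2), one sees that F is convex on \(\mathcal W_2(\mathcal X)\) with respect to linear interpolants. From Proposition 3.1.6, we conclude that
$$\displaystyle \begin{aligned} \varLambda \text{ absolutely continuous}\quad \Longrightarrow \quad W_2^2(t\gamma _1+(1-t)\gamma _2,\varLambda )<tW_2^2(\gamma _1,\varLambda )+(1-t)W_2^2(\gamma _2,\varLambda ),\qquad \gamma _1\ne \gamma _2,\ t\in (0,1). \end{aligned}$$
As F was already shown to be weakly convex in any case, and the strict inequality holds with positive probability, it follows that
$$\displaystyle \begin{aligned} F(t\gamma _1+(1-t)\gamma _2)<tF(\gamma _1)+(1-t)F(\gamma _2),\qquad \gamma _1\ne \gamma _2,\ t\in (0,1). \end{aligned}$$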
Since strictly convex functionals have at most one minimiser, this completes the proof.
We state without proof an important consistency result (Le Gouic and Loubes [87, Theorem 3]). Since \(\mathcal W_2(\mathcal X)\) is a complete and separable metric space, we can define the “second degree” Wasserstein space \(\mathcal W_2(\mathcal W_2(\mathcal X))\). The law of a random measure Λ is in \(\mathcal W_2(\mathcal W_2(\mathcal X))\) if and only if the corresponding Fréchet functional is finite.
Theorem 3.2.9 (Consistency of Fréchet Means)
Let \(\varLambda _n\), Λ be random measures in \(\mathcal W_2(\mathbb {R}^d)\) with finite Fréchet functionals and laws \(\mathbb {P}_n,\mathbb {P}\in \mathcal W_2(\mathcal W_2(\mathbb {R}^d))\). If \(\mathbb {P}_n\to \mathbb {P}\) in \(\mathcal W_2(\mathcal W_2(\mathbb {R}^d))\), then any sequence \(\lambda _n\) of Fréchet means of \(\varLambda _n\) has a \(W_2\)-limit point λ, which is a Fréchet mean of Λ.
See the Bibliographical Notes for a more general formulation.
Corollary 3.2.10 (Wasserstein Law of Large Numbers)
Let Λ be a random measure in \(\mathcal W_2(\mathbb {R}^d)\) with finite Fréchet functional and let \(\varLambda _1,\varLambda _2,\dots \) be a sample from Λ. Assume λ is the unique Fréchet mean of Λ (see Proposition 3.2.7). Then almost surely, the sequence of empirical Fréchet means of \(\varLambda _1,\dots ,\varLambda _n\) converges to λ.
Proof
Let \(\mathbb {P}\) be the law of Λ and let \(\mathbb {P}_n\) be its empirical counterpart (a random element in \(\mathcal W_2(\mathcal W_2(\mathbb {R}^d))\)). Like in the proof of Proposition 2.2.6 (with \(\mathcal X\) replaced by the complete separable metric space \(\mathcal W_2(\mathbb {R}^d)\)), almost surely \(\mathbb {P}_n\to \mathbb {P}\) in \(\mathcal W_2(\mathcal W_2(\mathbb {R}^d))\) and Theorem 3.2.9 applies.
Under a compactness assumption, one can give a direct proof for the law of large numbers as in Theorem 3.1.5. This is done on page 80 in the supplement.
3.2.2 The One-Dimensional Case
As a generalisation of the empirical version, we have:
Theorem 3.2.11 (Fréchet Means in \(\mathcal W_2(\mathbb {R})\))
Let Λ be a random measure in \(\mathcal W_2(\mathbb {R})\) with finite Fréchet functional. Then the Fréchet mean of Λ is the unique measure λ with quantile function \(F_\lambda ^{-1}(t)=\mathbb {E} F_\varLambda ^{-1}(t)\), t ∈ (0, 1).
Proof
Since \(L_2(0,1)\) is a Hilbert space, the random element \(F_\varLambda ^{-1}\in L_2(0,1)\) has a unique Fréchet mean \(g\in L_2(0,1)\), defined by the relations \({\left \langle {g},{f}\right \rangle } =\mathbb {E}{\left \langle {F_\varLambda ^{-1}},{f}\right \rangle } \) for all \(f\in L_2(0,1)\). On page 80 of the supplement, we show that g can be identified with \(F_\lambda ^{-1}\).
Interestingly, no regularity is needed in order for the Fréchet mean to be unique. This is not the case for higher dimensions, see Proposition 3.2.7. If there is some regularity, then one can state Theorem 3.2.11 in terms of optimal maps, because \(F_\lambda ^{-1}\) is the optimal map from \(\mathrm {Leb}|_{[0,1]}\) to λ. If \(\gamma \in \mathcal W_2(\mathbb {R})\) is any absolutely continuous (or even just continuous) measure, then Theorem 3.2.11 can be stated as follows: the Fréchet mean of Λ is the measure \([\mathbb {E}{\mathbf t_{\gamma }^{\varLambda }}]\#\gamma \). A generalisation of this result to compatible measures (Definition 2.3.1) can be carried out in the same way, since compatible measures are imbedded in a Hilbert space, using the Bochner integrals for the definition of the expected optimal maps (see Sect. 2.4).
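Theorem 3.2.11 and the law of large numbers (Corollary 3.2.10) can be illustrated by simulation with random Gaussian measures on the real line. If \(\varLambda =N(m,s^2)\) with random (m, s), its quantile function is \(m+s\,z(t)\) with z the standard normal quantile function, so the population Fréchet mean is \(N(\mathbb {E}m,(\mathbb {E}s)^2)\) and the empirical one is obtained by averaging the sampled means and standard deviations. The sketch below assumes, purely for illustration, that m is uniform on (−1, 1) and s is uniform on (1, 3), independently; all names are hypothetical.

```python
import math
import random

def empirical_frechet_mean(params):
    """Fréchet mean of N(m_i, s_i^2), i = 1..n, on the real line, as (mean, sd):
    averaging the quantile functions averages the m_i and the s_i."""
    n = len(params)
    return (sum(m for m, _ in params) / n, sum(s for _, s in params) / n)

def w2_gaussian_1d(a, b):
    """W_2 distance between N(a[0], a[1]^2) and N(b[0], b[1]^2)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def simulate(n, seed=0):
    """Empirical Fréchet mean of n independent draws of the random Gaussian measure."""
    rng = random.Random(seed)
    sample = [(rng.uniform(-1.0, 1.0), rng.uniform(1.0, 3.0)) for _ in range(n)]
    return empirical_frechet_mean(sample)

# Population Fréchet mean (E m, E s) for the assumed law of (m, s).
population_mean = (0.0, 2.0)

# W_2 error of the empirical Fréchet mean for increasing sample sizes.
errors = {n: w2_gaussian_1d(simulate(n), population_mean) for n in (10, 1000, 100000)}
```

Printing `errors` shows the Wasserstein error for each sample size; it is expected to shrink at roughly the usual \(n^{-1/2}\) rate, although a single seeded run need not be monotone.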
3.2.3 Differentiability of the Population Fréchet Functional
We now use the Fubini result (Proposition 2.4.9) in order to extend the differentiability of the Fréchet functional to the population version. This will follow immediately if we can interchange the expectation and the derivative in the form
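In order to do this, we will use dominated convergence in conjunction with uniform bounds on the slopes
$$\displaystyle \begin{aligned} u(\theta ,\varLambda )=\frac {\frac 12W_2^2(\theta ,\varLambda )-\frac 12W_2^2(\theta _0,\varLambda )+\int _{\mathcal X}{\left \langle {\mathbf t_{\theta _0}^{\varLambda }(x)-x},{\mathbf t_{\theta _0}^{\theta }(x)-x}\right \rangle }\,\mathrm {d}\theta _0(x)}{W_2(\theta ,\theta _0)}. \end{aligned} \tag{3.4}$$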
Proposition 3.2.12 (Slope Bounds)
Let \(\theta _0\), Λ, and θ be probability measures with \(\theta _0\) absolutely continuous, and set \(\delta =W_2(\theta ,\theta _0)\). Then
where u is defined by (3.4). If the measures are compatible in the sense of Definition 2.3.1, then u(θ, Λ) = δ∕2.
The proof is a slight variation of Ambrosio et al. [12, Theorem 10.2.2 and Proposition 10.2.6], and the details are given on page 81 of the supplement.
Theorem 3.2.13 (Population Fréchet Gradient)
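Let Λ be a random measure with finite Fréchet functional F. Then F is Fréchet-differentiable at any absolutely continuous \(\theta _0\) in the Wasserstein space, and \(F'(\theta _0)=-(\mathbb {E}{\mathbf t_{\theta _0}^{\varLambda }}-\mathbf i)\in \mathcal {L}_2(\theta _0)\). More precisely,
$$\displaystyle \begin{aligned} \lim _{W_2(\theta ,\theta _0)\to 0}\frac {F(\theta )-F(\theta _0)+\int _{\mathcal X}{\left \langle {\mathbb {E}\mathbf t_{\theta _0}^{\varLambda }(x)-x},{\mathbf t_{\theta _0}^{\theta }(x)-x}\right \rangle }\,\mathrm {d}\theta _0(x)}{W_2(\theta ,\theta _0)}=0. \end{aligned}$$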
Thus, the Fréchet derivative of F can be identified with the map \(-(\mathbb {E}{\mathbf t_{\theta _0}^{\varLambda }}-\mathbf i)\) in the tangent space at θ 0, a subspace of \(\mathcal {L}_2(\theta _0)\).
Proof
Introduce the slopes u(θ, Λ) defined by (3.4). Then for all Λ, u(θ, Λ) → 0 as \(W_2(\theta ,\theta _0)\to 0\), by the differentiability properties established above. Let us show that \(\mathbb {E} u(\theta ,\varLambda )\to 0\) as well. By Proposition 3.2.12, the expectation of u is bounded above by a constant that does not depend on Λ, and below by the negative of
Both expectations are finite by the hypothesis on Λ because the Fréchet functional is finite. The dominated convergence theorem yields
The measurability of the integral and the result then follow from Fubini’s theorem (see Proposition 2.4.9).
Proposition 3.2.14
Let Λ be a random measure in \(\mathcal W_2(\mathcal X)\) with finite Fréchet functional F, and let γ be absolutely continuous in \(\mathcal W_2(\mathcal X)\) . Then γ is a Karcher mean of Λ if and only if \(\mathbb {E}{\mathbf t_{\gamma }^{\varLambda }}-\mathbf i=0\) in \(\mathcal {L}_2(\gamma )\) . Furthermore, if γ is a Fréchet mean of Λ, then it is also a Karcher mean.
The characterisation of Karcher means follows immediately from Theorem 3.2.13. The other statement is that the derivative vanishes at the minimum, which is fairly obvious intuitively; see page 82 in the supplement.
3.3 Bibliographical Notes
Proposition 3.1.2 is essentially due to Agueh and Carlier [2, Proposition 4.2], who show it on \(\mathbb {R}^d\) (see also Zemel and Panaretos [134, Theorem 2]). An earlier result in a compact setting can be found in Carlier and Ekeland [33]. The formulation given here is from Masarotto et al. [91]. A more general version is provided by Le Gouic and Loubes [87, Theorem 8].
Lemmata 3.1.11 and 3.1.12 are from [135], but were known earlier (e.g., Bonneel et al. [30]).
Proposition 3.1.6 is a simplified version of Álvarez-Esteban et al. [8, Theorem 2.8] (see [8, Corollary 2.9]).
Propositions 3.2.3 and 3.2.7 are from Bigot and Klein [22], who also show the law of large numbers (Corollary 3.2.10) and deal with the one-dimensional setup (Theorem 3.2.11) in a compact setting. Section 2.4 appears to be new, but see the discussion in its beginning for other measurability results.
Barycentres can be defined for any p ≥ 1 as the measures minimising \(\mu \mapsto \mathbb {E} W_p^p(\varLambda ,\mu )\). (Strictly speaking, these are not Fréchet means unless p = 2.) Le Gouic and Loubes [87] show Proposition 3.2.3 and Theorem 3.2.9 in this more general setup, where \(\mathbb {R}^d\) can be replaced by any separable locally compact geodesic space.
Notes
- 1.
It should be remarked that this is a Hilbertian property (or at least a property linked to an inner product), not merely a linear property. In other words, it does not extend to Banach spaces. As an example, let \(H=\mathbb {R}^2\) with the \(L_1\) norm and consider the vertices (0, 0), (0, 1), and (1, 0) of the unit simplex. The mean of these is (1∕3, 1∕3) but for (x, y) in the triangle,
$$\displaystyle \begin{aligned} F(x,y) =(x+y)^2 + (x+1-y)^2 + (1-x+y)^2 =2 + 2x^2 + 2y^2 + (x-y)^2 \end{aligned}$$
is minimised at (0, 0).
- 2.
- 3.
The notion of Fréchet derivative is also named after Maurice Fréchet, but is not directly related to Fréchet means.
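The counterexample in note 1 can be checked numerically. The following sketch, which is not part of the original text, evaluates the sum of squared \(L_1\) distances to the three vertices directly on a grid over the triangle and locates the grid minimiser; the grid resolution and the names used are arbitrary choices.

```python
import numpy as np

# Vertices of the unit simplex in R^2.
vertices = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

def sum_sq_l1(p):
    """Sum of squared L1 distances from the point p to the three vertices."""
    return float(np.sum(np.abs(vertices - p).sum(axis=1) ** 2))

# Grid over the triangle {x >= 0, y >= 0, x + y <= 1} (resolution is an assumption).
grid = np.linspace(0.0, 1.0, 101)
points = [(x, y) for x in grid for y in grid if x + y <= 1.0 + 1e-12]
values = [sum_sq_l1(np.array(p)) for p in points]

best = points[int(np.argmin(values))]
at_mean = sum_sq_l1(np.array([1.0 / 3.0, 1.0 / 3.0]))
at_origin = sum_sq_l1(np.array([0.0, 0.0]))

print(best, at_origin, at_mean)
```

On this grid the minimiser is the vertex (0, 0), where the functional equals 2, whereas at the arithmetic mean (1∕3, 1∕3) it equals 22∕9.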
References
M. Agueh, G. Carlier, Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43(2), 904–924 (2011)
P.C. Álvarez-Esteban, E. del Barrio, J.A. Cuesta-Albertos, C. Matrán, Uniqueness and approximate computation of optimal incomplete transportation plans. Ann. Inst. Henri Poincaré Probab. Stat. 47(2), 358–375 (2011)
P.C. Álvarez-Esteban, E. del Barrio, J.A. Cuesta-Albertos, C. Matrán, A fixed-point approach to barycenters in Wasserstein space. J. Math. Anal. Appl. 441(2), 744–762 (2016)
L. Ambrosio, N. Gigli, A user’s guide to optimal transport, in Modelling and Optimisation of Flows on Networks (Springer, Berlin, 2013), pp. 1–155
L. Ambrosio, N. Gigli, G. Savaré, Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics. ETH Zürich, 2nd edn. (Springer, Berlin, 2008)
J. Bigot, T. Klein, Characterization of barycenters in the Wasserstein space by averaging optimal transport maps. ESAIM: Probab. Stat. 22, 35–57 (2018)
S. Bobkov, M. Ledoux, One-Dimensional Empirical Measures, Order Statistics and Kantorovich Transport Distances, vol. 261, no. 1259 (Memoirs of the American Mathematical Society, Providence, 2019). https://doi.org/10.1090/memo/1259
E. Boissard, T. Le Gouic, J.-M. Loubes, Distribution's template estimate with Wasserstein metrics. Bernoulli 21(2), 740–759 (2015)
N. Bonneel, J. Rabin, G. Peyré, H. Pfister, Sliced and Radon Wasserstein barycenters of measures. J. Math. Imag. Vis. 51(1), 22–45 (2015)
G. Carlier, I. Ekeland, Matching for teams. Econ. Theory 42(2), 397–418 (2010)
D. Dowson, B. Landau, The Fréchet distance between multivariate normal distributions. J. Multivar. Anal. 12(3), 450–455 (1982)
M. Fréchet, Les éléments aléatoires de nature quelconque dans un espace distancié. Ann. Inst. Henri Poincaré 10(4), 215–310 (1948)
M. Fréchet, Sur la distance de deux lois de probabilité. C.R. Hebd. Seances Acad. Sci. 244(6), 689–692 (1957)
W. Gangbo, A. Świȩch, Optimal maps for the multidimensional Monge–Kantorovich problem. Comm. Pure Appl. Math. 51(1), 23–45 (1998)
H. Karcher, Riemannian center of mass and mollifier smoothing. Commun. Pure Appl. Math. 30(5), 509–541 (1977)
T. Le Gouic, J.-M. Loubes, Existence and consistency of Wasserstein barycenters. Prob. Theory Relat. Fields 168(3–4), 901–917 (2017)
V. Masarotto, V.M. Panaretos, Y. Zemel, Procrustes metrics on covariance operators and optimal transportation of Gaussian processes. Sankhyā A 81, 172–213 (2019) (Invited Paper, Special Issue on Statistics on non-Euclidean Spaces and Manifolds)
B. Pass, Optimal transportation with infinitely many marginals. J. Funct. Anal. 264(4), 947–963 (2013)
Y. Zemel, V.M. Panaretos, Fréchet means and Procrustes analysis in Wasserstein space. Bernoulli 25(2), 932–976 (2019)
Y. Zemel, V.M. Panaretos, Supplement to “Fréchet means and Procrustes analysis in Wasserstein space” (2019)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2020 The Author(s)
About this chapter
Cite this chapter
Panaretos, V.M., Zemel, Y. (2020). Fréchet Means in the Wasserstein Space \(\mathcal W_2\) . In: An Invitation to Statistics in Wasserstein Space. SpringerBriefs in Probability and Mathematical Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-38438-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38437-1
Online ISBN: 978-3-030-38438-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)