# Fréchet Means in the Wasserstein Space \(\mathcal W_2\)


## Abstract

If *H* is a Hilbert space (or a closed convex subspace thereof) and *x*_{1}, …, *x*_{N} ∈ *H*, then the empirical mean \(\overline x_N=N^{-1}\sum x_i\) is the unique element of *H* that minimises the sum of squared distances from the *x*_{i}’s.

If *H* is a Hilbert space (or a closed convex subspace thereof) and *x*_{1}, …, *x*_{N} ∈ *H*, then the empirical mean \(\overline x_N=N^{-1}\sum x_i\) is the unique element of *H* that minimises the sum of squared distances from the *x*_{i}’s.^{1} That is, if we define

$$F(y)=\sum_{i=1}^N\|x_i-y\|^2,\qquad y\in H,$$

then \(\overline x_N\) is the unique minimiser of *F*. This is easily seen by “opening the squares” and writing

$$F(y)=F(\overline x_N)+N\|\overline x_N-y\|^2,\qquad y\in H.$$

In a general metric space, one replaces the squared norms by squared distances. A minimiser of the resulting sum of squared distances is called a *Fréchet mean*, and the sum itself is the *Fréchet functional*. A closely related notion is that of a *Karcher mean* (Karcher [78]), a term that describes stationary points of the sum of squares functional, when the latter is differentiable (see Sect. 3.1.6). Population versions of Fréchet means, assuming the space is endowed with a probability law, can also be defined, replacing summation by expectation with respect to that law.
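The “opening the squares” identity is easy to confirm numerically in the Hilbert space \(\mathbb R^3\). The following is a minimal NumPy sketch. The points and the competitor *y* are randomly generated for illustration only, and `sum_sq_dist` is a hypothetical helper name.

```python
import numpy as np

def sum_sq_dist(points, y):
    """F(y): sum of squared Euclidean distances from y to the rows of points."""
    return float(np.sum((points - y) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # N = 5 illustrative points in R^3
xbar = x.mean(axis=0)         # empirical mean
y = rng.normal(size=3)        # an arbitrary competitor

lhs = sum_sq_dist(x, y)
rhs = sum_sq_dist(x, xbar) + len(x) * float(np.sum((xbar - y) ** 2))
print(np.isclose(lhs, rhs))   # True: F(y) = F(xbar) + N * ||xbar - y||^2
```

Since the extra term is nonnegative and vanishes only at \(y=\overline x_N\), the identity also shows uniqueness of the minimiser.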

Fréchet means are perhaps the most basic object of statistical interest, and this chapter studies such means when the underlying space is the Wasserstein space \(\mathcal W_2\). In general, existence and uniqueness of a Fréchet mean can be subtle, but we will see that the nature of optimal transport allows for rather clean statements in the case of Wasserstein space.

## 3.1 Empirical Fréchet Means in \(\mathcal W_2\)

### 3.1.1 The Fréchet Functional

As foretold in the preceding paragraph, the definition of a Fréchet mean requires the definition of an appropriate sum-of-squares functional, the *Fréchet functional*:

### Definition 3.1.1 (Empirical Fréchet Functional and Mean)

*The Fréchet functional associated with measures* \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\) *is*

$$F(\gamma)=\frac{1}{2N}\sum_{i=1}^N W_2^2(\gamma,\mu^i),\qquad \gamma\in\mathcal W_2(\mathcal X). \tag{3.1}$$

*A Fréchet mean of* (*μ*^{1}, …, *μ*^{N}) *is a minimiser of F in* \(\mathcal W_2(\mathcal X)\) *(if it exists).*

In analysis, a Fréchet mean is often called a *barycentre*. We shall use the terminology of “Fréchet mean” that is arguably more popular in statistics.^{2}

The factor 1∕(2*N*) is irrelevant for the definition of Fréchet mean. It is introduced in order to have simpler expressions for the derivatives (Theorems 3.1.14 and 3.2.13) and to be compatible with a population version \(\mathbb {E} W_2^2(\gamma ,\varLambda )/2\) (see (3.3)).

(One can also consider weighted Fréchet functionals \(\frac12\sum_{i=1}^N w_iW_2^2(\gamma,\mu^i)\) with nonnegative weights *w*_{i} summing to one. If the *w*_{i}’s are rational, then the weighted functional can be encompassed in (3.1) by taking some of the *μ*^{i}’s to be the same. The case of irrational *w*_{i}’s is then treated with continuity arguments. Moreover, (3.3) encapsulates (3.1) as well as the weighted version when *Λ* can take finitely many values.)

### 3.1.2 Multimarginal Formulation, Existence, and Continuity

The Fréchet mean problem is closely related to the *multimarginal* Monge–Kantorovich problem. Let *μ*^{1}, …, *μ*^{N} be *N* measures in \(\mathcal W_2(\mathcal X)\) and let *Π*(*μ*^{1}, …, *μ*^{N}) be the set of probability measures in \(\mathcal X^N\) having \(\{\mu ^i\}_{i=1}^N\) as marginals. The problem is to minimise

$$G(\pi)=\frac{1}{2N^2}\sum_{i<j}\int_{\mathcal X^N}\|x_i-x_j\|^2\,d\pi(x_1,\dots,x_N),\qquad \pi\in\Pi(\mu^1,\dots,\mu^N).$$

The normalisation 1∕(2*N*^{2}) is of course irrelevant for the minimisation, and its purpose will be clarified shortly. If *N* = 2, we obtain the Kantorovich problem with quadratic cost. The probabilistic interpretation (as in Sect. 1.2) is as follows. One is given random variables *X*_{1}, …, *X*_{N} with marginal probability laws *μ*^{1}, …, *μ*^{N}, and seeks to construct a random vector *Y* = (*Y*_{1}, …, *Y*_{N}) on \(\mathcal X^N\) such that \(X_i\stackrel {d}{=}Y_i\) and

$$\mathbb E\sum_{i<j}\|Y_i-Y_j\|^2\le \mathbb E\sum_{i<j}\|Z_i-Z_j\|^2$$

for any other random vector *Z* = (*Z*_{1}, …, *Z*_{N}) such that \(X_i\stackrel {d}{=}Z_i\). Intuitively, we seek a random vector with prescribed marginals but maximally correlated entries.

We refer to elements of *Π*(*μ*^{1}, …, *μ*^{N}) (equivalently, joint laws of *X*_{1}, …, *X*_{N}) as *multicouplings* (of *μ*^{1}, …, *μ*^{N}). Just like in the Kantorovich problem, there always exists an optimal multicoupling *π*.

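The multimarginal problem turns out to be equivalent to the Fréchet mean problem for *μ*^{1}, …, *μ*^{N}. The first thing to observe is that the objective function can be written as

$$G(\pi)=\frac{1}{2N}\sum_{i=1}^N\int_{\mathcal X^N}\|x_i-M(x)\|^2\,d\pi(x),\qquad M(x)=M(x_1,\dots,x_N)=\frac1N\sum_{i=1}^N x_i .$$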
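The two expressions for *G* agree because of the pointwise identity \(\sum_{i<j}\|x_i-x_j\|^2=N\sum_{i}\|x_i-M(x)\|^2\), which can then be integrated against any multicoupling *π*. Below is a quick NumPy check at a single, randomly generated point of \(\mathcal X^N\) with \(\mathcal X=\mathbb R^2\). The data are illustrative only.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, d = 4, 2
x = rng.normal(size=(N, d))   # one illustrative point (x_1, ..., x_N) of X^N, X = R^2
m = x.mean(axis=0)            # M(x) = (x_1 + ... + x_N) / N

pairwise = sum(float(np.sum((x[i] - x[j]) ** 2))
               for i, j in itertools.combinations(range(N), 2))
to_mean = float(np.sum((x - m) ** 2))

# Integrands of the two expressions for G: 1/(2N^2) * pairwise vs 1/(2N) * to_mean.
print(np.isclose(pairwise / (2 * N**2), to_mean / (2 * N)))   # True
```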

### Proposition 3.1.2 (Fréchet Means and Multicouplings)

*Let* \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\)*. Then μ is a Fréchet mean of* (*μ*^{1}, …, *μ*^{N}) *if and only if there exists an optimal multicoupling* \(\pi \in \mathcal W_2(\mathcal X^N)\) *of* (*μ*^{1}, …, *μ*^{N}) *such that μ* = *M#π, and furthermore F*(*μ*) = *G*(*π*).

### Proof

Let *π* be an arbitrary multicoupling of (*μ*^{1}, …, *μ*^{N}) and set *μ* = *M#π*. Then (*x* ↦ *x*_{i}, *M*)*#π* is a coupling of *μ*^{i} and *μ*, and therefore

$$\int_{\mathcal X^N}\|x_i-M(x)\|^2\,d\pi(x)\ge W_2^2(\mu,\mu^i).$$

Summation over *i* gives *F*(*μ*) ≤ *G*(*π*) and so \(\inf F\le \inf G\).

For the other inequality, let \(\mu \in \mathcal W_2(\mathcal X)\) be arbitrary. For each *i*, let *π*^{i} be an optimal coupling between *μ* and *μ*^{i}. Invoking the gluing lemma (Ambrosio and Gigli [10, Lemma 2.1]), we may glue all *π*^{i}’s using their common marginal *μ*. This procedure constructs a measure *η* on \(\mathcal X^{N+1}\) with marginals *μ*^{1}, …, *μ*^{N}, *μ*, and its relevant projection *π* is then a multicoupling of *μ*^{1}, …, *μ*^{N}.

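Since \(\mathcal X\) is a Hilbert space, for fixed *x* = (*x*_{1}, …, *x*_{N}) the function \(y\mapsto\sum_{i}\|x_i-y\|^2\) is uniquely minimised at *y* = *M*(*x*). Thus

$$F(\mu)=\frac1{2N}\sum_{i=1}^N\int_{\mathcal X^{N+1}}\|x_i-y\|^2\,d\eta(x,y)\ge\frac1{2N}\sum_{i=1}^N\int_{\mathcal X^{N}}\|x_i-M(x)\|^2\,d\pi(x)=G(\pi),$$

so that \(\inf F\ge\inf G\). The inequality is strict unless *y* = *M*(*x*) *η*-almost surely, in which case *μ* = *M#π*. Therefore, if *μ* does not equal *M#π*, then *F*(*μ*) > *G*(*π*) ≥ *F*(*M#π*), and *μ* cannot be optimal. Finally, if *π* is optimal, then *F*(*M#π*) ≤ *G*(*π*) = inf *G* = inf *F*. This establishes the optimality of *μ* = *M#π* and completes the proof.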
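Proposition 3.1.2 can be checked by brute force on the real line. The sketch below assumes *N* = 3 uniform empirical measures with four atoms each, randomly generated for illustration. It searches only over multicouplings induced by permutations. This restriction is an assumption of the sketch. It is harmless here because, in one dimension, the comonotone multicoupling (which matches order statistics) is optimal (Sect. 3.1.4) and is of this form. The helper names are hypothetical.

```python
import itertools
import numpy as np

def w2_sq_equal_size(a, b):
    """Squared W2 between uniform empirical measures on R with equally many
    atoms: the monotone (sorted) matching is optimal."""
    return float(np.mean((np.sort(a) - np.sort(b)) ** 2))

def multimarginal_cost(rows):
    """G(pi) for the multicoupling putting mass 1/n on each row of the
    n-by-N array `rows` (row k is one point (x_1, ..., x_N) of R^N)."""
    n, N = rows.shape
    m = rows.mean(axis=1, keepdims=True)          # M(x) for every row
    return float(np.sum((rows - m) ** 2)) / (n * 2 * N)

rng = np.random.default_rng(2)
mus = [rng.normal(loc=k, size=4) for k in range(3)]   # atoms of mu^1, mu^2, mu^3

# Brute force over multicouplings induced by pairs of permutations.
perms = list(itertools.permutations(range(4)))
best = min(multimarginal_cost(np.column_stack([mus[0], mus[1][list(s)], mus[2][list(t)]]))
           for s in perms for t in perms)

# Comonotone multicoupling: match order statistics across all marginals.
comonotone = np.column_stack([np.sort(mu) for mu in mus])
mean_atoms = comonotone.mean(axis=1)               # atoms of M#pi
F = sum(w2_sq_equal_size(mean_atoms, mu) for mu in mus) / (2 * len(mus))

print(np.isclose(best, multimarginal_cost(comonotone)))   # comonotone is optimal
print(np.isclose(F, multimarginal_cost(comonotone)))      # F(M#pi) = G(pi)
```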

Since optimal multicouplings exist, we deduce that so do Fréchet means.

### Corollary 3.1.3 (Fréchet Means and Moments)

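*Any finite collection of measures* \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\) *admits a Fréchet mean μ, and for all p* ≥ 1

$$\int_{\mathcal X}\|x\|^p\,d\mu(x)\le\frac1N\sum_{i=1}^N\int_{\mathcal X}\|x\|^p\,d\mu^i(x),$$

*and when p* > 1 *equality holds if and only if μ*^{1} = ⋯ = *μ*^{N}.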

### Proof

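Let *π* be a multicoupling of *μ*^{1}, …, *μ*^{N} such that *μ* = *M#π* (Proposition 3.1.2). Then

$$\int_{\mathcal X}\|x\|^p\,d\mu(x)=\int_{\mathcal X^N}\Big\|\frac1N\sum_{i=1}^Nx_i\Big\|^p\,d\pi(x)\le\frac1N\sum_{i=1}^N\int_{\mathcal X^N}\|x_i\|^p\,d\pi(x)=\frac1N\sum_{i=1}^N\int_{\mathcal X}\|x\|^p\,d\mu^i(x).$$

If *p* > 1, the inequality is strict unless *x*_{1} = ⋯ = *x*_{N} *π*-almost surely, by strict convexity of *x* ↦ ∥*x*∥^{p}.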
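Here is a small numerical illustration of Corollary 3.1.3 on the real line. It assumes that each *μ*^{i} is the uniform empirical measure of *n* illustrative points, with the same *n* for all *i*. It relies on the one-dimensional description of Sect. 3.1.4: under these assumptions, the Fréchet mean puts mass 1∕*n* on the averages of the order statistics.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 200, 5
# N illustrative uniform empirical measures on R, each with n atoms.
samples = [(i + 1) * rng.standard_t(df=5, size=n) + i for i in range(N)]

# One-dimensional Fréchet mean (Sect. 3.1.4): average the order statistics.
frechet_mean = np.mean([np.sort(s) for s in samples], axis=0)

for p in (1, 2, 3):
    lhs = np.mean(np.abs(frechet_mean) ** p)              # p-th moment of the mean
    rhs = np.mean([np.mean(np.abs(s) ** p) for s in samples])
    print(p, lhs <= rhs)
```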

A further corollary of Proposition 3.1.2 is a bound on the support:

### Corollary 3.1.4

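*The support of any Fréchet mean is included in the closure of the set*

$$\Big\{\frac{x_1+\dots+x_N}{N}:\ x_i\in\operatorname{supp}(\mu^i),\ i=1,\dots,N\Big\}.$$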

*In particular, if all the μ* ^{ i } *’s are supported on a common convex set K, then so is any of their Fréchet means.*
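For finitely supported measures, the set in Corollary 3.1.4 can be listed explicitly. The sketch below assumes three illustrative uniform measures with two atoms each on the real line. It checks that the atoms of their Fréchet mean, obtained by averaging order statistics as in Sect. 3.1.4, lie in that set.

```python
import itertools
import numpy as np

# Illustrative finite supports of mu^1, mu^2, mu^3 (uniform weights).
supports = [np.array([0.0, 1.0]), np.array([2.0, 5.0]), np.array([-1.0, 3.0])]
N = len(supports)

# The set {(x_1 + ... + x_N)/N : x_i in supp(mu^i)}, rounded to absorb float noise.
averages = {round(float(sum(xs)) / N, 12) for xs in itertools.product(*supports)}

# One-dimensional Fréchet mean (Sect. 3.1.4): average the order statistics.
mean_atoms = np.mean([np.sort(s) for s in supports], axis=0)
print(all(round(float(a), 12) in averages for a in mean_atoms))
```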

The multimarginal formulation also yields a continuity property for the empirical Fréchet mean. Conditions for uniqueness will be given in the next subsection.

### Theorem 3.1.5 (Continuity of Fréchet Means)

*Suppose that* \(W_2(\mu _k^i,\mu ^i)\to 0\) *for i *= 1, …, *N and let* \(\overline \mu _k\) *denote any Fréchet mean of* \((\mu _k^1,\dots ,\mu _k^N)\)*. Then* \((\overline \mu _k)\) *stays in a compact set of* \(\mathcal W_2(\mathcal X)\)*, and any limit point is a Fréchet mean of* (*μ*^{1}, …, *μ*^{N}).

In particular, if *μ*^{1}, …, *μ*^{N} have a *unique* Fréchet mean \(\overline \mu \), then \(\overline \mu _k\to \overline \mu \) in \(\mathcal W_2(\mathcal X)\).

### Proof

We sketch the steps of the proof here, with the full details given on page 70 of the supplement.

**Step 1:** tightness of \((\overline \mu _k)\). This is true because the collection of multicouplings is tight, and the mean function *M* is continuous.

**Step 2:** weak limits are limits in \(\mathcal W_2(\mathcal X)\). This holds because the mean function has linear growth.

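**Step 3:** the limit is a Fréchet mean of (*μ*^{1}, …, *μ*^{N}). From Corollary 3.1.3, it follows that \(\overline \mu _k\) must be sought on some fixed bounded set in \(\mathcal W_2(\mathcal X)\). On such sets, the Fréchet functionals of \((\mu _k^1,\dots ,\mu _k^N)\) are uniformly Lipschitz and, by the triangle inequality, converge uniformly to the Fréchet functional of (*μ*^{1}, …, *μ*^{N}). Hence any limit point of their minimisers is a minimiser of the limiting functional.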

### 3.1.3 Uniqueness and Regularity

A general situation in which Fréchet means are unique is when the Fréchet functional is strictly convex. In the Wasserstein space, this requires some regularity, but weak convexity holds in general. Absolutely continuous measures on infinite-dimensional \(\mathcal X\) are defined in Definition 1.6.4.

### Proposition 3.1.6 (Convexity of the Fréchet Functional)

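*Let* \(\varLambda ,\gamma _i\in \mathcal W_2(\mathcal X)\) *and t* ∈ [0, 1]*. Then*

$$W_2^2(t\gamma_1+(1-t)\gamma_2,\varLambda)\le tW_2^2(\gamma_1,\varLambda)+(1-t)W_2^2(\gamma_2,\varLambda).\tag{3.2}$$

*When Λ is absolutely continuous, the inequality is strict unless t* ∈ {0, 1} *or γ*_{1} = *γ*_{2}.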

### Remark 3.1.7

*The squared Wasserstein distance is not convex along geodesics. That is, if we replace the linear interpolant tγ*_{1} + (1 − *t*)*γ*_{2} *by McCann’s interpolant γ*_{t}*, then* \(t\mapsto W_2^2(\gamma _t,\varLambda )\) *is not necessarily convex (Ambrosio et al. [*12*, Example 9.1.5]).*

### Proof

Let *π*_{i} ∈ *Π*(*γ*_{i}, *Λ*) be optimal and notice that the linear interpolant *tπ*_{1} + (1 − *t*)*π*_{2} ∈ *Π*(*tγ*_{1} + (1 − *t*)*γ*_{2}, *Λ*), so that

$$W_2^2(t\gamma_1+(1-t)\gamma_2,\varLambda)\le\int_{\mathcal X^2}\|x-y\|^2\,d[t\pi_1+(1-t)\pi_2](x,y)=tW_2^2(\gamma_1,\varLambda)+(1-t)W_2^2(\gamma_2,\varLambda).$$

When *Λ* is absolutely continuous and *t* ∈ (0, 1), the optimal coupling of *tγ*_{1} + (1 − *t*)*γ*_{2} and *Λ* is unique, so equality in (3.2) holds if and only if \(\pi _t=t\pi _1+(1-t)\pi _2=({\mathbf t_{\varLambda }^{t\gamma _1+(1-t)\gamma _2}}\times \mathbf i)\#\varLambda \). But *π*_{t} is supported on the graphs of two functions: \({\mathbf t_{\varLambda }^{\gamma _1}}\) and \({\mathbf t_{\varLambda }^{\gamma _2}}\). Consequently, equality can hold only if these two maps are equal *Λ*-almost surely, or, equivalently, if *γ*_{1} = *γ*_{2}.
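Inequality (3.2) can be tested on the real line with empirical measures. This is a sketch under illustrative assumptions: *γ*_{1} and *γ*_{2} are uniform on *n* random atoms each, *Λ* is uniform on 2*n* random atoms, and *t* = 1∕2. Since this *Λ* is discrete, only the weak inequality is asserted.

```python
import numpy as np

def w2_sq(a, b):
    """Squared W2 between uniform empirical measures on R with equal sizes."""
    return float(np.mean((np.sort(a) - np.sort(b)) ** 2))

rng = np.random.default_rng(4)
n = 50
g1 = rng.normal(0.0, 1.0, size=n)          # gamma_1: uniform on n atoms
g2 = rng.exponential(2.0, size=n)          # gamma_2: uniform on n atoms
lam = rng.normal(1.0, 3.0, size=2 * n)     # Lambda: uniform on 2n atoms

# For t = 1/2 the linear interpolant (gamma_1 + gamma_2)/2 is uniform on the
# 2n pooled atoms; repeating each atom twice writes gamma_i on 2n atoms too.
lhs = w2_sq(np.concatenate([g1, g2]), lam)
rhs = 0.5 * w2_sq(np.repeat(g1, 2), lam) + 0.5 * w2_sq(np.repeat(g2, 2), lam)
print(lhs <= rhs)   # inequality (3.2) with t = 1/2
```

Note that the interpolation is linear in the measures (mixing), not McCann's displacement interpolation, in line with Remark 3.1.7.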

As a corollary, we deduce that the Fréchet mean is unique if one of the measures *μ*^{i} is absolutely continuous, and this extends to the population version (see Proposition 3.2.7).

We conclude this subsection by stating an important regularity property in the Euclidean case. See Agueh and Carlier [2, Proposition 5.1] for a proof.

### Proposition 3.1.8 (*L*_{∞}-Regularity of Fréchet Means)

*Let* \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathbb {R}^d)\) *and suppose that μ*^{1} *is absolutely continuous with density bounded by M. Then the Fréchet mean of* {*μ*^{i}} *is absolutely continuous with density bounded by N*^{d}*M and is consequently a Karcher mean.*

In Theorem 5.5.2, we extend Proposition 3.1.8 to the population level.

### 3.1.4 The One-Dimensional and the Compatible Case

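When \(\mathcal X=\mathbb R\), the map \(\mu \mapsto F_\mu^{-1}\), which sends a measure to its quantile function, identifies \(\mathcal W_2(\mathbb R)\) isometrically with the closed convex subset of *L*_{2}(0, 1) formed by equivalence classes of left-continuous nondecreasing functions on (0, 1). Indeed, any quantile function is left-continuous and nondecreasing, and any such function *G* can be seen to be the inverse function of a distribution function, namely the *right-continuous inverse* of *G*. Since the Fréchet mean in a closed convex subset of a Hilbert space is the usual average, the Fréchet mean of \(\mu^1,\dots,\mu^N\in\mathcal W_2(\mathbb R)\) is the measure *μ* having quantile function

$$F_\mu^{-1}=\frac1N\sum_{i=1}^NF_{\mu^i}^{-1}.$$

Compatibility of *μ*^{1}, …, *μ*^{N} according to Definition 2.3.1 allows for a simple solution to the Fréchet mean problem, as in the one-dimensional case. Recall from Proposition 3.1.2 that this problem is equivalent to the multimarginal problem. Returning to the original form of *G*, we obtain an easy lower bound for any *π* ∈ *Π*(*μ*^{1}, …, *μ*^{N}):

$$G(\pi)=\frac{1}{2N^2}\sum_{i<j}\int_{\mathcal X^N}\|x_i-x_j\|^2\,d\pi(x)\ge\frac{1}{2N^2}\sum_{i<j}W_2^2(\mu^i,\mu^j),$$

because the (*i*, *j*)-th marginal of *π* is a coupling of *μ*^{i} and *μ*^{j}. Thus, if equality above holds for *π*, then *π* is optimal and *M#π* is the Fréchet mean by Proposition 3.1.2. This is indeed the case for \(\pi =(\mathbf i,\mathbf t_{\mu ^1}^{\mu ^2},\dots ,\mathbf t_{\mu ^1}^{\mu ^N})\#\mu ^1\). With the convention \(\mathbf t_{\mu^1}^{\mu^1}=\mathbf i\), compatibility gives

$$\int_{\mathcal X^N}\|x_i-x_j\|^2\,d\pi(x)=\int_{\mathcal X}\big\|\mathbf t_{\mu^1}^{\mu^i}(x)-\mathbf t_{\mu^1}^{\mu^j}(x)\big\|^2\,d\mu^1(x)=W_2^2(\mu^i,\mu^j).$$

The same argument applies with \(\pi=(\mathbf t_\gamma^{\mu^1},\dots,\mathbf t_\gamma^{\mu^N})\#\gamma\) for any *γ* such that {*γ*, *μ*^{1}, …, *μ*^{N}} are compatible (in the above, *γ* was *μ*^{1}):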

### Theorem 3.1.9 (Fréchet Mean of Compatible Measures)

*Suppose that* {*γ*, *μ*^{1}, …, *μ*^{N}} *are compatible measures. Then*

$$\Big[\frac1N\sum_{i=1}^N{\mathbf t_{\gamma}^{\mu^i}}\Big]\#\gamma$$

*is the Fréchet mean of* (*μ*^{1}, …, *μ*^{N}).

A population version is given in Theorem 5.5.3.
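The quantile-averaging description at the start of this subsection makes empirical Fréchet means on the real line exactly computable, even for samples of different sizes. The following is a minimal sketch. It assumes each *μ*^{i} is the uniform empirical measure of a finite sample, and `frechet_mean_1d` is a hypothetical helper name. The quantile function of such a measure is a step function with jumps at multiples of 1∕*n*_{i}, so the averaged quantile function is constant between consecutive jumps of any of them.

```python
import math
from fractions import Fraction

def frechet_mean_1d(samples):
    """Fréchet mean of uniform empirical measures on R (hypothetical helper).

    The mean's quantile function is the average of the input quantile
    functions; these are step functions with jumps at multiples of 1/n_i,
    so the mean is discrete with atoms on the common refinement of the jumps.
    Returns (atoms, weights) with exact Fraction weights.
    """
    sorted_samples = [sorted(s) for s in samples]
    breaks = sorted({Fraction(k, len(s))
                     for s in sorted_samples for k in range(len(s) + 1)})
    atoms, weights = [], []
    for a, b in zip(breaks[:-1], breaks[1:]):
        t = (a + b) / 2                   # all quantile functions are constant on (a, b]
        # left-continuous quantile of a sample of size n at level t: x_(ceil(n t))
        values = [s[math.ceil(t * len(s)) - 1] for s in sorted_samples]
        atoms.append(sum(values) / len(values))
        weights.append(b - a)
    return atoms, weights

atoms, weights = frechet_mean_1d([[2, 0], [7, 1, 4]])
print(atoms)    # [0.5, 2.0, 3.0, 4.5]
print(weights)  # [Fraction(1, 3), Fraction(1, 6), Fraction(1, 6), Fraction(1, 3)]
```

Exact fractions are used for the breakpoints so that coinciding jumps (for instance 1∕2 and 2∕4) are merged without floating-point ambiguity.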

### 3.1.5 The Agueh–Carlier Characterisation

Agueh and Carlier [2] provide a useful sufficient condition for *γ* to be the Fréchet mean. When \(\mathcal X=\mathbb {R}^d\), this condition is also necessary [2, Proposition 3.8], hence characterising Fréchet means in \(\mathbb {R}^d\). It will allow us to easily deduce some equivariance results for Fréchet means with respect to independence (Lemma 3.1.11) and rotations (Lemma 3.1.12). More importantly, it provides a sufficient condition under which a local minimum of *F* is a global minimum (Theorem 3.1.15), and the same idea can be used to relate the population Fréchet mean to the expected value of the optimal maps (Theorem 4.2.4). Recall that *ϕ*^{∗} denotes the Legendre transform of *ϕ*, as defined on page 14.

### Proposition 3.1.10 (Fréchet Means and Potentials)

*Let* \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\) *be absolutely continuous, let* \(\gamma \in \mathcal W_2(\mathcal X)\) *and denote by* \(\phi ^*_i\) *the convex potentials of* \({\mathbf t_{\mu ^i}^{\gamma }}\)*. If* \(\phi _i=\phi _i^{**}\) *are such that*

$$\frac1N\sum_{i=1}^N\phi_i(x)\le\frac{\|x\|^2}{2}\quad\text{for all }x\in\mathcal X,\qquad\text{with equality }\gamma\text{-almost surely},$$

*then γ is the unique Fréchet mean of μ*^{1}, …, *μ*^{N}.

### Proof

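Let \(\theta\in\mathcal W_2(\mathcal X)\) be arbitrary. Since \(\phi_i^{**}=\phi_i\), we have \(\phi_i^*(x)+\phi_i(y)\ge\langle x,y\rangle\), hence \(\|x-y\|^2/2\ge\|x\|^2/2-\phi_i^*(x)+\|y\|^2/2-\phi_i(y)\) for all *x*, *y*. Integrating this with respect to an optimal coupling of *μ*^{i} and *θ* gives

$$\frac12W_2^2(\mu^i,\theta)\ge\int_{\mathcal X}\Big(\frac{\|x\|^2}{2}-\phi_i^*(x)\Big)d\mu^i(x)+\int_{\mathcal X}\Big(\frac{\|y\|^2}{2}-\phi_i(y)\Big)d\theta(y),$$

with equality when *θ* = *γ*, because \(\nabla\phi_i^*={\mathbf t_{\mu^i}^{\gamma}}\) is optimal. Averaging over *i* and using the hypothesis on \(N^{-1}\sum\phi_i\), which holds with equality *γ*-almost surely, gives *F*(*θ*) ≥ *F*(*γ*). Uniqueness follows from the strict convexity of *F* (Proposition 3.1.6), since the *μ*^{i} are absolutely continuous.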

A population version of this result, based on similar calculations, is given in Theorem 4.2.4.

The next two results are formulated in \(\mathbb {R}^d\), because there the converse of Proposition 3.1.10 is known to hold [2, Proposition 3.8]. If one could extend [2, Proposition 3.8] to any separable Hilbert \(\mathcal X\), then the two lemmata below would hold with \(\mathbb {R}^d\) replaced by \(\mathcal X\). The simple proofs are given on page 74 of the supplement.

### Lemma 3.1.11 (Independent Fréchet Means)

*Let μ*^{1}, …, *μ*^{N} *and ν*^{1}, …, *ν*^{N} *be absolutely continuous measures in* \(\mathcal W_2(\mathbb {R}^{d_1})\) *and* \(\mathcal W_2(\mathbb {R}^{d_2})\) *with Fréchet means μ and ν, respectively. Then the independent coupling μ *⊗* ν is the Fréchet mean of μ*^{1} ⊗* ν*^{1}, …, *μ*^{N} ⊗* ν*^{N}.

By induction (or a straightforward modification of the proof), one can show that the Fréchet mean of (*μ*^{i} ⊗ *ν*^{i} ⊗ *ρ*^{i}) is *μ* ⊗ *ν* ⊗ *ρ*, and so on.

### Lemma 3.1.12 (Rotated Fréchet Means)

*If μ is the Fréchet mean of the absolutely continuous measures μ*^{1}, …, *μ*^{N} *and U is orthogonal, then U#μ is the Fréchet mean of U#μ*^{1}, …, *U#μ*^{N}.

### 3.1.6 Differentiability of the Fréchet Functional and Karcher Means

Since we seek to minimise the Fréchet functional *F*, it would be helpful if *F* were differentiable, because we could then find at least local minima by solving the equation *F′* = 0. This observation of Karcher [78] leads to the notion of *Karcher mean*.

### Definition 3.1.13 (Karcher Mean)

*Let F be a Fréchet functional associated with some random measure Λ in* \(\mathcal W_2(\mathcal X)\)*. Then γ is a Karcher mean for Λ if F is differentiable at γ and F′*(*γ*) = 0.

Of course, if *γ* is a Fréchet mean for the random measure *Λ* and *F* is differentiable at *γ*, then *F′*(*γ*) must vanish. In this subsection, we build upon the work of Ambrosio et al. [12] and determine the derivative of the Fréchet functional. This will not only allow for a simple characterisation of Karcher means in terms of the optimal maps \({\mathbf t_{\gamma }^{\varLambda }}\) (Proposition 3.2.14), but will also be the cornerstone of the construction of a steepest descent algorithm for empirical calculation of Fréchet means. The differentiability holds at the population level too (Theorem 3.2.13).

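Fix \(\mu\in\mathcal W_2(\mathcal X)\) and let \(F_0(\nu)=\frac12W_2^2(\nu,\mu)\). When *γ* is absolutely continuous, optimal maps \({\mathbf t_{\gamma}^{\nu}}\) from *γ* exist, and the tangent space at *γ* is a subspace of \(\mathcal L_2(\gamma)\). This space carries the inner product \(\langle f,g\rangle_\gamma=\int_{\mathcal X}\langle f(x),g(x)\rangle\,d\gamma(x)\), and the log map is \(\log_\gamma(\nu)={\mathbf t_{\gamma}^{\nu}}-\mathbf i\). In terms of this inner product and the log map, we can write

$$F_0(\nu)-F_0(\gamma)+\langle\log_\gamma(\mu),\log_\gamma(\nu)\rangle_\gamma=o\big(W_2(\nu,\gamma)\big),\qquad \nu\to\gamma,$$

so that *F*_{0} is Fréchet-differentiable^{3} at *γ* with derivative \(F_0'(\gamma)=-\log_\gamma(\mu)=-({\mathbf t_{\gamma}^{\mu}}-\mathbf i)\).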

### Theorem 3.1.14 (Gradient of the Fréchet Functional)

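*Fix a collection of measures* \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\)*. When* \(\gamma \in \mathcal W_2(\mathcal X)\) *is absolutely continuous, the Fréchet functional*

$$F(\gamma)=\frac{1}{2N}\sum_{i=1}^NW_2^2(\gamma,\mu^i)$$

*is Fréchet-differentiable at γ and*

$$F'(\gamma)=-\frac1N\sum_{i=1}^N\big({\mathbf t_{\gamma}^{\mu^i}}-\mathbf i\big).$$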

It follows from this that an absolutely continuous \(\gamma \in \mathcal W_2(\mathcal X)\) is a Karcher mean if and only if the average of the optimal maps \({\mathbf t_{\gamma }^{\mu ^i}}\) is the identity in \(\mathcal L_2(\gamma)\), that is, *γ*-almost everywhere. If in addition \(\mathcal X=\mathbb R^d\) and one *μ*^{i} is absolutely continuous with bounded density, then the Fréchet mean \(\overline \mu \) is absolutely continuous by Proposition 3.1.8, so it is a Karcher mean. The result extends to the population version; see Proposition 3.2.14.
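Theorem 3.1.14 suggests a descent scheme: move from *γ* in the direction \(-F'(\gamma)=N^{-1}\sum_i({\mathbf t_{\gamma}^{\mu^i}}-\mathbf i)\), that is, replace *γ* by \([N^{-1}\sum_i{\mathbf t_{\gamma}^{\mu^i}}]\#\gamma\). The sketch below illustrates that idea only and is not a definitive algorithm. It assumes all measures are centred nondegenerate Gaussians on \(\mathbb R^d\). In that case, the optimal map from *N*(0, *S*) to *N*(0, *Σ*) is linear with matrix \(S^{-1/2}(S^{1/2}\Sigma S^{1/2})^{1/2}S^{-1/2}\). The helper names are hypothetical. At a fixed point, the average of the optimal maps is the identity, which is exactly the Karcher condition above.

```python
import numpy as np

def psd_sqrt(a):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def gaussian_map(s, sigma):
    """Matrix of the optimal map from N(0, s) to N(0, sigma), s positive definite."""
    r = psd_sqrt(s)
    r_inv = np.linalg.inv(r)
    return r_inv @ psd_sqrt(r @ sigma @ r) @ r_inv

def descend(covs, n_iter=500, tol=1e-12):
    """Unit-step descent gamma <- [mean_i t_gamma^{mu^i}] # gamma for centred Gaussians."""
    s = np.eye(covs[0].shape[0])                  # arbitrary nondegenerate start
    for _ in range(n_iter):
        t_bar = np.mean([gaussian_map(s, c) for c in covs], axis=0)
        s_new = t_bar @ s @ t_bar                 # covariance of t_bar # N(0, s)
        if np.linalg.norm(s_new - s) < tol:
            return s_new
        s = s_new
    return s

rng = np.random.default_rng(5)
covs = []
for _ in range(3):                                # illustrative covariance matrices
    a = rng.normal(size=(2, 2))
    covs.append(a @ a.T + 0.5 * np.eye(2))

s_bar = descend(covs)
t_bar = np.mean([gaussian_map(s_bar, c) for c in covs], axis=0)
print(np.round(t_bar, 6))                         # Karcher condition: average map near identity
```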

It may happen that a collection *μ*^{1}, …, *μ*^{N} of absolutely continuous measures has a Karcher mean that is not a Fréchet mean; see Álvarez-Esteban et al. [9, Example 3.1] for an example in \(\mathbb {R}^2\). But a Karcher mean *γ* is “almost” a Fréchet mean in the following sense. By Proposition 3.2.14, \(N^{-1}\sum {\mathbf t_{\gamma }^{\mu ^i}}(x)=x\) for *γ*-almost all *x*. If this equality holds *for all* \(x\in \mathcal X\), then *γ* is the Fréchet mean by taking integrals and applying Proposition 3.1.10. One can hope that, under regularity conditions, the *γ*-almost sure equality can be upgraded to equality everywhere. Indeed, this is the case:

### Theorem 3.1.15 (Optimality Criterion for Karcher Means)

*Let* \(U\subseteq \mathbb {R}^d\) *be an open convex set and let* \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathbb {R}^d)\) *be probability measures on U with bounded strictly positive densities g*^{1}, …, *g*^{N}*. Suppose that an absolutely continuous Karcher mean γ is supported on U with bounded strictly positive density f there. Then γ is the Fréchet mean of μ*^{1}, …, *μ*^{N} *if one of the following holds:*

1. \(U=\mathbb {R}^d\) *and the densities f*, *g*^{1}, …, *g*^{N} *are of class C*^{0, α} *for some α* > 0*;*
2. *U is bounded and the densities f*, *g*^{1}, …, *g*^{N} *are bounded below on U.*

### Proof

The result exploits Caffarelli’s regularity theory for Monge–Ampère equations in the form of Theorem 1.6.7. In the first case, there exist *C*^{1} (in fact, *C*^{2, α}) convex potentials *φ*_{i} on \(\mathbb {R}^d\) with \({\mathbf t_{\gamma }^{\mu ^i}}=\nabla \varphi _i\), so that \({\mathbf t_{\gamma }^{{\mu ^i}}}(x)\) is a singleton for all \(x\in \mathbb {R}^d\). The set \(\{x\in \mathbb {R}^d:\sum {\mathbf t_{\gamma }^{\mu ^i}}(x)/N\ne x\}\) is *γ*-negligible (and hence Lebesgue negligible, since *f* > 0) and open by continuity. It is therefore empty, so \(N^{-1}\sum {\mathbf t_{\gamma }^{\mu ^i}}(x)=x\) for every \(x\in\mathbb R^d\), and *γ* is the Fréchet mean (see the discussion before the theorem).

In the second case, by the same argument we have \(\sum {\mathbf t_{\gamma }^{\mu ^i}}(x)/N=x\) for all *x* ∈ *U*. Since *U* is convex, there must exist a constant *C* such that \(\sum \varphi _i(x)=C+N\|x\|{ }^2/2\) for all *x* ∈ *U*, and we may assume without loss of generality that *C* = 0. If one repeats the proof of Proposition 3.1.10, then *F*(*γ*) ≤ *F*(*θ*) for all *θ* ∈ *P*(*U*). By continuity considerations, the inequality holds for all \(\theta \in P(\overline U)\) (Theorem 2.2.7), and since \(\overline U\) is closed and convex, *γ* is the Fréchet mean by Corollary 3.1.4.

## 3.2 Population Fréchet Means

In this section, we extend the notion of empirical Fréchet mean to the population level, where *Λ* is a random element in \(\mathcal W_2(\mathcal X)\) (a measurable mapping from a probability space to \(\mathcal W_2(\mathcal X)\)). This requires a different strategy, since it is not clear how to define the analogue of the multicouplings at that level of abstraction. However, it is important to point out that when there is more structure in *Λ*, multicouplings can be defined as laws of stochastic processes; see Pass [102] for a detailed account of the problem in this case.

In analogy with (3.1), we define:

### Definition 3.2.1 (Population Fréchet Mean)

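*Let Λ be a random measure in* \(\mathcal W_2(\mathcal X)\)*. The Fréchet mean of Λ is the minimiser (if it exists and is unique) of the Fréchet functional*

$$F(\gamma)=\frac12\,\mathbb E\,W_2^2(\gamma,\varLambda),\qquad \gamma\in\mathcal W_2(\mathcal X).\tag{3.3}$$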

Since *W*_{2} is continuous and nonnegative, the expectation is well-defined.

### 3.2.1 Existence, Uniqueness, and Continuity

Existence and uniqueness of Fréchet means on a general metric space *M* are rather delicate questions. Usually, existence proofs are easier: for example, since the Fréchet functional *F* is continuous on *M* (as we show below), one often invokes local compactness of *M* in order to establish existence of a minimiser. Unfortunately, a different strategy is needed when \(M=\mathcal W_2(\mathcal X)\), because the Wasserstein space is not locally compact (Proposition 2.2.9).

The first thing to notice is that *F* is indeed continuous (this is clear for the empirical version). This is a consequence of the triangle inequality and holds when \(\mathcal W_2(\mathcal X)\) is replaced by any metric space.

### Lemma 3.2.2 (Finiteness of the Fréchet Functional)

*If F is not identically infinite, then it is finite and locally Lipschitz everywhere on* \(\mathcal W_2(\mathcal X)\).

### Proof

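Suppose that *F* is finite at *γ*. If *θ* is any other measure in \(\mathcal W_2(\mathcal X)\), write

$$W_2^2(\theta,\varLambda)-W_2^2(\gamma,\varLambda)=\big[W_2(\theta,\varLambda)-W_2(\gamma,\varLambda)\big]\big[W_2(\theta,\varLambda)+W_2(\gamma,\varLambda)\big].$$

Since *x* ≤ 1 + *x*^{2} for all *x*, the triangle inequality in \(\mathcal W_2(\mathcal X)\) yields

$$\big|W_2^2(\theta,\varLambda)-W_2^2(\gamma,\varLambda)\big|\le W_2(\theta,\gamma)\big[W_2(\theta,\gamma)+2W_2(\gamma,\varLambda)\big]\le W_2(\theta,\gamma)\big[W_2(\theta,\gamma)+2+2W_2^2(\gamma,\varLambda)\big],$$

and taking expectations gives

$$|F(\theta)-F(\gamma)|\le\frac12W_2(\theta,\gamma)\big[W_2(\theta,\gamma)+2+4F(\gamma)\big].$$

Since *F*(*γ*) < *∞*, this shows that *F* is finite everywhere, and the right-hand side vanishes as *θ* → *γ* in \(\mathcal W_2(\mathcal X)\). Now that we know that *F* is continuous, the same upper bound shows that it is in fact locally Lipschitz.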

The condition in Lemma 3.2.2 can fail. Let (*a*_{n}) be a sequence of positive numbers that sum up to one. Let *x*_{n} = 1∕*a*_{n} and suppose that *Λ* equals \(\delta \{x_n\}\in \mathcal W_2(\mathbb {R})\) with probability *a*_{n}. Then \(2F(\delta\{0\})=\sum_n a_nx_n^2=\sum_n 1/a_n=\infty\), so by Lemma 3.2.2, *F* is identically infinite. Henceforth, we say that *F* is finite when the condition in Lemma 3.2.2 holds.

Using the lower semicontinuity (2.5), one can prove existence on \(\mathbb {R}^d\) rather easily. (The empirical means exist even in infinite dimensions by Corollary 3.1.3.)

### Proposition 3.2.3 (Existence of Fréchet Means)

*The Fréchet functional associated with any random measure Λ in* \(\mathcal W_2(\mathbb {R}^d)\) *admits a minimiser.*

### Proof

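There is nothing to prove if *F* is identically infinite. Otherwise, let (*γ*_{n}) be a minimising sequence. We wish to show that the sequence is tight. Define *L* = sup_{n} *F*(*γ*_{n}) < *∞* and observe that since *x* ≤ 1 + *x*^{2} for all \(x\in \mathbb {R}\),

$$W_2(\gamma_n,\delta\{0\})\le\mathbb E\big[W_2(\gamma_n,\varLambda)+W_2(\varLambda,\delta\{0\})\big]\le1+2L+\mathbb E\,W_2(\varLambda,\delta\{0\}).$$

The right-hand side is finite because *F* is finite. Hence the second moments \(\int\|x\|^2\,d\gamma_n(x)=W_2^2(\gamma_n,\delta\{0\})\) are bounded uniformly in *n*, and by Markov’s inequality (*γ*_{n}) is a tight sequence. We may assume that *γ*_{n} → *γ* weakly, then use (2.5) and Fatou’s lemma to obtain

$$F(\gamma)=\frac12\mathbb E\,W_2^2(\gamma,\varLambda)\le\frac12\mathbb E\liminf_{n\to\infty}W_2^2(\gamma_n,\varLambda)\le\liminf_{n\to\infty}F(\gamma_n)=\inf F.$$

Thus *γ* is a minimiser of *F*, and existence is established.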

When \(\mathcal X\) is an infinite-dimensional Hilbert space, existence still holds under a compactness assumption. We first prove a result about the support of the Fréchet mean. At the empirical level, one can say more about the support (see Corollary 3.1.4).

### Proposition 3.2.4 (Support of Fréchet Mean)

*Let Λ be a random measure in* \(\mathcal W_2(\mathcal X)\) *and let* \(K\subseteq \mathcal X\) *be a convex closed set such that* \(\mathbb {P}[\varLambda (K)=1]=1\)*. If γ minimises F, then γ*(*K*) = 1.

### Remark 3.2.5

*For any closed* \(K\subseteq \mathcal X\) *and any α *∈ [0, 1]*, the set* \(\{\varLambda \in \mathcal W_p(\mathcal X):\varLambda (K)\ge \alpha \}\) *is closed in* \(\mathcal W_p(\mathcal X)\) *because* \(\{\varLambda \in P(\mathcal X):\varLambda (K)\ge \alpha \}\) *is weakly closed by the portmanteau lemma (Lemma* *1.7.1*).

The proof amounts to a simple projection argument; see page 79 in the supplement.

### Corollary 3.2.6

*If there exists a compact convex K satisfying the hypothesis of Proposition* 3.2.4*, then the Fréchet functional admits a minimiser supported on K.*

### Proof

Proposition 3.2.4 allows us to restrict the domain of *F* to \(\mathcal W_2(K)\), the collection of probability measures supported on *K*. Since this set is compact in \(\mathcal W_2(\mathcal X)\) (Corollary 2.2.5), the result follows from continuity of *F*.

From the convexity (3.2), one obtains a simple criterion for uniqueness. See Definition 1.6.4 for absolute continuity in infinite dimensions.

### Proposition 3.2.7 (Uniqueness of Fréchet Means)

*Let Λ be a random measure in* \(\mathcal W_2(\mathcal X)\) *with finite Fréchet functional. If Λ is absolutely continuous with positive (inner) probability, then the Fréchet mean of Λ is unique (if it exists).*

### Remark 3.2.8

*It is not obvious that the set of absolutely continuous measures is measurable in* \(\mathcal W_2(\mathcal X)\)*. We assume that there exists a Borel set* \(A\subset \mathcal W_2(\mathcal X)\) *such that* \(\mathbb {P}(\varLambda \in A)>0\) *and all measures in A are absolutely continuous.*

### Proof

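By (3.2) and linearity of the expectation, *F* is convex on \(\mathcal W_2(\mathcal X)\) with respect to linear interpolants. Let *γ*_{1} ≠ *γ*_{2} and *t* ∈ (0, 1). From Proposition 3.1.6, we conclude that

$$W_2^2(t\gamma_1+(1-t)\gamma_2,\varLambda)<tW_2^2(\gamma_1,\varLambda)+(1-t)W_2^2(\gamma_2,\varLambda)\qquad\text{whenever }\varLambda\in A.$$

Since the corresponding weak inequality holds for every *Λ* and \(\mathbb P(\varLambda\in A)>0\), taking expectations gives

$$F(t\gamma_1+(1-t)\gamma_2)<tF(\gamma_1)+(1-t)F(\gamma_2),$$

so *F* is strictly convex and has at most one minimiser.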

We state without proof an important consistency result (Le Gouic and Loubes [87, Theorem 3]). Since \(\mathcal W_2(\mathcal X)\) is a complete and separable metric space, we can define the “second degree” Wasserstein space \(\mathcal W_2(\mathcal W_2(\mathcal X))\). The law of a random measure *Λ* is in \(\mathcal W_2(\mathcal W_2(\mathcal X))\) if and only if the corresponding Fréchet functional is finite.

### Theorem 3.2.9 (Consistency of Fréchet Means)

*Let Λ*_{n}, *Λ be random measures in* \(\mathcal W_2(\mathbb {R}^d)\) *with finite Fréchet functionals and laws* \(\mathbb {P}_n,\mathbb {P}\in \mathcal W_2(\mathcal W_2(\mathbb {R}^d))\)*. If* \(\mathbb {P}_n\to \mathbb {P}\) *in* \(\mathcal W_2(\mathcal W_2(\mathbb {R}^d))\)*, then any sequence λ*_{n} *of Fréchet means of Λ*_{n} *has a W*_{2}*-limit point λ, which is a Fréchet mean of Λ.*

See the Bibliographical Notes for a more general formulation.

### Corollary 3.2.10 (Wasserstein Law of Large Numbers)

*Let Λ be a random measure in* \(\mathcal W_2(\mathbb {R}^d)\) *with finite Fréchet functional and let Λ*_{1}, … *be a sample from Λ. Assume λ is the unique Fréchet mean of Λ (see Proposition* 3.2.7)*. Then almost surely, the sequence of empirical Fréchet means of Λ*_{1}, …, *Λ*_{n} *converges to λ.*

### Proof

Let \(\mathbb {P}\) be the law of *Λ* and let \(\mathbb {P}_n\) be its empirical counterpart (a random element in \(\mathcal W_2(\mathcal W_2(\mathbb {R}^d))\)). As in the proof of Proposition 2.2.6 (with \(\mathcal X\) replaced by the complete separable metric space \(\mathcal W_2(\mathbb {R}^d)\)), almost surely \(\mathbb {P}_n\to \mathbb {P}\) in \(\mathcal W_2(\mathcal W_2(\mathbb {R}^d))\). Theorem 3.2.9 applies to every subsequence, and since *λ* is the unique Fréchet mean, the whole sequence converges to *λ*.
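On the real line, Corollary 3.2.10 can be watched in action with Gaussian random measures, where everything is explicit. The sketch assumes a hypothetical random measure *Λ* = *N*(*m*, *s*^{2}) with *m* ∼ *N*(0, 1) and *s* ∼ *U*(1, 3). Averaging the quantile functions *m* + *s*Φ^{−1} (Theorem 3.2.11) gives the population Fréchet mean *N*(0, 2^{2}) and the empirical one \(N(\overline m_n,\overline s_n^{\,2})\). It also uses the elementary formula \(W_2^2(N(m_1,s_1^2),N(m_2,s_2^2))=(m_1-m_2)^2+(s_1-s_2)^2\).

```python
import math
import random

def w2_gauss_1d(m1, s1, m2, s2):
    """W2 distance between N(m1, s1^2) and N(m2, s2^2) on the real line."""
    return math.hypot(m1 - m2, s1 - s2)

random.seed(6)
# Hypothetical random measure: Lambda = N(m, s^2) with m ~ N(0, 1), s ~ U(1, 3).
# Quantile averaging (Theorem 3.2.11) gives the Fréchet mean N(E m, (E s)^2) = N(0, 4).
draws = [(random.gauss(0.0, 1.0), random.uniform(1.0, 3.0)) for _ in range(100_000)]

for n in (10, 1_000, 100_000):
    m_bar = sum(m for m, _ in draws[:n]) / n
    s_bar = sum(s for _, s in draws[:n]) / n
    # The empirical Fréchet mean of Lambda_1, ..., Lambda_n is N(m_bar, s_bar^2).
    print(n, w2_gauss_1d(m_bar, s_bar, 0.0, 2.0))
```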

Under a compactness assumption, one can give a direct proof for the law of large numbers as in Theorem 3.1.5. This is done on page 80 in the supplement.

### 3.2.2 The One-Dimensional Case

As a generalisation of the empirical version, we have:

### Theorem 3.2.11 (Fréchet Means in \(\mathcal W_2(\mathbb {R})\))

*Let Λ be a random measure in* \(\mathcal W_2(\mathbb {R})\) *with finite Fréchet functional. Then the Fréchet mean of Λ is the unique measure λ with quantile function* \(F_\lambda ^{-1}(t)=\mathbb {E} F_\varLambda ^{-1}(t)\)*, t *∈ (0, 1).

### Proof

Since *L*_{2}(0, 1) is a Hilbert space, the random element \(F_\varLambda ^{-1}\in L_2(0,1)\) has a unique Fréchet mean *g* ∈ *L*_{2}(0, 1), defined by the relations \({\left \langle {g},{f}\right \rangle } =\mathbb {E}{\left \langle {F_\varLambda ^{-1}},{f}\right \rangle } \) for all *f* ∈ *L*_{2}(0, 1). On page 80 of the supplement, we show that *g* can be identified with \(F_\lambda ^{-1}\).

Interestingly, no regularity is needed in order for the Fréchet mean to be unique. This is not the case in higher dimensions; see Proposition 3.2.7. If there is some regularity, then one can state Theorem 3.2.11 in terms of optimal maps, because \(F_\varLambda ^{-1}\) is the optimal map from Leb|_{[0,1]} to *Λ*. If \(\gamma \in \mathcal W_2(\mathbb {R})\) is any absolutely continuous (or even just continuous) measure, then Theorem 3.2.11 can be stated as follows: the Fréchet mean of *Λ* is the measure \([\mathbb {E}{\mathbf t_{\gamma }^{\varLambda }}]\#\gamma \). A generalisation of this result to compatible measures (Definition 2.3.1) can be carried out in the same way, since compatible measures are embedded in a Hilbert space, using Bochner integrals for the definition of the expected optimal maps (see Sect. 2.4).
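The two descriptions of the one-dimensional Fréchet mean (averaged quantile functions, and \([\mathbb E{\mathbf t_{\gamma}^{\varLambda}}]\#\gamma\)) can be compared directly. The following is a minimal sketch with Python's `statistics.NormalDist`. It assumes a hypothetical *Λ* that takes three Gaussian values with given probabilities, together with *γ* = *N*(0, 1). It uses the fact that for continuous *γ* on the real line, \({\mathbf t_{\gamma}^{\varLambda}}=F_\varLambda^{-1}\circ F_\gamma\).

```python
from statistics import NormalDist

# Hypothetical random measure Lambda taking three Gaussian values.
values = [NormalDist(-1.0, 0.5), NormalDist(2.0, 1.0), NormalDist(0.0, 3.0)]
probs = [0.2, 0.5, 0.3]
gamma = NormalDist(0.0, 1.0)              # reference continuous measure

def expected_quantile(u):
    """E F_Lambda^{-1}(u), the quantile function of the Fréchet mean."""
    return sum(p * lam.inv_cdf(u) for p, lam in zip(probs, values))

def expected_map(x):
    """E t_gamma^Lambda(x), with t_gamma^Lambda = F_Lambda^{-1} o F_gamma."""
    return expected_quantile(gamma.cdf(x))

# For this Lambda the Fréchet mean is N(0.8, 1.5^2): means and standard
# deviations are averaged with weights probs.
m = sum(p * lam.mean for p, lam in zip(probs, values))
s = sum(p * lam.stdev for p, lam in zip(probs, values))
frechet_mean = NormalDist(m, s)

print(all(abs(expected_quantile(u) - frechet_mean.inv_cdf(u)) < 1e-9 for u in (0.05, 0.5, 0.9)))
print(all(abs(expected_map(x) - (m + s * x)) < 1e-9 for x in (-2.0, 0.0, 1.5)))
```

The averaged map *x* ↦ 0.8 + 1.5*x* pushes *γ* = *N*(0, 1) forward to *N*(0.8, 1.5^{2}), the same measure that is read off from the averaged quantile function.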

### 3.2.3 Differentiability of the Population Fréchet Functional

### Proposition 3.2.12 (Slope Bounds)

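*Let θ*_{0}*, Λ, and θ be probability measures with θ*_{0} *absolutely continuous, and set δ* = *W*_{2}(*θ*, *θ*_{0})*. Then*

$$-2\delta\,W_2(\theta_0,\varLambda)-\frac{\delta^2}{2}\le u(\theta,\varLambda)\le\frac{\delta^2}{2},$$

*where u is defined by*

$$u(\theta,\varLambda)=\frac12W_2^2(\theta,\varLambda)-\frac12W_2^2(\theta_0,\varLambda)+\int_{\mathcal X}\big\langle{\mathbf t_{\theta_0}^{\varLambda}}(x)-x,\,{\mathbf t_{\theta_0}^{\theta}}(x)-x\big\rangle\,d\theta_0(x).\tag{3.4}$$

*If the measures are compatible in the sense of Definition* 2.3.1*, then u*(*θ*, *Λ*) = *δ*^{2}∕2.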

The proof is a slight variation of Ambrosio et al. [12, Theorem 10.2.2 and Proposition 10.2.6], and the details are given on page 81 of the supplement.

### Theorem 3.2.13 (Population Fréchet Gradient)

*Let Λ be a random measure with finite Fréchet functional F. Then F is Fréchet-differentiable at any absolutely continuous θ*_{0} *in the Wasserstein space, and* \(F'(\theta _0)=\mathbf i-\mathbb {E}{\mathbf t_{\theta _0}^{\varLambda }}\in \mathcal {L}_2(\theta _0)\)*. More precisely,*

$$\frac{F(\theta)-F(\theta_0)+\int_{\mathcal X}\big\langle\mathbb E{\mathbf t_{\theta_0}^{\varLambda}}(x)-x,\,{\mathbf t_{\theta_0}^{\theta}}(x)-x\big\rangle\,d\theta_0(x)}{W_2(\theta,\theta_0)}\longrightarrow0,\qquad W_2(\theta,\theta_0)\to0.$$

*Thus, the Fréchet derivative of F can be identified with the map* \(-(\mathbb {E}{\mathbf t_{\theta _0}^{\varLambda }}-\mathbf i)\) *in the tangent space at θ*_{0}*, a subspace of* \(\mathcal {L}_2(\theta _0)\).

### Proof

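Introduce the function *u*(*θ*, *Λ*) defined by (3.4) and set *δ* = *W*_{2}(*θ*, *θ*_{0}). Then for all *Λ*, *u*(*θ*, *Λ*)∕*δ* → 0 as *δ* → 0, by the differentiability properties established in Sect. 3.1.6. Let us show that \(\mathbb E\,u(\theta,\varLambda)/\delta\to0\) as well. By Proposition 3.2.12, *u*∕*δ* is bounded above by *δ*∕2, which does not depend on *Λ*. It is bounded below by −*δ*∕2 − 2*W*_{2}(*θ*_{0}, *Λ*), which is integrable because the Fréchet functional is finite. The dominated convergence theorem yields

$$\frac{F(\theta)-F(\theta_0)+\int_{\mathcal X}\big\langle\mathbb E{\mathbf t_{\theta_0}^{\varLambda}}(x)-x,\,{\mathbf t_{\theta_0}^{\theta}}(x)-x\big\rangle\,d\theta_0(x)}{W_2(\theta,\theta_0)}=\frac{\mathbb E\,u(\theta,\varLambda)}{\delta}\longrightarrow0,$$

which is the claim.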

### Proposition 3.2.14

*Let Λ be a random measure in* \(\mathcal W_2(\mathcal X)\) *with finite Fréchet functional F, and let γ be absolutely continuous in* \(\mathcal W_2(\mathcal X)\)*. Then γ is a Karcher mean of Λ if and only if* \(\mathbb {E}{\mathbf t_{\gamma }^{\varLambda }}-\mathbf i=0\) *in* \(\mathcal {L}_2(\gamma )\)*. Furthermore, if γ is a Fréchet mean of Λ, then it is also a Karcher mean.*

The characterisation of Karcher means follows immediately from Theorem 3.2.13. The other statement is that the derivative vanishes at the minimum, which is fairly obvious intuitively; see page 82 in the supplement.

## 3.3 Bibliographical Notes

Proposition 3.1.2 is essentially due to Agueh and Carlier [2, Proposition 4.2], who show it on \(\mathbb {R}^d\) (see also Zemel and Panaretos [134, Theorem 2]). An earlier result in a compact setting can be found in Carlier and Ekeland [33]. The formulation given here is from Masarotto et al. [91]. A more general version is provided by Le Gouic and Loubes [87, Theorem 8].

Lemmata 3.1.11 and 3.1.12 are from [135], but were known earlier (e.g., Bonneel et al. [30]).

Proposition 3.1.6 is a simplified version of Álvarez-Esteban et al. [8, Theorem 2.8] (see [8, Corollary 2.9]).

Propositions 3.2.3 and 3.2.7 are from Bigot and Klein [22], who also show the law of large numbers (Corollary 3.2.10) and deal with the one-dimensional setup (Theorem 3.2.11) in a compact setting. Section 2.4 appears to be new, but see the discussion in its beginning for other measurability results.

Barycentres can be defined for any *p* ≥ 1 as the measures minimising \(\mu \mapsto \mathbb {E} W_p^p(\varLambda ,\mu )\). (Strictly speaking, these are not Fréchet means unless *p* = 2.) Le Gouic and Loubes [87] show Proposition 3.2.3 and Theorem 3.2.9 in this more general setup, where \(\mathbb {R}^d\) can be replaced by any separable locally compact geodesic space.

## Footnotes

- 1. It should be remarked that this is a Hilbertian property (or at least a property linked to an inner product), not merely a linear property. In other words, it does not extend to general Banach spaces. As an example, let \(H=\mathbb {R}^2\) with the *L*_{1} norm and consider the vertices (0, 0), (0, 1), and (1, 0) of the unit simplex. The mean of these is (1∕3, 1∕3), but for (*x*, *y*) in the triangle,$$\displaystyle \begin{aligned} F(x,y) =(x+y)^2 + (x+1-y)^2 + (1-x+y)^2 =2 + 2x^2 + 2y^2 + (x-y)^2 \end{aligned}$$is minimised at (0, 0).

- 2.
- 3. The notion of Fréchet derivative is also named after Maurice Fréchet, but is not directly related to Fréchet means.

## References

- 2. M. Agueh, G. Carlier, Barycenters in the Wasserstein space. SIAM J. Math. Anal. **43**(2), 904–924 (2011)
- 8. P.C. Álvarez-Esteban, E. del Barrio, J.A. Cuesta-Albertos, C. Matrán, Uniqueness and approximate computation of optimal incomplete transportation plans. Ann. Inst. Henri Poincaré Probab. Stat. **47**(2), 358–375 (2011)
- 9. P.C. Álvarez-Esteban, E. del Barrio, J.A. Cuesta-Albertos, C. Matrán, A fixed-point approach to barycenters in Wasserstein space. J. Math. Anal. Appl. **441**(2), 744–762 (2016)
- 10. L. Ambrosio, N. Gigli, A user’s guide to optimal transport, in Modelling and Optimisation of Flows on Networks (Springer, Berlin, 2013), pp. 1–155
- 12. L. Ambrosio, N. Gigli, G. Savaré, Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics. ETH Zürich, 2nd edn. (Springer, Berlin, 2008)
- 22. J. Bigot, T. Klein, Characterization of barycenters in the Wasserstein space by averaging optimal transport maps. ESAIM: Probab. Stat. **22**, 35–57 (2018)
- 25. S. Bobkov, M. Ledoux, One-Dimensional Empirical Measures, Order Statistics and Kantorovich Transport Distances, vol. 261, no. 1259 (Memoirs of the American Mathematical Society, Providence, 2019). https://doi.org/10.1090/memo/1259
- 28. E. Boissard, T. Le Gouic, J.-M. Loubes, Distribution’s template estimate with Wasserstein metrics. Bernoulli **21**(2), 740–759 (2015)
- 30. N. Bonneel, J. Rabin, G. Peyré, H. Pfister, Sliced and Radon Wasserstein barycenters of measures. J. Math. Imag. Vis. **51**(1), 22–45 (2015)
- 33. G. Carlier, I. Ekeland, Matching for teams. Econ. Theory **42**(2), 397–418 (2010)
- 44. D. Dowson, B. Landau, The Fréchet distance between multivariate normal distributions. J. Multivar. Anal. **12**(3), 450–455 (1982)
- 55. M. Fréchet, Les éléments aléatoires de nature quelconque dans un espace distancié. Ann. Inst. Henri Poincaré **10**(4), 215–310 (1948)
- 56. M. Fréchet, Sur la distance de deux lois de probabilité. C.R. Hebd. Seances Acad. Sci. **244**(6), 689–692 (1957)
- 60. W. Gangbo, A. Święch, Optimal maps for the multidimensional Monge–Kantorovich problem. Comm. Pure Appl. Math. **51**(1), 23–45 (1998)
- 78. H. Karcher, Riemannian center of mass and mollifier smoothing. Commun. Pure Appl. Math. **30**(5), 509–541 (1977)
- 87. T. Le Gouic, J.-M. Loubes, Existence and consistency of Wasserstein barycenters. Prob. Theory Relat. Fields **168**(3–4), 901–917 (2017)
- 91. V. Masarotto, V.M. Panaretos, Y. Zemel, Procrustes metrics on covariance operators and optimal transportation of Gaussian processes. Sankhyā A **81**, 172–213 (2019) (Invited Paper, Special Issue on Statistics on non-Euclidean Spaces and Manifolds)
- 102. B. Pass, Optimal transportation with infinitely many marginals. J. Funct. Anal. **264**(4), 947–963 (2013)
- 134. Y. Zemel, V.M. Panaretos, Fréchet means and Procrustes analysis in Wasserstein space. Bernoulli **25**(2), 932–976 (2019)
- 135. Y. Zemel, V.M. Panaretos, Supplement to “Fréchet means and Procrustes analysis in Wasserstein space” (2019)

## Copyright information

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.