
Fréchet Means in the Wasserstein Space \(\mathcal W_2\)

  • Victor M. Panaretos
  • Yoav Zemel
Open Access
Chapter
Part of the SpringerBriefs in Probability and Mathematical Statistics book series (SBPMS)


If H is a Hilbert space (or a closed convex subspace thereof) and x1, …, xN ∈ H, then the empirical mean \(\overline x_N=N^{-1}\sum x_i\) is the unique element of H that minimises the sum of squared distances from the xi’s.1 That is, if we define
$$\displaystyle \begin{aligned} F(\theta) =\sum_{i=1}^N \|\theta - x_i\|{}^2 ,\qquad \theta\in H, \end{aligned}$$
then \(\theta =\overline x_N\) is the unique minimiser of F. This is easily seen by “opening the squares” and writing
$$\displaystyle \begin{aligned} F(\theta) =F(\overline x_N) + N\|\theta - \overline x_N\|{}^2. \end{aligned}$$
The concept of a Fréchet mean (Fréchet [55]) generalises the notion of mean to a more general metric space by replacing the usual “sum of squares” with a “sum of squared distances”, giving rise to the so-called Fréchet functional. A closely related notion is that of a Karcher mean (Karcher [78]), a term that describes stationary points of the sum of squares functional, when the latter is differentiable (see Sect. 3.1.6). Population versions of Fréchet means, assuming the space is endowed with a probability law, can also be defined, replacing summation by expectation with respect to that law.
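The decomposition \(F(\theta)=F(\overline x_N)+N\|\theta-\overline x_N\|^2\) is easy to check numerically. The following is a minimal sketch, assuming \(H=\mathbb{R}^3\) and randomly generated points; the function name `sum_sq` and the data are illustrative and not part of the text.

```python
import numpy as np

def sum_sq(theta, xs):
    """F(theta) = sum_i ||theta - x_i||^2 for points stored as rows of xs."""
    return float(np.sum((xs - theta) ** 2))

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))      # N = 5 points in R^3 (illustrative data)
x_bar = xs.mean(axis=0)           # empirical mean
theta = rng.normal(size=3)        # an arbitrary competitor

lhs = sum_sq(theta, xs)
rhs = sum_sq(x_bar, xs) + len(xs) * float(np.sum((theta - x_bar) ** 2))
print(np.isclose(lhs, rhs))       # both sides of the identity agree
```

Since the second term on the right-hand side is nonnegative and vanishes only at \(\theta=\overline x_N\), the check also illustrates why the empirical mean is the unique minimiser.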

Fréchet means are perhaps the most basic object of statistical interest, and this chapter studies such means when the underlying space is the Wasserstein space \(\mathcal W_2\). In general, existence and uniqueness of a Fréchet mean can be subtle, but we will see that the nature of optimal transport allows for rather clean statements in the case of Wasserstein space.

3.1 Empirical Fréchet Means in \(\mathcal W_2\)

3.1.1 The Fréchet Functional

As foretold in the preceding paragraph, the definition of a Fréchet mean requires the definition of an appropriate sum-of-squares functional, the Fréchet functional:

Definition 3.1.1 (Empirical Fréchet Functional and Mean)

The Fréchet functional associated with measures \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\) is
$$\displaystyle \begin{aligned} F:\mathcal W_2(\mathcal X)\to\mathbb{R} \qquad F(\gamma) =\frac 1{2N} \sum_{i=1}^N W_2^2(\gamma,\mu^i), \qquad \gamma\in \mathcal W_2(\mathcal X). \end{aligned} $$
(3.1)

A Fréchet mean of (μ1, …, μN) is a minimiser of F in \(\mathcal W_2(\mathcal X)\) (if it exists).

In analysis, a Fréchet mean is often called a barycentre. We shall use the terminology of “Fréchet mean” that is arguably more popular in statistics.2

The factor 1∕(2N) is irrelevant for the definition of Fréchet mean. It is introduced in order to have simpler expressions for the derivatives (Theorems 3.1.14 and 3.2.13) and to be compatible with a population version \(\mathbb {E} W_2^2(\gamma ,\varLambda )/2\) (see (3.3)).

The first reference that deals with empirical Fréchet means in \(\mathcal W_2(\mathbb {R}^d)\) is the seminal paper of Agueh and Carlier [2]. They treat the more general weighted Fréchet functional
$$\displaystyle \begin{aligned} F(\gamma) =\frac 1{2} \sum_{i=1}^Nw_iW_2^2(\gamma,\mu^i), \qquad 0\le w_i, \quad \sum_{i=1}^Nw_i=1, \end{aligned}$$
but, for simplicity, we shall focus on the case of equal weights. (If all the wi’s are rational, then the weighted functional can be encompassed in (3.1) by taking some of the μi’s to be the same. The case of irrational wi’s is then treated with continuity arguments. Moreover, (3.3) encapsulates (3.1) as well as the weighted version when Λ can take finitely many values.)

3.1.2 Multimarginal Formulation, Existence, and Continuity

In [60], Gangbo and Święch consider the following multimarginal Monge–Kantorovich problem. Let μ1, …, μN be N measures in \(\mathcal W_2(\mathcal X)\) and let Π(μ1, …, μN) be the set of probability measures on \(\mathcal X^N\) having \(\{\mu ^i\}_{i=1}^N\) as marginals. The problem is to minimise
$$\displaystyle \begin{aligned} G(\pi) =\frac 1 {2N^2} {\int_{\mathcal X^N} \! \sum_{i<j} \|x_i - x_j\|{}^2 \, \mathrm{d}\pi(x_1,\dots,x_N)} ,\qquad \mathrm{over}\quad \pi\in \varPi(\mu^1,\dots,\mu^N). \end{aligned}$$
The factor \(1/(2N^2)\) is of course irrelevant for the minimisation and its purpose will be clarified shortly. If N = 2, we obtain the Kantorovich problem with quadratic cost. The probabilistic interpretation (as in Sect. 1.2) is that one is given random variables X1, …, XN with marginal probability laws μ1, …, μN and seeks to construct a random vector Y = (Y1, …, YN) on \(\mathcal X^N\) such that \(X_i\stackrel {d}{=}Y_i\) and
$$\displaystyle \begin{aligned} \frac 1 {2N^2} \mathbb{E} \sum_{i<j}\|Y_i - Y_j\|{}^2\leq \frac 1 {2N^2} \mathbb{E} \sum_{i<j}\|Z_i - Z_j\|{}^2. \end{aligned}$$
for any other random vector Z = (Z1, …, ZN) such that \(X_i\stackrel {d}{=}Z_i\). Intuitively, we seek a random vector with prescribed marginals but maximally correlated entries.

We refer to elements of Π(μ1, …, μN) (equivalently, joint laws of X1, …, XN) as multicouplings (of μ1, …, μN). Just like in the Kantorovich problem, there always exists an optimal multicoupling π.

Let us now show how the multimarginal problem is equivalent to the problem of finding the Fréchet mean of μ1, …, μN. The first thing to observe is that the objective function can be written as
$$\displaystyle \begin{aligned} G(\pi) ={\int_{\mathcal X^N} \! \frac 1{2N} \sum_{i=1}^N \|x_i - M(x)\|{}^2 \, \mathrm{d}\pi(x)} ,\qquad M(x)=M(x_1,\dots,x_N)=\frac 1N \sum_{i=1}^N x_i. \end{aligned}$$
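The two expressions for G agree because the integrands coincide pointwise: \(\sum_{i<j}\|x_i-x_j\|^2=N\sum_i\|x_i-M(x)\|^2\). A minimal numerical sketch of this pointwise identity follows, assuming points in \(\mathbb{R}^2\); the function names `g_pairwise` and `g_mean_form` are hypothetical.

```python
import numpy as np

def g_pairwise(x):
    """(1/(2N^2)) * sum_{i<j} ||x_i - x_j||^2 for the rows x_1..x_N of x."""
    n = len(x)
    diffs = x[:, None, :] - x[None, :, :]
    # The sum over all ordered pairs counts every pair i<j twice.
    return float(np.sum(diffs ** 2) / 2 / (2 * n ** 2))

def g_mean_form(x):
    """(1/(2N)) * sum_i ||x_i - M(x)||^2, with M(x) the average of the rows."""
    n = len(x)
    return float(np.sum((x - x.mean(axis=0)) ** 2) / (2 * n))

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 2))       # N = 4 points in R^2 (illustrative data)
print(np.isclose(g_pairwise(x), g_mean_form(x)))
```

Integrating the pointwise identity against any multicoupling π gives the equality of the two forms of G(π).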
The next result shows that the Fréchet mean and the multicoupling problems are essentially the same.

Proposition 3.1.2 (Fréchet Means and Multicouplings)

Let \(\mu ^1\!,\dots ,\mu ^N\!\in \mathcal W_2(\mathcal X)\). Then μ is a Fréchet mean of (μ1, …, μN) if and only if there exists an optimal multicoupling \(\pi \in \mathcal W_2(\mathcal X^N)\) of (μ1, …, μN) such that μ = M#π, and furthermore F(μ) = G(π).

Proof

Let π be an arbitrary multicoupling of (μ1, …, μN) and set μ = M#π. Then \((x\mapsto x_i, M)\#\pi\) is a coupling of μi and μ, and therefore
$$\displaystyle \begin{aligned} {\int_{\mathcal X^N} \! \|x_i - M(x)\|{}^2 \, \mathrm{d}\pi(x)} \ge W_2^2(\mu,\mu^i). \end{aligned}$$
Summation over i gives F(μ) ≤ G(π) and so \(\inf F\le \inf G\).

For the other inequality, let \(\mu \in \mathcal W_2(\mathcal X)\) be arbitrary. For each i, let πi be an optimal coupling between μ and μi. Invoking the gluing lemma (Ambrosio and Gigli [10, Lemma 2.1]), we may glue all πi’s using their common marginal μ. This procedure constructs a measure η on \(\mathcal X^{N+1}\) with marginals μ1, …, μN, μ and its relevant projection π is then a multicoupling of μ1, …, μN.

Since \(\mathcal X\) is a Hilbert space, the minimiser of \(y\mapsto \sum \|x_i - y\|{ }^2\) is y = M(x). Thus
$$\displaystyle \begin{aligned} F(\mu) {\kern-1pt}={\kern-1pt} \frac 1{2N} {\int_{\mathcal X^{N+1}} \! \sum_{i=1}^N \|x_i {\kern-1pt}-{\kern-1pt} y\|{}^2 \, \mathrm{d}\eta(x,y)} {\ge} \frac 1{2N} {\int_{\mathcal X^{N+1}} \! \sum_{i=1}^N \|x_i {\kern-1pt}-{\kern-1pt} M(x)\|{}^2 \, \mathrm{d}\eta(x,y)} {\kern-1pt}={\kern-1pt} G(\pi). \end{aligned}$$
In particular, \(\inf F\ge \inf G\) and combining this with the established converse inequality we see that \(\inf F=\inf G\). Observe also that the last displayed inequality holds as equality if and only if y = M(x) η-almost surely, in which case μ = M#π. Therefore, if μ does not equal M#π, then F(μ) > G(π) ≥ F(M#π), and μ cannot be optimal. Finally, if π is optimal, then
$$\displaystyle \begin{aligned} F(M\#\pi) \le G(\pi) =\inf G =\inf F \end{aligned}$$
establishing optimality of μ = M#π and completing the proof.
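The two halves of the proof can be illustrated on the real line. The sketch below assumes uniform empirical measures with the same number n of atoms, for which \(W_2^2\) reduces to the mean squared difference of sorted atoms, and it assumes that the comonotone multicoupling (all rows sorted) is optimal in this one-dimensional setting. All function names are hypothetical.

```python
import numpy as np

def w2_squared(x, y):
    """Squared W_2 between uniform empirical measures on R with n atoms each."""
    return float(np.mean((np.sort(x) - np.sort(y)) ** 2))

def frechet_functional(atoms, samples):
    """F(gamma) = (1/(2N)) * sum_i W_2^2(gamma, mu^i), as in (3.1)."""
    return float(np.mean([w2_squared(atoms, row) for row in samples]) / 2)

def multicoupling_cost(samples):
    """G(pi) for the multicoupling putting mass 1/n on each column of `samples`."""
    centred = samples - samples.mean(axis=0)       # x_i - M(x), column by column
    n_measures = samples.shape[0]
    return float(np.mean(np.sum(centred ** 2, axis=0)) / (2 * n_measures))

rng = np.random.default_rng(2)
samples = rng.normal(size=(3, 40)) + np.array([[0.0], [1.0], [3.0]])

# Arbitrary multicoupling: atoms are paired in the order they were drawn.
g_any = multicoupling_cost(samples)
f_any = frechet_functional(samples.mean(axis=0), samples)    # F(M#pi)

# Comonotone multicoupling: every row is sorted, so quantiles are matched.
aligned = np.sort(samples, axis=1)
g_opt = multicoupling_cost(aligned)
f_opt = frechet_functional(aligned.mean(axis=0), samples)    # F(M#pi)

print(f_any <= g_any, np.isclose(f_opt, g_opt), g_opt <= g_any)
```

The first comparison is the inequality F(M#π) ≤ G(π) valid for every multicoupling; the second is the equality F(μ) = G(π) attained at the optimal one.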

Since optimal couplings exist, we deduce that so do Fréchet means.

Corollary 3.1.3 (Fréchet Means and Moments)

Any finite collection of measures \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\) admits a Fréchet mean μ, and for all p ≥ 1
$$\displaystyle \begin{aligned} {\int_{\mathcal X} \! \|x\|{}^p \, \mathrm{d}\mu(x)} \le \frac 1N \sum_{i=1}^N {\int_{\mathcal X} \! \|x\|{}^p \, \mathrm{d}\mu^i(x)}, \end{aligned}$$

and when p > 1 equality holds if and only if μ1 = ⋯ = μN.

Proof

Let π be a multicoupling of μ1, …, μN such that μ = M#π (Proposition 3.1.2). Then
$$\displaystyle \begin{aligned} {\int_{\mathcal X} \! \|x\|{}^p \, \mathrm{d}\mu(x)} &={\int_{\mathcal X^N} \! \left\|\frac 1N \sum_{i=1}^N x_i\right\|{}^p \, \mathrm{d}\pi(x)} \le \frac 1N \sum_{i=1}^N {\int_{\mathcal X^N} \! \|x_i\|{}^p \, \mathrm{d}\pi(x)}\\ &=\frac 1N \sum_{i=1}^N {\int_{\mathcal X} \! \|x\|{}^p \, \mathrm{d}\mu^i(x)}. \end{aligned} $$
The statement about equality follows from strict convexity of \(x\mapsto \|x\|^p\) if p > 1.

A further corollary of Proposition 3.1.2 is a bound on the support:

Corollary 3.1.4

The support of any Fréchet mean is included in the set
$$\displaystyle \begin{aligned} \frac{\mathrm{supp}\mu^1+\dots+\mathrm{supp}\mu^N}N =\left\{\frac{x_1+\dots+x_N}N:x_i\in\mathrm{supp}\mu^i\right\} \subseteq\mathrm{conv} \left(\bigcup_{i=1}^N \mathrm{supp} \mu^i\right). \end{aligned}$$

In particular, if all the μi’s are supported on a common convex set K, then so is any of their Fréchet means.

The multimarginal formulation also yields a continuity property for the empirical Fréchet mean. Conditions for uniqueness will be given in the next subsection.

Theorem 3.1.5 (Continuity of Fréchet Means)

Suppose that \(W_2(\mu _k^i,\mu ^i)\to 0\) for i = 1, …, N and let \(\overline \mu _k\) denote any Fréchet mean of \((\mu _k^1,\dots ,\mu _k^N)\). Then \((\overline \mu _k)\) stays in a compact set of \(\mathcal W_2(\mathcal X)\), and any limit point is a Fréchet mean of (μ1, …, μN).

In particular, if μ1, …, μN have a unique Fréchet mean \(\overline \mu \), then \(\overline \mu _k\to \overline \mu \) in \(\mathcal W_2(\mathcal X)\).

Proof

We sketch the steps of the proof here, with the full details given on page 70 of the supplement.

Step 1: tightness of \((\overline \mu _k)\). This is true because the collection of multicouplings is tight, and the mean function M is continuous.

Step 2: weak limits are limits in \(\mathcal W_2(\mathcal X)\). This holds because the mean function has linear growth.

Step 3: the limit is a Fréchet mean of (μ1, …, μN). From Corollary 3.1.3, it follows that \(\overline \mu _k\) must be sought on some fixed bounded set in \(\mathcal W_2(\mathcal X)\). On such sets, the Fréchet functionals are uniformly Lipschitz, so their minimisers converge as well.

3.1.3 Uniqueness and Regularity

A general situation in which Fréchet means are unique is when the Fréchet functional is strictly convex. In the Wasserstein space, this requires some regularity, but weak convexity holds in general. Absolutely continuous measures on infinite-dimensional \(\mathcal X\) are defined in Definition  1.6.4.

Proposition 3.1.6 (Convexity of the Fréchet Functional)

Let \(\varLambda ,\gamma _i\in \mathcal W_2(\mathcal X)\) and t ∈ [0, 1]. Then
$$\displaystyle \begin{aligned} W_2^2(t\gamma_1+(1-t)\gamma_2,\varLambda) \le tW_2^2(\gamma_1,\varLambda) +(1-t)W_2^2(\gamma_2,\varLambda). \end{aligned} $$
(3.2)

When Λ is absolutely continuous, the inequality is strict unless t ∈{0, 1} or γ1 = γ2.

Remark 3.1.7

The Wasserstein distance is not convex along geodesics. That is, if we replace the linear interpolant tγ1 + (1 − t)γ2 by McCann’s interpolant, then \(t\mapsto W_2^2(\gamma _t,\varLambda )\) is not necessarily convex (Ambrosio et al. [12, Example 9.1.5]).

Proof

Let πi ∈ Π(γi, Λ) be optimal and notice that the linear interpolant tπ1 + (1 − t)π2 ∈ Π(tγ1 + (1 − t)γ2, Λ), so that
$$\displaystyle \begin{aligned} W_2^2(t\gamma_1+(1-t)\gamma_2,\varLambda) \le {\int_{\mathcal X^2} \! \|x - y\|{}^2 \, \mathrm{d}[t\pi_1+(1-t)\pi_2](x,y)}, \end{aligned}$$
which is (3.2). When Λ is absolutely continuous and t ∈ (0, 1), equality in (3.2) holds if and only if \(\pi _t=t\pi _1+(1-t)\pi _2=({\mathbf t_{\varLambda }^{t\gamma _1+(1-t)\gamma _2}}\times \mathbf i)\#\varLambda \). But πt is supported on the graphs of two functions: \({\mathbf t_{\varLambda }^{\gamma _1}}\) and \({\mathbf t_{\varLambda }^{\gamma _2}}\). Consequently, equality can hold only if these two maps equal Λ-almost surely, or, equivalently, if γ1 = γ2.
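Inequality (3.2) can be checked numerically on the real line. The sketch below assumes discrete measures on \(\mathbb{R}\) and uses the quantile representation \(W_2(\mu,\nu)=\|F_\mu^{-1}-F_\nu^{-1}\|_{L_2(0,1)}\) recalled in Sect. 3.1.4; the helper `w2_squared_1d` and the data are illustrative assumptions.

```python
import numpy as np

def w2_squared_1d(x, a, y, b):
    """Squared W_2 between sum_k a_k delta_{x_k} and sum_k b_k delta_{y_k} on R,
    computed as the integral over (0, 1) of the squared quantile difference."""
    ix, iy = np.argsort(x), np.argsort(y)
    x, a, y, b = x[ix], a[ix], y[iy], b[iy]
    ca, cb = np.cumsum(a), np.cumsum(b)
    ca[-1] = cb[-1] = 1.0                       # guard against rounding
    grid = np.unique(np.concatenate(([0.0], ca, cb)))
    mids = (grid[:-1] + grid[1:]) / 2           # one point inside each piece
    qx = x[np.searchsorted(ca, mids)]           # quantile function of the first measure
    qy = y[np.searchsorted(cb, mids)]           # quantile function of the second measure
    return float(np.sum(np.diff(grid) * (qx - qy) ** 2))

def uniform(n):
    """Uniform weights on n atoms."""
    return np.full(n, 1.0 / n)

rng = np.random.default_rng(3)
g1, g2, lam = rng.normal(size=4), rng.normal(2.0, 1.0, size=6), rng.normal(size=5)
t = 0.3

# Linear interpolant t*gamma_1 + (1-t)*gamma_2: pool the atoms, rescale the weights.
mix_atoms = np.concatenate((g1, g2))
mix_weights = np.concatenate((t * uniform(4), (1 - t) * uniform(6)))

lhs = w2_squared_1d(mix_atoms, mix_weights, lam, uniform(5))
rhs = (t * w2_squared_1d(g1, uniform(4), lam, uniform(5))
       + (1 - t) * w2_squared_1d(g2, uniform(6), lam, uniform(5)))
print(lhs <= rhs + 1e-12)
```

Here Λ is discrete, so the strictness statement of Proposition 3.1.6 does not apply; only the weak inequality is being illustrated.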

As a corollary, we deduce that the Fréchet mean is unique if one of the measures μi is absolutely continuous, and this extends to the population version (see Proposition 3.2.7).

We conclude this subsection by stating an important regularity property in the Euclidean case. See Agueh and Carlier [2, Proposition 5.1] for a proof.

Proposition 3.1.8 (\(L^\infty\)-Regularity of Fréchet Means)

Let \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathbb {R}^d)\) and suppose that μ1 is absolutely continuous with density bounded by M. Then the Fréchet mean of {μi} is absolutely continuous with density bounded by \(N^dM\) and is consequently a Karcher mean.

In Theorem  5.5.2, we extend Proposition 3.1.8 to the population level.

3.1.4 The One-Dimensional and the Compatible Case

When \(\mathcal X=\mathbb {R}\), there is a simple expression for the Fréchet mean because \(\mathcal W_2(\mathbb {R})\) can be imbedded in a Hilbert space. Indeed, recall that
$$\displaystyle \begin{aligned} W_2(\mu,\nu) =\|F_\mu^{-1} - F_\nu^{-1}\|{}_{L_2(0,1)} \end{aligned}$$
(see Sect. 2.3.2 or 1.5). In view of that, \(\mathcal W_2(\mathbb {R})\) can be seen as the convex closed subset of \(L_2(0,1)\) formed by equivalence classes of left-continuous nondecreasing functions on (0, 1): any quantile function is left-continuous and nondecreasing, and any such function G is the quantile function of a distribution function, namely the right-continuous inverse of G
$$\displaystyle \begin{aligned} F(x) =\inf\{t\in(0,1):G(t)>x\} =\sup\{t\in (0,1):G(t)\le x\}.\end{aligned} $$
(See, for example, Bobkov and Ledoux [25, Appendix A].) Therefore, the Fréchet mean of \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathbb {R})\) is the measure μ having quantile function
$$\displaystyle \begin{aligned} F_\mu^{-1} =\frac 1N \sum_{i=1}^N F_{\mu^i}^{-1}.\end{aligned} $$
The Fréchet mean is thus unique. This is no longer true in higher dimension, unless some regularity is imposed on the measures (Proposition 3.2.7).
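For empirical measures, averaging quantile functions amounts to averaging order statistics. The following is a minimal sketch, assuming that every measure is uniform on the same number n of atoms; the function names are hypothetical.

```python
import numpy as np

def frechet_mean_1d(samples):
    """Atoms of the Fréchet mean of uniform empirical measures on R.

    Each row of `samples` holds the n atoms of one measure. Sorting a row gives
    its quantile function on the grid k/n, so averaging the sorted rows
    averages the quantile functions."""
    return np.sort(samples, axis=1).mean(axis=0)

def w2_squared(x, y):
    """Squared W_2 between two uniform empirical measures with n atoms each."""
    return float(np.mean((np.sort(x) - np.sort(y)) ** 2))

def frechet_functional(atoms, samples):
    """F(gamma) = (1/(2N)) * sum_i W_2^2(gamma, mu^i), as in (3.1)."""
    return float(np.mean([w2_squared(atoms, row) for row in samples]) / 2)

rng = np.random.default_rng(4)
samples = np.stack([rng.normal(0.0, 1.0, 50),
                    rng.normal(1.0, 1.0, 50),
                    rng.normal(3.0, 2.0, 50)])
bar = frechet_mean_1d(samples)

# The quantile average should not be beaten by other candidates.
print(frechet_functional(bar, samples) <= frechet_functional(bar + 0.1, samples))
print(frechet_functional(bar, samples) <= frechet_functional(samples[0], samples))
```

Measures with different numbers of atoms or unequal weights would require evaluating the quantile functions on a common grid instead of sorting.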
Boissard et al. [28] noticed that compatibility of μ1, …, μN according to Definition  2.3.1 allows for a simple solution to the Fréchet mean problem, as in the one-dimensional case. Recall from Proposition 3.1.2 that this is equivalent to the multimarginal problem. Returning to the original form of G, we obtain an easy lower bound for any π ∈ Π(μ1, …, μN):
$$\displaystyle \begin{aligned} G(\pi)= \frac 1 {2N^2} {\int_{\mathcal X^N} \! \sum_{i<j} \|x_i - x_j\|{}^2 \, \mathrm{d}\pi(x_1,\dots,x_N)} \ge \frac 1 {2N^2} \sum_{i<j}W_2^2(\mu^i,\mu^j), \end{aligned}$$
because the (i, j)-th marginal of π is a coupling of μi and μj. Thus, if equality above holds for π, then π is optimal and M#π is the Fréchet mean by Proposition 3.1.2. This is indeed the case for \(\pi =(\mathbf i,\mathbf t_{\mu ^1}^{\mu ^2},\dots ,\mathbf t_{\mu ^1}^{\mu ^N})\#\mu ^1\) because the compatibility gives:
$$\displaystyle \begin{aligned} {\int_{\mathcal X^N} \! \|x_i - x_j\|{}^2 \, \mathrm{d}\pi(x_1,\dots,x_N)} &={\int_{\mathcal X} \! \left\|\mathbf t_{\mu^1}^{\mu^i} - \mathbf t_{\mu^1}^{\mu^j}\right\|{}^2 \, \mathrm{d}\mu^1}\\ {} &={\int_{\mathcal X} \! \left\|\mathbf t_{\mu^1}^{\mu^i}\circ\mathbf t_{\mu^j}^{\mu^1} - \mathbf i\right\|{}^2 \, \mathrm{d}\mu^j} =W_2^2(\mu^i,\mu^j). \end{aligned} $$
We may thus conclude, in a slightly more general form (γ was μ1 above):

Theorem 3.1.9 (Fréchet Mean of Compatible Measures)

Suppose that {γ, μ1, …, μN} are compatible measures. Then
$$\displaystyle \begin{aligned} \left[\frac 1N \sum_{i=1}^N {\mathbf t_{\gamma}^{\mu^i}}\right]\#\gamma \end{aligned}$$

is the Fréchet mean of (μ1, …, μN).

A population version is given in Theorem  5.5.3.
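Theorem 3.1.9 can be illustrated with Gaussian measures whose covariances are diagonal. This example is an assumption added for illustration: for such measures the optimal maps are coordinatewise increasing affine maps, which are closed under composition, so the family is compatible. The sketch pushes a reference Gaussian through the average of the optimal maps; the function names are hypothetical.

```python
import numpy as np

def optimal_map(m_from, s_from, m_to, s_to):
    """Optimal map from N(m_from, diag(s_from^2)) to N(m_to, diag(s_to^2)),
    returned as (slope, intercept) of the coordinatewise affine map."""
    slope = s_to / s_from
    return slope, m_to - slope * m_from

def mean_via_reference(m_ref, s_ref, means, stds):
    """Push N(m_ref, diag(s_ref^2)) through the average of the optimal maps."""
    maps = [optimal_map(m_ref, s_ref, m, s) for m, s in zip(means, stds)]
    slope = np.mean([a for a, _ in maps], axis=0)
    intercept = np.mean([b for _, b in maps], axis=0)
    # An affine image of a Gaussian is Gaussian; return its mean and std vector.
    return slope * m_ref + intercept, slope * s_ref

means = np.array([[0.0, 1.0], [2.0, -1.0], [4.0, 3.0]])   # illustrative data
stds = np.array([[1.0, 2.0], [3.0, 1.0], [2.0, 3.0]])

# Two different reference measures gamma lead to the same Fréchet mean.
m_a, s_a = mean_via_reference(np.zeros(2), np.ones(2), means, stds)
m_b, s_b = mean_via_reference(means[0], stds[0], means, stds)
print(m_a, s_a)
print(np.allclose(m_a, m_b), np.allclose(s_a, s_b))
```

In this family the result has mean equal to the average of the means and standard deviations equal to the average of the standard deviations, whichever compatible reference γ is used.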

3.1.5 The Agueh–Carlier Characterisation

Agueh and Carlier [2] provide a useful sufficient condition for γ to be the Fréchet mean. When \(\mathcal X=\mathbb {R}^d\), this condition is also necessary [2, Proposition 3.8], hence characterising Fréchet means in \(\mathbb {R}^d\). It will allow us to easily deduce some equivariance results for Fréchet means with respect to independence (Lemma 3.1.11) and rotations (Lemma 3.1.12). More importantly, it provides a sufficient condition under which a local minimum of F is a global minimum (Theorem 3.1.15) and the same idea can be used to relate the population Fréchet mean to the expected value of the optimal maps (Theorem 4.2.4). Recall that \(\phi^*\) denotes the Legendre transform of ϕ, as defined on page 14.

Proposition 3.1.10 (Fréchet Means and Potentials)

Let \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\) be absolutely continuous, let \(\gamma \in \mathcal W_2(\mathcal X)\) and denote by \(\phi ^*_i\) the convex potentials of \({\mathbf t_{\mu ^i}^{\gamma }}\) . If \(\phi _i=\phi _i^{**}\) are such that
$$\displaystyle \begin{aligned} \frac 1N \sum_{i=1}^N \phi_i(x) \le \frac 12\|x\|{}^2 ,\qquad \forall x\in\mathcal X ,\qquad \mathrm{with\ equality}\ \gamma\mathrm{-almost\ surely}, \end{aligned}$$

then γ is the unique Fréchet mean of μ1, …, μN.

Proof

Uniqueness follows from Proposition 3.2.7. If \(\theta \in \mathcal W_2(\mathcal X)\) is any measure, then the Kantorovich duality yields
$$\displaystyle \begin{aligned} \frac 12 W_2^2(\gamma,\mu^i) &= {{\int_{\mathcal X} \! {\left(\frac 12\|x\|{}^2 - \phi_i(x)\right)} \, \mathrm{d}{\gamma(x)}}} + {{\int_{\mathcal X} \! {\left(\frac 12\|y\|{}^2 - \phi^*_i(y)\right)} \, \mathrm{d}{\mu^i(y)}}} ;\\ \frac 12 W_2^2(\theta,\mu^i) &\ge {{\int_{\mathcal X} \! {\left(\frac 12\|x\|{}^2 - \phi_i(x)\right)} \, \mathrm{d}{\theta(x)}}} + {{\int_{\mathcal X} \! {\left(\frac 12\|y\|{}^2 - \phi^*_i(y)\right)} \, \mathrm{d}{\mu^i(y)}}} . \end{aligned} $$
Summation over i gives the result.

A population version of this result, based on similar calculations, is given in Theorem  4.2.4.

The next two results are formulated in \(\mathbb {R}^d\) because then the converse of Proposition 3.1.10 is proven to be true. If one could extend [2, Proposition 3.8] to any separable Hilbert \(\mathcal X\), then the two lemmata below would hold with \(\mathbb {R}^d\) replaced by \(\mathcal X\). The simple proofs are given on page 74 of the supplement.

Lemma 3.1.11 (Independent Fréchet Means)

Let μ1, …, μN and ν1, …, νN be absolutely continuous measures in \(\mathcal W_2(\mathbb {R}^{d_1})\) and \(\mathcal W_2(\mathbb {R}^{d_2})\) with Fréchet means μ and ν, respectively. Then the independent coupling μ ⊗ ν is the Fréchet mean of μ1 ⊗ ν1, …, μN ⊗ νN.

By induction (or a straightforward modification of the proof), one can show that the Fréchet mean of (μi ⊗ νi ⊗ ρi) is μ ⊗ ν ⊗ ρ, and so on.

Lemma 3.1.12 (Rotated Fréchet Means)

If μ is the Fréchet mean of the absolutely continuous measures μ1, …, μN and U is orthogonal, then U#μ is the Fréchet mean of U#μ1, …, U#μN.

3.1.6 Differentiability of the Fréchet Functional and Karcher Means

Since we seek to minimise the Fréchet functional F, it would be helpful if F were differentiable, because we could then find at least local minima by solving the equation F′ = 0. This observation of Karcher [78] leads to the notion of Karcher mean.

Definition 3.1.13 (Karcher Mean)

Let F be a Fréchet functional associated with some random measure Λ in \(\mathcal W_2(\mathcal X)\). Then γ is a Karcher mean for Λ if F is differentiable at γ and F′(γ) = 0.

Of course, if γ is a Fréchet mean for the random measure Λ and F is differentiable at γ, then F′(γ) must vanish. In this subsection, we build upon the work of Ambrosio et al. [12] and determine the derivative of the Fréchet functional. This will not only allow for a simple characterisation of Karcher means in terms of the optimal maps \({\mathbf t_{\gamma }^{\varLambda }}\) (Proposition 3.2.14), but will also be the cornerstone of the construction of a steepest descent algorithm for empirical calculation of Fréchet means. The differentiability holds at the population level too (Theorem 3.2.13).

It turns out that the tangent bundle structure described in Sect.  2.3 gives rise to a differentiable structure in the Wasserstein space. Fix \(\mu ^0\in \mathcal W_2(\mathcal X)\) and consider the function
$$\displaystyle \begin{aligned} F_0:\mathcal W_2(\mathcal X)\to\mathbb{R}, \qquad F_0(\gamma)=\frac 12 W_2^2(\gamma,\mu^0). \end{aligned}$$
Ambrosio et al. [12, Corollary 10.2.7] show that when γ is absolutely continuous,
$$\displaystyle \begin{aligned} \lim_{W_2(\nu,\gamma)\to0}\frac{F_0(\nu)- F_0(\gamma) +\displaystyle {\int_{\mathcal X} \! \left\langle {{\mathbf t_{\gamma}^{\mu^0}}(x)-x},{{\mathbf t_{\gamma}^{\nu}}(x)-x}\right\rangle \, \mathrm{d}\gamma(x)}} {W_2(\nu,\gamma)} =0. \end{aligned}$$
Parts of the proof of this result (the limit superior above is ≤ 0; the limit inferior is bounded below) are reproduced in Proposition 3.2.12. The integral above can be seen as the inner product
$$\displaystyle \begin{aligned} \left\langle {{\mathbf t_{\gamma}^{\mu^0}} - \mathbf i},{{\mathbf t_{\gamma}^{\nu}} - \mathbf i}\right\rangle \end{aligned}$$
in the space \(\mathcal {L}_2(\gamma )\) that includes as a (closed) subspace the tangent space Tanγ. In terms of this inner product and the log map, we can write
$$\displaystyle \begin{aligned} F_0(\nu) - F_0(\gamma) =-\left\langle {\log_{\gamma}(\mu^0)},{\log_{\gamma}(\nu)}\right\rangle +o(W_2(\nu,\gamma)), \qquad \nu\to\gamma \quad \mathrm{in} \mathcal W_2, \end{aligned}$$
so that F0 is Fréchet-differentiable3 at γ with derivative
$$\displaystyle \begin{aligned} F_0^{\prime}(\gamma) =- \log_{\gamma}(\mu^0) =-\left({\mathbf t_{\gamma}^{\mu^0}} - \mathbf i\right) \in \mathrm{Tan}_{\gamma}. \end{aligned}$$
By linearity, one immediately obtains:

Theorem 3.1.14 (Gradient of the Fréchet Functional)

Fix a collection of measures \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathcal X)\) . When \(\gamma \in \mathcal W_2(\mathcal X)\) is absolutely continuous, the Fréchet functional
$$\displaystyle \begin{aligned} F(\gamma) =\frac 1{2N} \sum_{i=1}^NW_2^2(\gamma,\mu^i), \qquad \gamma\in \mathcal W_2(\mathcal X) \end{aligned}$$
is Fréchet-differentiable and
$$\displaystyle \begin{aligned} F'(\gamma) =- \frac 1N \sum_{i=1}^N \log_{\gamma}(\mu^i) =- \frac 1N \sum_{i=1}^N \left(\mathbf t_{\gamma}^{\mu^i}-\mathbf i\right). \end{aligned}$$

It follows from this that an absolutely continuous \(\gamma \in \mathcal W_2(\mathcal X)\) is a Karcher mean if and only if the average of the optimal maps is the identity. If in addition one μi is absolutely continuous with bounded density, then the Fréchet mean \(\overline \mu \) is absolutely continuous by Proposition 3.1.8, so it is a Karcher mean. The result extends to the population version; see Proposition 3.2.14.
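The gradient formula suggests a descent iteration in which γ is replaced by \([\mathbf i-\tau F'(\gamma)]\#\gamma\). The sketch below is an illustration under stated assumptions, not the algorithm developed in the text: all measures are Gaussian on \(\mathbb{R}\), so that \(\mathbf t_\gamma^{\mu^i}(x)=m_i+(s_i/s)(x-m)\) for \(\gamma=N(m,s^2)\), and the step size τ = 0.5 is chosen arbitrarily. The function names are hypothetical.

```python
import numpy as np

def gradient(m, s, means, stds):
    """F'(gamma) = -(average optimal map - identity) for gamma = N(m, s^2),
    encoded as (slope, intercept) of the affine function x -> slope*x + intercept."""
    ratio = np.mean(stds) / s
    slope = 1.0 - ratio
    intercept = -(np.mean(means) - ratio * m)
    return slope, intercept

def descent_step(m, s, means, stds, tau):
    """Push N(m, s^2) forward by the map x -> x - tau * F'(gamma)(x)."""
    slope, intercept = gradient(m, s, means, stds)
    a, b = 1.0 - tau * slope, -tau * intercept     # the map is x -> a*x + b, a > 0
    return a * m + b, a * s

means = np.array([0.0, 2.0, 7.0])    # illustrative parameters of mu^1, mu^2, mu^3
stds = np.array([1.0, 0.5, 3.0])

m, s = -5.0, 10.0                    # arbitrary absolutely continuous starting point
for _ in range(60):
    m, s = descent_step(m, s, means, stds, tau=0.5)

print(m, s)                          # approaches the averages of the means and stds
print(gradient(m, s, means, stds))   # approximately (0, 0): a Karcher mean
```

At the limit the average of the optimal maps is the identity, which is the Karcher mean condition stated above; in this one-dimensional Gaussian family the limit coincides with the quantile-averaged Fréchet mean of Sect. 3.1.4.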

It may happen that a collection μ1, …, μN of absolutely continuous measures have a Karcher mean that is not a Fréchet mean; see Álvarez-Esteban et al. [9, Example 3.1] for an example in \(\mathbb {R}^2\). But a Karcher mean γ is “almost” a Fréchet mean in the following sense. By Proposition 3.2.14, \(N^{-1}\sum {\mathbf t_{\gamma }^{\mu ^i}}(x)=x\) for γ-almost all x. If, on the other hand, the equality holds for all \(x\in \mathcal X\), then γ is the Fréchet mean by taking integrals and applying Proposition 3.1.10. One can hope that under regularity conditions, the γ-almost sure equality can be upgraded to equality everywhere. Indeed, this is the case:

Theorem 3.1.15 (Optimality Criterion for Karcher Means)

Let \(U\subseteq \mathbb {R}^d\) be an open convex set and let \(\mu ^1,\dots ,\mu ^N\in \mathcal W_2(\mathbb {R}^d)\) be probability measures on U with bounded strictly positive densities g1, …, gN. Suppose that an absolutely continuous Karcher mean γ is supported on U with bounded strictly positive density f there. Then γ is the Fréchet mean of μ1, …, μN if one of the following holds:
  1. \(U=\mathbb {R}^d\) and the densities f, g1, …, gN are of class \(C^{0,\alpha}\) for some α > 0;
  2. U is bounded and the densities f, g1, …, gN are bounded below on U.

Proof

The result exploits Caffarelli’s regularity theory for Monge–Ampère equations in the form of Theorem 1.6.7. In the first case, there exist \(C^1\) (in fact, \(C^{2,\alpha}\)) convex potentials φi on \(\mathbb {R}^d\) with \({\mathbf t_{\gamma }^{\mu ^i}}=\nabla \varphi _i\), so that \({\mathbf t_{\gamma }^{{\mu ^i}}}(x)\) is a singleton for all \(x\in \mathbb {R}^d\). The set \(\{x\in \mathbb {R}^d:\sum {\mathbf t_{\gamma }^{\mu ^i}}(x)/N\ne x\}\) is γ-negligible (and hence Lebesgue negligible) and open by continuity. It is therefore empty, so F′(γ) = 0 everywhere, and γ is the Fréchet mean (see the discussion before the theorem).

In the second case, by the same argument we have \(\sum {\mathbf t_{\gamma }^{\mu ^i}}(x)/N=x\) for all x ∈ U. Since U is convex, there must exist a constant C such that \(\sum \varphi _i(x)=C+N\|x\|{ }^2/2\) for all x ∈ U, and we may assume without loss of generality that C = 0. If one repeats the proof of Proposition 3.1.10, then F(γ) ≤ F(θ) for all θ ∈ P(U). By continuity considerations, the inequality holds for all \(\theta \in P(\overline U)\) (Theorem 2.2.7) and since \(\overline U\) is closed and convex, any Fréchet mean is supported on \(\overline U\) by Corollary 3.1.4, so γ is the Fréchet mean.

3.2 Population Fréchet Means

In this section, we extend the notion of empirical Fréchet mean to the population level, where Λ is a random element in \(\mathcal W_2(\mathcal X)\) (a measurable mapping from a probability space to \(\mathcal W_2(\mathcal X)\)). This requires a different strategy, since it is not clear how to define the analogue of the multicouplings at that level of abstraction. However, it is important to point out that when there is more structure in Λ, multicouplings can be defined as laws of stochastic processes; see Pass [102] for a detailed account of the problem in this case.

In analogy with (3.1), we define:

Definition 3.2.1 (Population Fréchet Mean)

Let Λ be a random measure in \(\mathcal W_2(\mathcal X)\) . The Fréchet mean of Λ is the minimiser (if it exists and is unique) of the Fréchet functional
$$\displaystyle \begin{aligned} F(\gamma) =\frac 1{2} \mathbb{E} W_2^2(\gamma,\varLambda), \qquad \gamma\in \mathcal W_2(\mathcal X). \end{aligned} $$
(3.3)

Since W2 is continuous and nonnegative, the expectation is well-defined.

3.2.1 Existence, Uniqueness, and Continuity

Existence and uniqueness of Fréchet means on a general metric space M are rather delicate questions. Usually, existence proofs are easier: for example, since the Fréchet functional F is continuous on M (as we show below), one often invokes local compactness of M in order to establish existence of a minimiser. Unfortunately, a different strategy is needed when \(M=\mathcal W_2(\mathcal X)\), because the Wasserstein space is not locally compact (Proposition  2.2.9).

The first thing to notice is that F is indeed continuous (this is clear for the empirical version). This is a consequence of the triangle inequality and holds when \(\mathcal W_2(\mathcal X)\) is replaced by any metric space.

Lemma 3.2.2 (Finiteness of the Fréchet Functional)

If F is not identically infinite, then it is finite and locally Lipschitz everywhere on \(\mathcal W_2(\mathcal X)\).

Proof

Assume that F is finite at γ. If θ is any other measure in \(\mathcal W_2(\mathcal X)\), write
$$\displaystyle \begin{aligned} 2F(\gamma) - 2F(\theta) =\mathbb{E}[W_2(\gamma,\varLambda) - W_2(\theta,\varLambda)] [W_2(\gamma,\varLambda) + W_2(\theta,\varLambda)]. \end{aligned}$$
Since \(x\le 1+x^2\) for all x, the triangle inequality in \(\mathcal W_2(\mathcal X)\) yields
$$\displaystyle \begin{aligned} 2|F(\gamma) - F(\theta)| &\le W_2(\gamma,\theta) [2\mathbb{E} W_2(\gamma,\varLambda) + W_2(\theta,\gamma)]\\ &\le W_2(\gamma,\theta) [2\mathbb{E} W_2^2(\gamma,\varLambda) + 2 + W_2(\theta,\gamma)]. \end{aligned} $$
Since F(γ) < ∞, this shows that F is finite everywhere and the right-hand side vanishes as θ → γ in \(\mathcal W_2(\mathcal X)\). Now that we know that F is continuous, the same upper bound shows that it is in fact locally Lipschitz.

Example: let (an) be a sequence of positive numbers that sum up to one. Let xn = 1∕an and suppose that Λ equals \(\delta \{x_n\}\in \mathcal W_2(\mathbb {R})\) with probability an. Then
$$\displaystyle \begin{aligned} \mathbb{E} W_2^2(\varLambda,\delta_0) =\sum_{n=1}^\infty a_nx_n^2 =\sum_{n=1}^\infty 1/a_n =\infty, \end{aligned}$$
and by Lemma 3.2.2 F is identically infinite. Henceforth, we say that F is finite when the condition in Lemma 3.2.2 holds.

Using the lower semicontinuity (2.5), one can prove existence on \(\mathbb {R}^d\) rather easily. (The empirical means exist even in infinite dimensions by Corollary 3.1.3.)

Proposition 3.2.3 (Existence of Fréchet Means)

The Fréchet functional associated with any random measure Λ in \(\mathcal W_2(\mathbb {R}^d)\) admits a minimiser.

Proof

The assertion is clear if F is identically infinite. Otherwise, let (γn) be a minimising sequence. We wish to show that the sequence is tight. Define \(L=\sup_n F(\gamma_n)<\infty\) and observe that since \(x\le 1+x^2\) for all \(x\in \mathbb {R}\),
$$\displaystyle \begin{aligned} \mathbb{E} W_2(\gamma_n,\varLambda) \le 1+\mathbb{E} W_2^2(\gamma_n,\varLambda) \le 2L+1 ,\qquad n=1,2,\dots. \end{aligned}$$
By the triangle inequality
$$\displaystyle \begin{aligned} L' =\mathbb{E} W_2(\delta_0,\varLambda) \le W_2(\delta_0,\gamma_1) + \mathbb{E} W_2(\gamma_1,\varLambda) \le W_2(\delta_0,\gamma_1) + 2L + 1 \end{aligned}$$
so that for all n
$$\displaystyle \begin{aligned} \left( {\int_{\mathbb{R}^d} \! \|x\|{}^2 \, \mathrm{d}\gamma_n(x)} \right)^{1/2} =W_2(\gamma_n,\delta_0) \le \mathbb{E} W_2(\gamma_n,\varLambda) + \mathbb{E} W_2(\varLambda,\delta_0) \le 2L+1 + L' <\infty. \end{aligned}$$
Since closed and bounded sets in \(\mathbb {R}^d\) are compact, it follows that (γn) is a tight sequence. We may assume that γn → γ weakly, then use (2.5) and Fatou’s lemma to obtain
$$\displaystyle \begin{aligned} 2F(\gamma)= \mathbb{E} W_2^2(\gamma,\varLambda) \le \mathbb{E} \liminf_{n\to\infty} W_2^2(\gamma_n,\varLambda) \le \liminf_{n\to\infty} \mathbb{E} W_2^2(\gamma_n,\varLambda) =2\inf F. \end{aligned}$$
Thus, γ is a minimiser of F, and existence is established.

When \(\mathcal X\) is an infinite-dimensional Hilbert space, existence still holds under a compactness assumption. We first prove a result about the support of the Fréchet mean. At the empirical level, one can say more about the support (see Corollary 3.1.4).

Proposition 3.2.4 (Support of Fréchet Mean)

Let Λ be a random measure in \(\mathcal W_2(\mathcal X)\) and let \(K\subseteq \mathcal X\) be a convex closed set such that \(\mathbb {P}[\varLambda (K)=1]=1\). If γ minimises F, then γ(K) = 1.

Remark 3.2.5

For any closed \(K\subseteq \mathcal X\) and any α ∈ [0, 1], the set \(\{\varLambda \in \mathcal W_p(\mathcal X):\varLambda (K)\ge \alpha \}\) is closed in \(\mathcal W_p(\mathcal X)\) because \(\{\varLambda \in P(\mathcal X):\varLambda (K)\ge \alpha \}\) is weakly closed by the portmanteau lemma (Lemma  1.7.1).

The proof amounts to a simple projection argument; see page 79 in the supplement.

Corollary 3.2.6

If there exists a compact convex K satisfying the hypothesis of Proposition 3.2.4 , then the Fréchet functional admits a minimiser supported on K.

Proof

Proposition 3.2.4 allows us to restrict the domain of F to \(\mathcal W_2(K)\), the collection of probability measures supported on K. Since this set is compact in \(\mathcal W_2(\mathcal X)\) (Corollary  2.2.5), the result follows from continuity of F.

From the convexity (3.2), one obtains a simple criterion for uniqueness. See Definition  1.6.4 for absolute continuity in infinite dimensions.

Proposition 3.2.7 (Uniqueness of Fréchet Means)

Let Λ be a random measure in \(\mathcal W_2(\mathcal X)\) with finite Fréchet functional. If Λ is absolutely continuous with positive (inner) probability, then the Fréchet mean of Λ is unique (if it exists).

Remark 3.2.8

It is not obvious that the set of absolutely continuous measures is measurable in \(\mathcal W_2(\mathcal X)\). The hypothesis of Proposition 3.2.7 should therefore be read as follows: there exists a Borel set \(A\subset \mathcal W_2(\mathcal X)\) such that \(\mathbb {P}(\varLambda \in A)>0\) and all measures in A are absolutely continuous.

Proof

By taking expectations in (3.2), one sees that F is convex on \(\mathcal W_2(\mathcal X)\) with respect to linear interpolants. From Proposition 3.1.6, we conclude that
$$\displaystyle \begin{aligned} \varLambda \text{ absolutely continuous} \quad \Longrightarrow\quad \gamma\mapsto \frac 12W_2^2(\gamma,\varLambda) \text{ strictly convex}. \end{aligned}$$
As F was already shown to be weakly convex in any case, it follows that
$$\displaystyle \begin{aligned} \mathbb{P}(\varLambda \text{ absolutely continuous})>0 \quad \Longrightarrow\quad F\text{ strictly convex}. \end{aligned}$$
Since strictly convex functionals have at most one minimiser, this completes the proof.
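The regularity assumption cannot simply be dropped. As a minimal sketch (the two-point measures and the helper names `w2_squared` and `frechet_functional` are illustrative choices, not taken from the text), the following Python snippet takes Λ uniform over two discrete measures sitting on the diagonals of the unit square and evaluates F at two different candidate measures. It relies on two standard facts: an optimal coupling between uniform measures with equally many atoms can be found among permutations, and the triangle inequality gives the lower bound \(F(\gamma )\ge W_2^2(\mu _1,\mu _2)/8\) when Λ is uniform on {μ1, μ2}.

```python
from itertools import permutations

def w2_squared(xs, ys):
    """Squared W2 distance between uniform measures on equally many atoms.

    Brute force over permutations; only sensible for a handful of atoms.
    """
    n = len(xs)

    def cost(perm):
        return sum((x[0] - ys[j][0]) ** 2 + (x[1] - ys[j][1]) ** 2
                   for x, j in zip(xs, perm)) / n

    return min(cost(p) for p in permutations(range(n)))

def frechet_functional(gamma, measures):
    """F(gamma) = (1/2) E W2^2(gamma, Lambda), Lambda uniform on `measures`."""
    return 0.5 * sum(w2_squared(gamma, m) for m in measures) / len(measures)

mu1 = [(0.0, 0.0), (1.0, 1.0)]      # atoms on one diagonal of the unit square
mu2 = [(0.0, 1.0), (1.0, 0.0)]      # atoms on the other diagonal
gamma_a = [(0.0, 0.5), (1.0, 0.5)]  # midpoints of the vertical edges
gamma_b = [(0.5, 0.0), (0.5, 1.0)]  # midpoints of the horizontal edges

lower_bound = w2_squared(mu1, mu2) / 8
value_a = frechet_functional(gamma_a, [mu1, mu2])
value_b = frechet_functional(gamma_b, [mu1, mu2])
print(lower_bound, value_a, value_b)
```

Both candidates attain the lower bound, so both are Fréchet means of this Λ. Neither μ1 nor μ2 is absolutely continuous, which is consistent with Proposition 3.2.7.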

We state without proof an important consistency result (Le Gouic and Loubes [87, Theorem 3]). Since \(\mathcal W_2(\mathcal X)\) is a complete and separable metric space, we can define the “second degree” Wasserstein space \(\mathcal W_2(\mathcal W_2(\mathcal X))\). The law of a random measure Λ is in \(\mathcal W_2(\mathcal W_2(\mathcal X))\) if and only if the corresponding Fréchet functional is finite.

Theorem 3.2.9 (Consistency of Fréchet Means)

Let Λn, Λ be random measures in \(\mathcal W_2(\mathbb {R}^d)\) with finite Fréchet functionals and laws \(\mathbb {P}_n,\mathbb {P}\in \mathcal W_2(\mathcal W_2(\mathbb {R}^d))\). If \(\mathbb {P}_n\to \mathbb {P}\) in \(\mathcal W_2(\mathcal W_2(\mathbb {R}^d))\), then any sequence λn of Fréchet means of Λn has a W2-limit point λ, which is a Fréchet mean of Λ.

See the Bibliographical Notes for a more general formulation.

Corollary 3.2.10 (Wasserstein Law of Large Numbers)

Let Λ be a random measure in \(\mathcal W_2(\mathbb {R}^d)\) with finite Fréchet functional and let Λ1, … be a sample from Λ. Assume λ is the unique Fréchet mean of Λ (see Proposition 3.2.7). Then almost surely, the sequence of empirical Fréchet means of Λ1, …, Λn converges to λ.

Proof

Let \(\mathbb {P}\) be the law of Λ and let \(\mathbb {P}_n\) be its empirical counterpart (a random element in \(\mathcal W_2(\mathcal W_2(\mathbb {R}^d))\)). As in the proof of Proposition 2.2.6 (with \(\mathcal X\) replaced by the complete separable metric space \(\mathcal W_2(\mathbb {R}^d)\)), almost surely \(\mathbb {P}_n\to \mathbb {P}\) in \(\mathcal W_2(\mathcal W_2(\mathbb {R}^d))\), so Theorem 3.2.9 applies. Applying it along every subsequence and using the uniqueness of λ shows that the whole sequence converges to λ.

Under a compactness assumption, one can give a direct proof for the law of large numbers as in Theorem 3.1.5. This is done on page 80 in the supplement.
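The law of large numbers can be watched at work in a toy model. The sketch below is an illustration under stated assumptions, not part of the original argument: Λ = N(M, S²) on the real line with M standard normal and S uniform on [0.5, 1.5], a fixed seed, and hypothetical helper names. It uses two closed forms for Gaussians on the line: the quantile function of N(m, s²) is m + sΦ⁻¹(t), so averaging quantile functions (Theorem 3.2.11) averages means and standard deviations, and W2 between N(m1, s1²) and N(m2, s2²) equals the Euclidean distance between (m1, s1) and (m2, s2).

```python
import math
import random

def empirical_frechet_mean(params):
    """Fréchet mean of the Gaussians N(m_i, s_i^2) on the real line.

    Averaging the quantile functions m_i + s_i * Phi^{-1}(t) averages the
    means and the standard deviations.
    """
    n = len(params)
    return (sum(m for m, _ in params) / n, sum(s for _, s in params) / n)

def w2_gaussian(a, b):
    """W2 distance between N(a[0], a[1]^2) and N(b[0], b[1]^2) on the line."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

rng = random.Random(0)           # fixed seed for reproducibility
population_mean = (0.0, 1.0)     # E[M] = 0 and E[S] = 1 in this model

sample = [(rng.gauss(0.0, 1.0), rng.uniform(0.5, 1.5)) for _ in range(20000)]
errors = {n: w2_gaussian(empirical_frechet_mean(sample[:n]), population_mean)
          for n in (10, 100, 1000, 20000)}
for n, err in errors.items():
    print(n, round(err, 4))
```

The printed distances between the empirical and population Fréchet means should shrink roughly like n^(-1/2) in this model, although any single run fluctuates.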

3.2.2 The One-Dimensional Case

As a generalisation of the empirical version, we have:

Theorem 3.2.11 (Fréchet Means in \(\mathcal W_2(\mathbb {R})\))

Let Λ be a random measure in \(\mathcal W_2(\mathbb {R})\) with finite Fréchet functional. Then the Fréchet mean of Λ is the unique measure λ with quantile function \(F_\lambda ^{-1}(t)=\mathbb {E} F_\varLambda ^{-1}(t)\), t ∈ (0, 1).

Proof

Since L2(0, 1) is a Hilbert space, the random element \(F_\varLambda ^{-1}\in L_2(0,1)\) has a unique Fréchet mean g ∈ L2(0, 1), defined by the relations \({\left \langle {g},{f}\right \rangle } =\mathbb {E}{\left \langle {F_\varLambda ^{-1}},{f}\right \rangle } \) for all f ∈ L2(0, 1). On page 80 of the supplement, we show that g can be identified with \(F_\lambda ^{-1}\).

Interestingly, no regularity is needed in order for the Fréchet mean to be unique. This is not the case in higher dimensions; see Proposition 3.2.7. If there is some regularity, then one can state Theorem 3.2.11 in terms of optimal maps, because \(F_\varLambda ^{-1}\) is the optimal map from Leb|[0,1] to Λ. If \(\gamma \in \mathcal W_2(\mathbb {R})\) is any absolutely continuous (or even just continuous) measure, then Theorem 3.2.11 can be stated as follows: the Fréchet mean of Λ is the measure \([\mathbb {E}{\mathbf t_{\gamma }^{\varLambda }}]\#\gamma \). A generalisation of this result to compatible measures (Definition 2.3.1) can be carried out in the same way, since compatible measures are embedded in a Hilbert space, using Bochner integrals for the definition of the expected optimal maps (see Sect. 2.4).
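For empirical measures, Theorem 3.2.11 turns into a very short computation. The following sketch assumes that Λ is uniform over finitely many measures, each of which is uniform on the same number n of atoms; the function names are hypothetical. Under that assumption each quantile function is a step function whose k-th value is the k-th smallest atom, so averaging quantile functions amounts to averaging order statistics, and the squared W2 distance is the mean squared difference of sorted atoms.

```python
def frechet_mean_1d(samples):
    """Atoms of the Fréchet mean of uniform empirical measures on the line.

    Each element of `samples` lists the n atoms of one measure; averaging
    quantile functions reduces to averaging order statistics.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equally many atoms per measure")
    ordered = [sorted(s) for s in samples]
    return [sum(col) / len(ordered) for col in zip(*ordered)]

def w2_squared_1d(xs, ys):
    """Squared W2 distance between two uniform empirical measures (same n)."""
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

def frechet_functional(gamma, samples):
    """F(gamma) = (1/2) * average of W2^2(gamma, Lambda_i)."""
    return 0.5 * sum(w2_squared_1d(gamma, s) for s in samples) / len(samples)

measures = [[0.0, 1.0, 4.0], [2.0, 2.0, 2.0], [5.0, -1.0, 3.0]]
mean_atoms = frechet_mean_1d(measures)
perturbed = [0.5, 2.0, 3.5]   # a nearby competitor with the same number of atoms

print(mean_atoms)
print(frechet_functional(mean_atoms, measures), frechet_functional(perturbed, measures))
```

The averaged atoms stay inside the interval spanned by the data, in line with Proposition 3.2.4, and the competitor has a strictly larger value of F.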

3.2.3 Differentiability of the Population Fréchet Functional

We now use the Fubini result (Proposition  2.4.9) in order to extend the differentiability of the Fréchet functional to the population version. This will follow immediately if we can interchange the expectation and the derivative in the form
$$\displaystyle \begin{aligned} F'(\gamma) =\frac 12 (\mathbb{E} W_2^2)'(\gamma,\varLambda) =\mathbb{E} \left(\frac 12 W_2^2\right)'(\gamma,\varLambda) =-\mathbb{E} ({\mathbf t_{\gamma}^{\varLambda}} - \mathbf i). \end{aligned}$$
In order to do this, we will use dominated convergence in conjunction with uniform bounds on the slopes
$$\displaystyle \begin{aligned} u(\theta,\varLambda) = \frac{\frac 12 W_2^2(\theta,\varLambda) - \frac 12 W_2^2(\theta_0,\varLambda) + {\int_{\mathcal X} \! {\langle\mathbf t_{\theta_0}^\varLambda - \mathbf i,\mathbf t_{\theta_0}^\theta - \mathbf i\rangle} \, \mathrm{d}{\theta_0}} }{W_2(\theta,\theta_0)}, \qquad u(\theta_0,\varLambda)=0. \end{aligned}$$
(3.4)

Proposition 3.2.12 (Slope Bounds)

Let θ0, Λ, and θ be probability measures with θ0 absolutely continuous, and set δ = W2(θ, θ0). Then
$$\displaystyle \begin{aligned} \frac 12\delta - W_2(\theta_0,\varLambda) - \sqrt{2W_2^2(\theta_0,\delta_0)+2W_2^2(\varLambda,\delta_0)} \le u(\theta,\varLambda) \le \frac 12\delta, \end{aligned}$$

where u is defined by (3.4). If the measures are compatible in the sense of Definition  2.3.1, then u(θ, Λ) = δ∕2.

The proof is a slight variation of Ambrosio et al. [12, Theorem 10.2.2 and Proposition 10.2.6], and the details are given on page 81 of the supplement.
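The equality u(θ, Λ) = δ∕2 in the compatible case can be checked by hand for Gaussian measures on the real line, which are always compatible. The sketch below is only an illustration: the Gaussian family, the particular parameters, and the function names are assumptions. It uses the closed forms W2² = (m1 − m2)² + (s1 − s2)² and the affine optimal map x ↦ m + (s∕s0)(x − m0) from N(m0, s0²) to N(m, s²), so that the integral in (3.4) reduces to an inner product in the (mean, standard deviation) plane.

```python
import math

def w2_squared(a, b):
    """Squared W2 distance between Gaussians a, b = (mean, sd) on the line."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def slope(theta, theta0, lam):
    """Slope u(theta, Lambda) of (3.4) when all three measures are Gaussian."""
    (m, s), (m0, s0), (M, S) = theta, theta0, lam
    # With t(x) - x = (mean - m0) + (sd / s0 - 1) * (x - m0) for each target,
    # integrating the product of the two displacements against N(m0, s0^2) gives:
    inner = (M - m0) * (m - m0) + (S - s0) * (s - s0)
    delta = math.sqrt(w2_squared(theta, theta0))
    numerator = 0.5 * w2_squared(theta, lam) - 0.5 * w2_squared(theta0, lam) + inner
    return numerator / delta, delta

theta0 = (0.0, 1.0)
lam = (3.0, 2.5)
results = [slope(theta, theta0, lam) for theta in [(1.0, 2.0), (0.1, 0.9), (-2.0, 0.5)]]
for u, delta in results:
    print(u, delta / 2)
```

In each row the two printed numbers agree up to rounding, as the proposition predicts for compatible measures.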

Theorem 3.2.13 (Population Fréchet Gradient)

Let Λ be a random measure with finite Fréchet functional F. Then F is Fréchet-differentiable at any absolutely continuous θ0 in the Wasserstein space, and \(F'(\theta _0)=-(\mathbb {E}{\mathbf t_{\theta _0}^{\varLambda }}-\mathbf i)\in \mathcal {L}_2(\theta _0)\). More precisely,
$$\displaystyle \begin{aligned} \frac{F(\theta) - F(\theta_0) + {{\int_{\mathcal X} \! {\langle \mathbb{E} {\mathbf t_{\theta_0}^{\varLambda}} - \mathbf i,{\mathbf t_{\theta_0}^{\theta}} - \mathbf i\rangle} \, \mathrm{d}{\theta_0}}} }{W_2(\theta,\theta_0)} \to 0 ,\qquad \theta\to\theta_0 \quad \text{in } \mathcal W_2. \end{aligned}$$

Thus, the Fréchet derivative of F can be identified with the map \(-(\mathbb {E}{\mathbf t_{\theta _0}^{\varLambda }}-\mathbf i)\) in the tangent space at θ0, a subspace of \(\mathcal {L}_2(\theta _0)\).

Proof

Introduce the slopes u(θ, Λ) defined by (3.4). Then for all Λ, u(θ, Λ) → 0 as W2(θ, θ0) → 0, by the differentiability properties established above. Let us show that \(\mathbb {E} u(\theta ,\varLambda )\to 0\) as well. By Proposition 3.2.12, u(θ, Λ) is bounded above by W2(θ, θ0)∕2, which does not depend on Λ, and below by the negative of a random variable whose expectation is
$$\displaystyle \begin{aligned} & \mathbb{E} W_2(\theta_0,\varLambda) + \mathbb{E}\sqrt{2W_2^2(\theta_0,\delta_0)+2W_2^2(\varLambda,\delta_0)}\\ {} & \quad \le \sqrt 2W_2(\theta_0,\delta_0) + \mathbb{E} W_2(\theta_0,\varLambda) + \sqrt2\mathbb{E} W_2(\varLambda,\delta_0). \end{aligned} $$
Both expectations are finite because the Fréchet functional of Λ is finite. The dominated convergence theorem yields
$$\displaystyle \begin{aligned} \mathbb{E} u(\theta,\varLambda) = \frac{F(\theta) - F(\theta_0) + \mathbb{E}{{\int_{\mathcal X} \! {\langle\mathbf t_{\theta_0}^\varLambda - \mathbf i,\mathbf t_{\theta_0}^\theta - \mathbf i\rangle} \, \mathrm{d}{\theta_0}}} }{W_2(\theta,\theta_0)} \to0 ,\qquad W_2(\theta_0,\theta)\to0. \end{aligned}$$
The measurability of the integral and the result then follow from Fubini’s theorem (see Proposition  2.4.9).

Proposition 3.2.14

Let Λ be a random measure in \(\mathcal W_2(\mathcal X)\) with finite Fréchet functional F, and let γ be absolutely continuous in \(\mathcal W_2(\mathcal X)\). Then γ is a Karcher mean of Λ if and only if \(\mathbb {E}{\mathbf t_{\gamma }^{\varLambda }}-\mathbf i=0\) in \(\mathcal {L}_2(\gamma )\). Furthermore, if γ is a Fréchet mean of Λ, then it is also a Karcher mean.

The characterisation of Karcher means follows immediately from Theorem 3.2.13. The other statement is that the derivative vanishes at the minimum, which is fairly obvious intuitively; see page 82 in the supplement.
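The Karcher condition \(\mathbb {E}{\mathbf t_{\gamma }^{\varLambda }}-\mathbf i=0\) can be made concrete in a small example. The sketch below assumes that Λ is uniform over three Gaussian measures on the real line (the parameters and helper names are hypothetical) and uses the fact that the optimal map from N(m, s²) to N(M, S²) is the affine map x ↦ M + (S∕s)(x − m). It evaluates the residual of the averaged map at an arbitrary Gaussian γ0 and then at the push-forward of γ0 under the averaged map.

```python
def averaged_map(gamma, params):
    """Return x -> E[t_gamma^Lambda(x)] for Lambda uniform on Gaussians `params`.

    Each optimal map x -> M + (S / s) * (x - m) is affine, so their average
    is affine as well.
    """
    m, s = gamma
    mean_M = sum(M for M, _ in params) / len(params)
    mean_S = sum(S for _, S in params) / len(params)
    return lambda x: mean_M + (mean_S / s) * (x - m)

def push_forward(gamma, affine_map):
    """Push N(m, s^2) forward through an increasing affine map."""
    m, s = gamma
    slope = affine_map(m + 1.0) - affine_map(m)   # slope of the affine map
    return (affine_map(m), slope * s)

params = [(-1.0, 0.5), (0.0, 2.0), (4.0, 1.5)]    # (mean, sd) of each Gaussian
gamma0 = (10.0, 3.0)                              # arbitrary starting measure
points = (-2.0, 0.0, 2.0)

# At gamma0 the averaged map is far from the identity.
residual0 = [averaged_map(gamma0, params)(x) - x for x in points]

# After pushing gamma0 through the averaged map, the residual vanishes.
gamma1 = push_forward(gamma0, averaged_map(gamma0, params))
residual1 = [averaged_map(gamma1, params)(x) - x for x in points]
print(gamma1, residual0, residual1)
```

In this one-dimensional Gaussian setting a single push-forward already lands on the measure with averaged mean and averaged standard deviation, which is the Fréchet mean by Theorem 3.2.11; in higher dimensions one should not expect a single step to suffice.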

3.3 Bibliographical Notes

Proposition 3.1.2 is essentially due to Agueh and Carlier [2, Proposition 4.2], who show it on \(\mathbb {R}^d\) (see also Zemel and Panaretos [134, Theorem 2]). An earlier result in a compact setting can be found in Carlier and Ekeland [33]. The formulation given here is from Masarotto et al. [91]. A more general version is provided by Le Gouic and Loubes [87, Theorem 8].

Lemmata 3.1.11 and 3.1.12 are from [135], but were known earlier (e.g., Bonneel et al. [30]).

Proposition 3.1.6 is a simplified version of Álvarez-Esteban et al. [8, Theorem 2.8] (see [8, Corollary 2.9]).

Propositions 3.2.3 and 3.2.7 are from Bigot and Klein [22], who also show the law of large numbers (Corollary 3.2.10) and deal with the one-dimensional setup (Theorem 3.2.11) in a compact setting. Section 2.4 appears to be new, but see the discussion at its beginning for other measurability results.

Barycentres can be defined for any p ≥ 1 as the measures minimising \(\mu \mapsto \mathbb {E} W_p^p(\varLambda ,\mu )\). (Strictly speaking, these are not Fréchet means unless p = 2.) Le Gouic and Loubes [87] show Proposition 3.2.3 and Theorem 3.2.9 in this more general setup, where \(\mathbb {R}^d\) can be replaced by any separable locally compact geodesic space.

Footnotes

  1. It should be remarked that this is a Hilbertian property (or at least a property linked to an inner product), not merely a linear property. In other words, it does not extend to Banach spaces. As an example, let \(H=\mathbb {R}^2\) with the L1 norm and consider the vertices (0, 0), (0, 1), and (1, 0) of the unit simplex. The mean of these is (1∕3, 1∕3) but for (x, y) in the triangle,
    $$\displaystyle \begin{aligned} F(x,y) =(x+y)^2 + (x+1-y)^2 + (1-x+y)^2 =2 + 2(x^2 + y^2) + (x-y)^2 \end{aligned}$$

    is minimised at (0, 0).

  2. Interestingly, Fréchet himself [56] considered the Wasserstein metric between probability measures on \(\mathbb {R}\), and some refer to this as the Fréchet distance (e.g., Dowson and Landau [44]), which is another reason to use this terminology.

  3. The notion of Fréchet derivative is also named after Maurice Fréchet, but is not directly related to Fréchet means.

References

  2. M. Agueh, G. Carlier, Barycenters in the Wasserstein space. Soc. Indus. Appl. Math. 43(2), 904–924 (2011)
  8. P.C. Álvarez-Esteban, E. del Barrio, J.A. Cuesta-Albertos, C. Matrán, Uniqueness and approximate computation of optimal incomplete transportation plans. Ann. Inst. Henri Poincaré Probab. Stat. 47(2), 358–375 (2011)
  9. P.C. Álvarez-Esteban, E. del Barrio, J.A. Cuesta-Albertos, C. Matrán, A fixed-point approach to barycenters in Wasserstein space. J. Math. Anal. Appl. 441(2), 744–762 (2016)
  10. L. Ambrosio, N. Gigli, A user’s guide to optimal transport, in Modelling and Optimisation of Flows on Networks (Springer, Berlin, 2013), pp. 1–155
  12. L. Ambrosio, N. Gigli, G. Savaré, Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics. ETH Zürich, 2nd edn. (Springer, Berlin, 2008)
  22. J. Bigot, T. Klein, Characterization of barycenters in the Wasserstein space by averaging optimal transport maps. ESAIM: Probab. Stat. 22, 35–57 (2018)
  25. S. Bobkov, M. Ledoux, One-Dimensional Empirical Measures, Order Statistics and Kantorovich Transport Distances, vol. 261, no. 1259 (Memoirs of the American Mathematical Society, Providence, 2019). https://doi.org/10.1090/memo/1259
  28. E. Boissard, T. Le Gouic, J.-M. Loubes, Distribution’s template estimate with Wasserstein metrics. Bernoulli 21(2), 740–759 (2015)
  30. N. Bonneel, J. Rabin, G. Peyré, H. Pfister, Sliced and radon Wasserstein barycenters of measures. J. Math. Imag. Vis. 51(1), 22–45 (2015)
  33. G. Carlier, I. Ekeland, Matching for teams. Econ. Theory 42(2), 397–418 (2010)
  44. D. Dowson, B. Landau, The Fréchet distance between multivariate normal distributions. J. Multivar. Anal. 12(3), 450–455 (1982)
  55. M. Fréchet, Les éléments aléatoires de nature quelconque dans un espace distancié. Ann. Inst. Henri Poincaré 10(4), 215–310 (1948)
  56. M. Fréchet, Sur la distance de deux lois de probabilité. C.R. Hebd. Seances Acad. Sci. 244(6), 689–692 (1957)
  60. W. Gangbo, A. Świȩch, Optimal maps for the multidimensional Monge–Kantorovich problem. Comm. Pure Appl. Math. 51(1), 23–45 (1998)
  78. H. Karcher, Riemannian center of mass and mollifier smoothing. Commun. Pure Appl. Math. 30(5), 509–541 (1977)
  87. T. Le Gouic, J.-M. Loubes, Existence and consistency of Wasserstein barycenters. Prob. Theory Relat. Fields 168(3–4), 901–917 (2017)
  91. V. Masarotto, V.M. Panaretos, Y. Zemel, Procrustes metrics on covariance operators and optimal transportation of Gaussian processes. Sankhyā A 81, 172–213 (2019) (Invited Paper, Special Issue on Statistics on non-Euclidean Spaces and Manifolds)
  102. B. Pass, Optimal transportation with infinitely many marginals. J. Funct. Anal. 264(4), 947–963 (2013)
  134. Y. Zemel, V.M. Panaretos, Fréchet means and Procrustes analysis in Wasserstein space. Bernoulli 25(2), 932–976 (2019)
  135. Y. Zemel, V.M. Panaretos, Supplement to “Fréchet means and Procrustes analysis in Wasserstein space” (2019)

Copyright information

© The Author(s) 2020

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Victor M. Panaretos (1)
  • Yoav Zemel (2)
  1. Institute of Mathematics, EPFL, Lausanne, Switzerland
  2. Statistical Laboratory, University of Cambridge, Cambridge, UK
