Abstract
In the previous chapters, we obtained rates of convergence in the total variation distance of the iterates \(P^n\) of an irreducible positive Markov kernel P to its unique invariant measure \(\pi \), for \(\pi \)-almost every \(x \in \mathsf {X}\), and for all \(x \in \mathsf {X}\) when the kernel P is moreover positive Harris recurrent. Conversely, convergence in the total variation distance for all \(x\in \mathsf {X}\) entails that P is irreducible and that \(\pi \) is a maximal irreducibility measure.
20.A Complements on the Wasserstein Distance
In this section, we complement and prove some results of Section 20.1.
Theorem 20.A.1.
\(\mathbf {W}_p\) is a distance on the Wasserstein space \(\mathbb {W}_p(\mathsf {X})\).
Proof.
If \(\xi =\xi '\), then \(\mathbf {W}_p(\xi ,\xi ')=0\), since we can choose the diagonal coupling, that is, \(\gamma \) is the distribution of (X, X) where X has distribution \(\xi \). Conversely, if \(\mathbf {W}_p(\xi ,\xi ')=0\), then there exists a pair of random variables \((X, X')\) defined on a probability space \((\varOmega ,\mathscr {F},\mathbb {P})\) with marginal distributions \(\xi \) and \(\xi '\) such that \(\mathbb {E}\left[ d^p(X,X')\right] =0\), which implies \(X=X'\) \(\mathbb {P}\text {-a.s.}\), hence \(\xi =\xi '\).
Since the symmetry \(\mathbf {W}_p(\xi ,\xi ')=\mathbf {W}_p(\xi ',\xi )\) obviously holds, the proof will be completed if we prove the triangle inequality. Let \(\varepsilon >0\) and \(\mu _1,\mu _2,\mu _3\in \mathbb {W}_p(\mathsf {X})\). By definition, there exist \(\gamma _1\in \mathscr {C}(\mu _1,\mu _2)\) and \(\gamma _2\in \mathscr {C}(\mu _2,\mu _3)\) such that
\(\left( \int d^p\,\mathrm {d}\gamma _1\right) ^{1/p}\le \mathbf {W}_p(\mu _1,\mu _2)+\varepsilon \,, \qquad \left( \int d^p\,\mathrm {d}\gamma _2\right) ^{1/p}\le \mathbf {W}_p(\mu _2,\mu _3)+\varepsilon \,.\)
By the gluing lemma, Lemma B.3.12 (which assumes that \(\mathsf {X}\) is a Polish space), we can choose \((Z_1,Z_2,Z_3)\) such that \((Z_1,Z_2)\) has distribution \(\gamma _1\) and \((Z_2,Z_3)\) has distribution \(\gamma _2\). This implies that the distribution of \((Z_1,Z_3)\) belongs to \(\mathscr {C}(\mu _1,\mu _3)\) and, by Minkowski's inequality, \(\mathbb {E}\left[ d^p(Z_1,Z_3)\right] ^{1/p}\le \mathbb {E}\left[ d^p(Z_1,Z_2)\right] ^{1/p}+\mathbb {E}\left[ d^p(Z_2,Z_3)\right] ^{1/p}\). Thus
\(\mathbf {W}_p(\mu _1,\mu _3)\le \mathbf {W}_p(\mu _1,\mu _2)+\mathbf {W}_p(\mu _2,\mu _3)+2\varepsilon \,.\)
Since \(\varepsilon \) is arbitrary, the triangle inequality holds. \({\Box }\)
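To make the triangle inequality concrete, here is a small numerical sketch (not from the book): for distributions on the real line represented by equally sized samples, the quantile coupling obtained by sorting is optimal for every \(p\ge 1\), which gives a closed form for \(\mathbf {W}_p\) between empirical measures. The helper name `wasserstein_p` and the Gaussian samples are illustrative choices.

```python
import random

def wasserstein_p(xs, ys, p=1):
    """W_p between two empirical measures on the real line with the same
    number of atoms: couple the sorted samples (quantile coupling)."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)) ** (1 / p)

random.seed(0)
mu1 = [random.gauss(0, 1) for _ in range(500)]
mu2 = [random.gauss(1, 1) for _ in range(500)]
mu3 = [random.gauss(2, 2) for _ in range(500)]

d12 = wasserstein_p(mu1, mu2, p=2)
d23 = wasserstein_p(mu2, mu3, p=2)
d13 = wasserstein_p(mu1, mu3, p=2)
print(d13 <= d12 + d23)  # triangle inequality holds on these empirical measures
```

Sorting realizes the gluing of the proof explicitly: the three sorted samples play the role of \((Z_1,Z_2,Z_3)\), coupled through their common ranks.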
The following result relates the Wasserstein distance and the Prokhorov metric and shows that convergence in the Wasserstein distance implies weak convergence.
Proposition 20.A.2
Let \(\mu \), \(\nu \) be two probability measures on \(\mathsf {X}\). Then
\(\varrho (\mu ,\nu )\le \mathbf {W}_1^{1/2}(\mu ,\nu )\,, \qquad (20.\mathrm {A}.1)\)
where \(\varrho \) denotes the Prokhorov metric.
Let \((\mu _n)_{n\in \mathbb {N}}\) be a sequence of probability measures on \(\mathsf {X}\). For \(p\ge 1\), if \(\lim _{n\rightarrow \infty }\mathbf {W}_p(\mu _n,\mu )=0\), then \((\mu _n)_{n\in \mathbb {N}}\) converges weakly to \(\mu \).
Proof.
Without loss of generality, we assume that \(\mathbf {W}_1(\mu ,\nu )>0\) and set \(a=\mathbf {W}_1^{1/2}(\mu ,\nu )\). For \(A \in \mathscr {X}\), define \(f_a(x)=0\vee \left( 1-a^{-1}d(x,A)\right) \) and let \(A^a\) be the a-enlargement of A. Then \(\mathbbm {1}_A \le f_a \le \mathbbm {1}_{A^a}\) and \(|f_a(x)-f_a(y)|\le a^{-1}d(x,y)\) for all \((x, y) \in \mathsf {X}\times \mathsf {X}\). Let \(\gamma \) be the optimal coupling of \(\mu \) and \(\nu \). This yields
\(\mu (A)\le \int f_a\,\mathrm {d}\mu \le \int f_a\,\mathrm {d}\nu +a^{-1}\int d(x,y)\,\gamma (\mathrm {d}x,\mathrm {d}y)\le \nu (A^a)+a^{-1}\mathbf {W}_1(\mu ,\nu )=\nu (A^a)+a\,.\)
By definition of the Prokhorov metric, this proves that \(\varrho (\mu ,\nu )\le a\) and hence (20.A.1) by the choice of a. Since the Prokhorov metric metrizes weak convergence by Theorem C.2.7 and \(\mathbf {W}_1\le \mathbf {W}_p\) for all \(p\ge 1\) by (20.1.14), we obtain that convergence with respect to the Wasserstein distance implies weak convergence. \({\Box }\)
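The implication in this proposition is one-way: weak convergence alone does not control the Wasserstein distance, because mass can escape to infinity while carrying non-negligible transport cost. A standard counterexample, sketched numerically below (the helper and the measures are illustrative, not from the book), is \(\mu _n=(1-1/n)\delta _0+(1/n)\delta _n\), which converges weakly to \(\delta _0\) while \(\mathbf {W}_1(\mu _n,\delta _0)=1\) for all n.

```python
def w1_to_dirac0(atoms_weights):
    """W_1(mu, delta_0) for a finitely supported mu on the real line.
    The only coupling of mu with delta_0 is the product mu x delta_0,
    so the infimum over couplings reduces to E_mu[|X|]."""
    return sum(w * abs(a) for a, w in atoms_weights)

for n in (2, 10, 1000):
    mu_n = [(0.0, 1 - 1 / n), (float(n), 1 / n)]
    # stays at 1: mu_n converges weakly to delta_0, but not in W_1
    print(n, w1_to_dirac0(mu_n))
```

The vanishing atom at n contributes \(n\cdot (1/n)=1\) to the first moment for every n, which is exactly the obstruction quantified by condition (ii) of Theorem 20.1.8 below.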
Proof
(of Theorem 20.1.8). Let \((\mu _n)_{n\in \mathbb {N}}\) be a Cauchy sequence for \(\mathbf {W}_p\). By Proposition 20.A.2, it is also a Cauchy sequence for the Prokhorov metric, and by Theorem C.2.7, there exists a probability measure \(\mu \) such that \(\mu _n{\mathop {\Rightarrow }\limits ^{\text {w}}}\mu \). We must prove that \(\lim _{n\rightarrow \infty }\mathbf {W}_p(\mu _n,\mu )=0\). Fix \(x_0\in \mathsf {X}\). Since \(x\mapsto d^p(x,x_0)\) is nonnegative and continuous and a Cauchy sequence is bounded, \(\int d^p(x,x_0)\,\mu (\mathrm {d}x)\le \liminf _{n}\int d^p(x,x_0)\,\mu _n(\mathrm {d}x)<\infty \), so \(\mu \in \mathbb {W}_p(\mathsf {X})\). Let \(\varepsilon >0\). For every \(M>0\), the function \((x,y)\mapsto M\wedge d^p(x,y)\) is bounded and continuous. Thus, there exists N such that, for all \(n,m\ge N\), an optimal coupling \(\gamma _{n,m}\in \mathscr {C}(\mu _n,\mu _m)\) satisfies
\(\int (M\wedge d^p)\,\mathrm {d}\gamma _{n,m}\le \mathbf {W}_p^p(\mu _n,\mu _m)\le \varepsilon ^p\,.\)
Letting \(m\rightarrow \infty \) along a subsequence for which \(\gamma _{n,m}\) converges weakly to some \(\gamma _n\in \mathscr {C}(\mu _n,\mu )\), we obtain \(\int (M\wedge d^p)\,\mathrm {d}\gamma _n\le \varepsilon ^p\) for all \(n\ge N\) and all \(M>0\).
By the monotone convergence theorem, letting \(M\rightarrow \infty \), this proves that \(\mathbf {W}_p^p(\mu _n,\mu )\le \int d^p\,\mathrm {d}\gamma _n\le \varepsilon ^p\) for all \(n\ge N\), and thus \(\mathbb {W}_p(\mathsf {X})\) is complete.
We now prove the density of the distributions with finite support. Let \(\mu \in \mathbb {W}_p(\mathsf {X})\) and fix an arbitrary \(a_0\in \mathsf {X}\). For all \(n\ge 1\), by Lemma B.1.3, there exists a partition \((A_{n,k})_{k\ge 1}\) of \(\mathsf {X}\) by Borel sets such that \(\mathrm {diam}(A_{n, k}) \le 1/n\) for all k. Choose now, for each \(n, k\ge 1\), a point \(a_{n,k}\in A_{n, k}\). Set \(B_{n, k} = \bigcup _{j=1}^k A_{n, j}\). Then \(B_{n, k}^c\) is a decreasing sequence of Borel sets and \(\bigcap _{k\ge 0} B_{n, k}^c = \emptyset \). Since \(\int d^p(x,a_0)\,\mu (\mathrm {d}x)<\infty \), by dominated convergence, \(\lim _{k\rightarrow \infty }\int _{B_{n,k}^c}d^p(x,a_0)\,\mu (\mathrm {d}x)=0\). We may thus choose \(k_0\) large enough that \(\int _{B_{n,k_0}^c}d^p(x,a_0)\,\mu (\mathrm {d}x)\le n^{-p}\). Let X be a random variable with distribution \(\mu \). Define the random variable \(Y_n\) by
\(Y_n=\sum _{k=1}^{k_0}a_{n,k}\mathbbm {1}_{A_{n,k}}(X)+a_0\mathbbm {1}_{B_{n,k_0}^c}(X)\,.\)
Let \(\nu _n\) be the distribution of \(Y_n\). Then \(\nu _n\) has finite support and
\(\mathbf {W}_p^p(\mu ,\nu _n)\le \mathbb {E}\left[ d^p(X,Y_n)\right] \le n^{-p}+\int _{B_{n,k_0}^c}d^p(x,a_0)\,\mu (\mathrm {d}x)\le 2n^{-p}\,.\)
This proves that the set of probability measures that are finite convex combinations of the Dirac measures \(\delta _{a_0}\) and \(\delta _{a_{n, k}}\), \(n, k\ge 1\), is dense in \(\mathbb {W}_p(\mathsf {X})\). Restricting to combinations with rational weights proves that \(\mathbb {W}_p(\mathsf {X})\) is separable.
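The quantization step of this density argument is easy to visualize on \(\mathsf {X}=\mathbb {R}\). The sketch below is illustrative code, not from the book: the cells \(A_{n,k}=[k/n,(k+1)/n)\) play the role of the partition from Lemma B.1.3, and each sample point is sent to the left endpoint of its cell. Every point moves by less than 1/n, so the induced coupling bounds \(\mathbf {W}_p(\mu ,\nu _n)\) by 1/n; a finite sample hits only finitely many cells, so no truncation index \(k_0\) is needed here.

```python
import math
import random

def quantize(x, n):
    """Left endpoint of the cell A_{n,k} = [k/n, (k+1)/n) containing x."""
    return math.floor(x * n) / n

random.seed(1)
sample = [random.gauss(0, 1) for _ in range(1000)]  # empirical stand-in for mu
n = 20
quantized = [quantize(x, n) for x in sample]  # atoms of the finitely supported nu_n

# The coupling (X, Y_n) moves each point by less than 1/n, hence
# W_p(mu, nu_n) <= 1/n for every p >= 1.
print(max(abs(x - q) for x, q in zip(sample, quantized)))
print(len(set(quantized)))  # number of atoms of nu_n: finite
```

Replacing the sample by a countable dense set of cell endpoints with rational weights is exactly the separability argument above.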
Assume now that (i) holds. Then \(\mu _n{\mathop {\Rightarrow }\limits ^{\text {w}}}\mu _0\) by Proposition 20.A.2. Applying (20.1.15) and the triangle inequality, we obtain
\(\lim _{n\rightarrow \infty }\int d^p(x,x_0)\,\mu _n(\mathrm {d}x)=\lim _{n\rightarrow \infty }\mathbf {W}_p^p(\mu _n,\delta _{x_0})=\mathbf {W}_p^p(\mu _0,\delta _{x_0})=\int d^p(x,x_0)\,\mu _0(\mathrm {d}x)\,.\)
Since \(\mu _n{\mathop {\Rightarrow }\limits ^{\text {w}}}\mu _0\), it follows that
\(\lim _{n\rightarrow \infty }\int _{\{d(x,x_0)\le M\}}d^p(x,x_0)\,\mu _n(\mathrm {d}x)=\int _{\{d(x,x_0)\le M\}}d^p(x,x_0)\,\mu _0(\mathrm {d}x)\)
for all M such that \(\mu _0(\{x: d(x,x_0)=M\})=0\). Combined with the convergence of the p-th moments, this yields \(\lim _{M\rightarrow \infty }\limsup _{n\rightarrow \infty }\int _{\{d(x,x_0)>M\}}d^p(x,x_0)\,\mu _n(\mathrm {d}x)=0\). This proves (ii).
Conversely, if (ii) holds, then by Skorokhod’s representation theorem, Theorem B.3.18, there exists a sequence \((X_n)_{n\in \mathbb {N}}\) of random elements defined on a common probability space \((\varOmega ,\mathscr {A},\mathbb {P})\) such that the distribution of \(X_n\) is \(\mu _n\) for all \(n\in \mathbb {N}\) and \(X_n\rightarrow X_0\) \(\mathbb {P}\text {-a.s.}\) This yields, by Lebesgue’s dominated convergence theorem,
\(\lim _{n\rightarrow \infty }\mathbb {E}\left[ M\wedge d^p(X_n,X_0)\right] =0 \quad \text {for all } M>0\,.\)
By (ii), we also have
\(\lim _{M\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb {E}\left[ d^p(X_n,X_0)\mathbbm {1}\{d^p(X_n,X_0)>M\}\right] =0\,.\)
Altogether, we have shown that
\(\lim _{n\rightarrow \infty }\mathbb {E}\left[ d^p(X_n,X_0)\right] =0\,,\)
and since \(\mathbf {W}_p^p(\mu _n,\mu _0)\le \mathbb {E}\left[ d^p(X_n,X_0)\right] \), it follows that \(\lim _{n\rightarrow \infty }\mathbf {W}_p(\mu _n,\mu _0)=0\).
This proves (i).
\({\Box }\)
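As a numerical sanity check of the equivalence just proved (an illustrative sketch, again using the sorted-sample formula for \(\mathbf {W}_1\) between empirical measures on \(\mathbb {R}\); the helper name is ours): translating a distribution by 1/n gives a sequence that converges weakly with converging first moments, and indeed \(\mathbf {W}_1\) decreases like 1/n, in line with (i).

```python
def w1_sorted(xs, ys):
    """W_1 between two empirical measures on R with equally many atoms,
    via the quantile (sorted) coupling, which is optimal on the line."""
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

base = [0.3, -1.2, 2.5, 0.0, 1.1]  # atoms of mu_0
for n in (1, 10, 100):
    shifted = [x + 1 / n for x in base]  # mu_n: mu_0 translated by 1/n
    print(n, w1_sorted(base, shifted))  # close to 1/n, so W_1(mu_n, mu_0) -> 0
```

Contrast this with the escaping-mass example after Proposition 20.A.2, where weak convergence holds but condition (ii) fails and \(\mathbf {W}_1\) does not vanish.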
Copyright information
© 2018 Springer Nature Switzerland AG
Douc, R., Moulines, E., Priouret, P., Soulier, P. (2018). Convergence in the Wasserstein Distance. In: Markov Chains. Springer Series in Operations Research and Financial Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-97704-1_20
Print ISBN: 978-3-319-97703-4
Online ISBN: 978-3-319-97704-1