Abstract
In the previous chapters, we obtained rates of convergence in the total variation distance of the iterates \(P^n\) of an irreducible positive Markov kernel P to its unique invariant measure \(\pi \), for \(\pi \)-almost every \(x \in \mathsf {X}\), and for all \(x \in \mathsf {X}\) when the kernel P is moreover positive Harris recurrent. Conversely, convergence in the total variation distance for all \(x\in \mathsf {X}\) entails that P is irreducible and that \(\pi \) is a maximal irreducibility measure.
20.A Complements on the Wasserstein Distance
In this section, we complement and prove some results of Section 20.1.
Theorem 20.A.1.
\(\mathbf {W}_p\) is a distance on the Wasserstein space \(\mathbb {W}_p(\mathsf {X})\).
Proof.
If \(\xi =\xi '\), then \(\mathbf {W}_p(\xi ,\xi ')=0\), since we can choose the diagonal coupling, that is, \(\gamma \) is the distribution of (X, X) where X has distribution \(\xi \). Conversely, if \(\mathbf {W}_p(\xi ,\xi ')=0\), then there exists a pair of random variables \((X, X')\) defined on a probability space \((\varOmega ,\mathscr {F},\mathbb {P})\) with marginal distributions \(\xi \) and \(\xi '\) such that \(\mathbb {E}\left[ d^p(X,X')\right] =0\), which implies \(X=X'\) \(\mathbb {P}\text {-a.s.}\), hence \(\xi =\xi '\).
Since the symmetry \(\mathbf {W}_p(\xi ,\xi ')=\mathbf {W}_p(\xi ',\xi )\) obviously holds, the proof will be completed if we prove the triangle inequality. Let \(\varepsilon >0\) and \(\mu _1,\mu _2,\mu _3\in \mathbb {W}_p(\mathsf {X})\). By definition, there exist \(\gamma _1\in \mathscr {C}(\mu _1,\mu _2)\) and \(\gamma _2\in \mathscr {C}(\mu _2,\mu _3)\) such that
\(\left( \int d^p\,\mathrm {d}\gamma _1\right) ^{1/p}\le \mathbf {W}_p(\mu _1,\mu _2)+\varepsilon \,, \qquad \left( \int d^p\,\mathrm {d}\gamma _2\right) ^{1/p}\le \mathbf {W}_p(\mu _2,\mu _3)+\varepsilon \,.\)
By the gluing lemma, Lemma B.3.12 (which assumes that \(\mathsf {X}\) is a Polish space), we can choose \((Z_1,Z_2,Z_3)\) such that \((Z_1,Z_2)\) has distribution \(\gamma _1\) and \((Z_2,Z_3)\) has distribution \(\gamma _2\). This implies that the distribution of \((Z_1,Z_3)\) belongs to \(\mathscr {C}(\mu _1,\mu _3)\) and, by Minkowski's inequality, \(\mathbb {E}\left[ d^p(Z_1,Z_3)\right] ^{1/p}\le \mathbb {E}\left[ d^p(Z_1,Z_2)\right] ^{1/p}+\mathbb {E}\left[ d^p(Z_2,Z_3)\right] ^{1/p}\). Thus
\(\mathbf {W}_p(\mu _1,\mu _3)\le \mathbf {W}_p(\mu _1,\mu _2)+\mathbf {W}_p(\mu _2,\mu _3)+2\varepsilon \,.\)
Since \(\varepsilon \) is arbitrary, the triangle inequality holds. \({\Box }\)
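To make the triangle inequality concrete, here is a small numerical sketch (not from the book): for distributions on the real line represented by equally sized samples, the quantile coupling obtained by sorting is optimal for every \(p\ge 1\), which gives a closed form for \(\mathbf {W}_p\) between empirical measures. The helper name `wasserstein_p` and the Gaussian samples are illustrative choices.

```python
import random

def wasserstein_p(xs, ys, p=1):
    """W_p between two empirical measures on the real line with the same
    number of atoms: couple the sorted samples (quantile coupling)."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)) ** (1 / p)

random.seed(0)
mu1 = [random.gauss(0, 1) for _ in range(500)]
mu2 = [random.gauss(1, 1) for _ in range(500)]
mu3 = [random.gauss(2, 2) for _ in range(500)]

d12 = wasserstein_p(mu1, mu2, p=2)
d23 = wasserstein_p(mu2, mu3, p=2)
d13 = wasserstein_p(mu1, mu3, p=2)
print(d13 <= d12 + d23)  # triangle inequality holds on these empirical measures
```

Sorting realizes the gluing of the proof explicitly: the three sorted samples play the role of \((Z_1,Z_2,Z_3)\), coupled through their common ranks.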
The following result relates the Wasserstein distance and the Prokhorov metric and shows that convergence in the Wasserstein distance implies weak convergence.
Proposition 20.A.2
Let \(\mu \), \(\nu \) be two probability measures on \(\mathsf {X}\). Then
\(\varrho (\mu ,\nu )\le \mathbf {W}_1^{1/2}(\mu ,\nu )\,, \qquad (20.\mathrm {A}.1)\)
where \(\varrho \) denotes the Prokhorov metric.
Let \((\mu _n)_{n\in \mathbb {N}}\) be a sequence of probability measures on \(\mathsf {X}\). For \(p\ge 1\), if \(\lim _{n\rightarrow \infty }\mathbf {W}_p(\mu _n,\mu )=0\), then \((\mu _n)_{n\in \mathbb {N}}\) converges weakly to \(\mu \).
Proof.
Without loss of generality, we assume that \(\mathbf {W}_1(\mu ,\nu )>0\) and set \(a=\mathbf {W}_1^{1/2}(\mu ,\nu )\). For \(A \in \mathscr {X}\), define \(f_a(x)=0\vee \left( 1-a^{-1}d(x,A)\right) \) and let \(A^a\) be the a-enlargement of A. Then \(\mathbbm {1}_A \le f_a \le \mathbbm {1}_{A^a}\) and \(|f_a(x)-f_a(y)|\le a^{-1}d(x,y)\) for all \((x, y) \in \mathsf {X}\times \mathsf {X}\). Let \(\gamma \) be the optimal coupling of \(\mu \) and \(\nu \). This yields
\(\mu (A)\le \int f_a\,\mathrm {d}\mu \le \int f_a\,\mathrm {d}\nu +a^{-1}\int d(x,y)\,\gamma (\mathrm {d}x,\mathrm {d}y)\le \nu (A^a)+a^{-1}\mathbf {W}_1(\mu ,\nu )=\nu (A^a)+a\,.\)
By definition of the Prokhorov metric, this proves that \(\varrho (\mu ,\nu )\le a\) and hence (20.A.1) by the choice of a. Since the Prokhorov metric metrizes weak convergence by Theorem C.2.7 and \(\mathbf {W}_1\le \mathbf {W}_p\) for all \(p\ge 1\) by (20.1.14), we obtain that convergence with respect to the Wasserstein distance implies weak convergence. \({\Box }\)
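The implication in this proposition is one-way: weak convergence alone does not control the Wasserstein distance, because mass can escape to infinity while carrying non-negligible transport cost. A standard counterexample, sketched numerically below (the helper and the measures are illustrative, not from the book), is \(\mu _n=(1-1/n)\delta _0+(1/n)\delta _n\), which converges weakly to \(\delta _0\) while \(\mathbf {W}_1(\mu _n,\delta _0)=1\) for all n.

```python
def w1_to_dirac0(atoms_weights):
    """W_1(mu, delta_0) for a finitely supported mu on the real line.
    The only coupling of mu with delta_0 is the product mu x delta_0,
    so the infimum over couplings reduces to E_mu[|X|]."""
    return sum(w * abs(a) for a, w in atoms_weights)

for n in (2, 10, 1000):
    mu_n = [(0.0, 1 - 1 / n), (float(n), 1 / n)]
    # stays at 1: mu_n converges weakly to delta_0, but not in W_1
    print(n, w1_to_dirac0(mu_n))
```

The vanishing atom at n contributes \(n\cdot (1/n)=1\) to the first moment for every n, which is exactly the obstruction quantified by condition (ii) of Theorem 20.1.8 below.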
Proof
(of Theorem 20.1.8). Let \((\mu _n)_{n\in \mathbb {N}}\) be a Cauchy sequence for \(\mathbf {W}_p\). By Proposition 20.A.2, it is also a Cauchy sequence for the Prokhorov metric, and by Theorem C.2.7, there exists a probability measure \(\mu \) such that \(\mu _n{\mathop {\Rightarrow }\limits ^{\text {w}}}\mu \). We must prove that \(\lim _{n\rightarrow \infty }\mathbf {W}_p(\mu _n,\mu )=0\). Fix \(x_0\in \mathsf {X}\). Since \(x\mapsto d^p(x,x_0)\) is nonnegative and continuous and a Cauchy sequence is bounded, \(\int d^p(x,x_0)\,\mu (\mathrm {d}x)\le \liminf _{n}\int d^p(x,x_0)\,\mu _n(\mathrm {d}x)<\infty \), so \(\mu \in \mathbb {W}_p(\mathsf {X})\). Let \(\varepsilon >0\). For every \(M>0\), the function \((x,y)\mapsto M\wedge d^p(x,y)\) is bounded and continuous. Thus, there exists N such that, for all \(n,m\ge N\), an optimal coupling \(\gamma _{n,m}\in \mathscr {C}(\mu _n,\mu _m)\) satisfies
\(\int (M\wedge d^p)\,\mathrm {d}\gamma _{n,m}\le \mathbf {W}_p^p(\mu _n,\mu _m)\le \varepsilon ^p\,.\)
Letting \(m\rightarrow \infty \) along a subsequence for which \(\gamma _{n,m}\) converges weakly to some \(\gamma _n\in \mathscr {C}(\mu _n,\mu )\), we obtain \(\int (M\wedge d^p)\,\mathrm {d}\gamma _n\le \varepsilon ^p\) for all \(n\ge N\) and all \(M>0\).
By the monotone convergence theorem, letting \(M\rightarrow \infty \), this proves that \(\mathbf {W}_p^p(\mu _n,\mu )\le \int d^p\,\mathrm {d}\gamma _n\le \varepsilon ^p\) for all \(n\ge N\), and thus \(\mathbb {W}_p(\mathsf {X})\) is complete.
We now prove the density of the distributions with finite support. Let \(\mu \in \mathbb {W}_p(\mathsf {X})\) and fix an arbitrary \(a_0\in \mathsf {X}\). For all \(n\ge 1\), by Lemma B.1.3, there exists a partition \((A_{n,k})_{k\ge 1}\) of \(\mathsf {X}\) by Borel sets such that \(\mathrm {diam}(A_{n, k}) \le 1/n\) for all k. Choose now, for each \(n, k\ge 1\), a point \(a_{n,k}\in A_{n, k}\). Set \(B_{n, k} = \bigcup _{j=1}^k A_{n, j}\). Then \(B_{n, k}^c\) is a decreasing sequence of Borel sets and \(\bigcap _{k\ge 0} B_{n, k}^c = \emptyset \). Since \(\int d^p(x,a_0)\,\mu (\mathrm {d}x)<\infty \), by dominated convergence, \(\lim _{k\rightarrow \infty }\int _{B_{n,k}^c}d^p(x,a_0)\,\mu (\mathrm {d}x)=0\). We may thus choose \(k_0\) large enough that \(\int _{B_{n,k_0}^c}d^p(x,a_0)\,\mu (\mathrm {d}x)\le n^{-p}\). Let X be a random variable with distribution \(\mu \). Define the random variable \(Y_n\) by
\(Y_n=\sum _{k=1}^{k_0}a_{n,k}\mathbbm {1}_{A_{n,k}}(X)+a_0\mathbbm {1}_{B_{n,k_0}^c}(X)\,.\)
Let \(\nu _n\) be the distribution of \(Y_n\). Then \(\nu _n\) has finite support and
\(\mathbf {W}_p^p(\mu ,\nu _n)\le \mathbb {E}\left[ d^p(X,Y_n)\right] \le n^{-p}+\int _{B_{n,k_0}^c}d^p(x,a_0)\,\mu (\mathrm {d}x)\le 2n^{-p}\,.\)
This proves that the set of probability measures that are finite convex combinations of the Dirac measures \(\delta _{a_0}\) and \(\delta _{a_{n, k}}\), \(n, k\ge 1\), is dense in \(\mathbb {W}_p(\mathsf {X})\). Restricting to combinations with rational weights proves that \(\mathbb {W}_p(\mathsf {X})\) is separable.
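The quantization step of this density argument is easy to visualize on \(\mathsf {X}=\mathbb {R}\). The sketch below is illustrative code, not from the book: the cells \(A_{n,k}=[k/n,(k+1)/n)\) play the role of the partition from Lemma B.1.3, and each sample point is sent to the left endpoint of its cell. Every point moves by less than 1/n, so the induced coupling bounds \(\mathbf {W}_p(\mu ,\nu _n)\) by 1/n; a finite sample hits only finitely many cells, so no truncation index \(k_0\) is needed here.

```python
import math
import random

def quantize(x, n):
    """Left endpoint of the cell A_{n,k} = [k/n, (k+1)/n) containing x."""
    return math.floor(x * n) / n

random.seed(1)
sample = [random.gauss(0, 1) for _ in range(1000)]  # empirical stand-in for mu
n = 20
quantized = [quantize(x, n) for x in sample]  # atoms of the finitely supported nu_n

# The coupling (X, Y_n) moves each point by less than 1/n, hence
# W_p(mu, nu_n) <= 1/n for every p >= 1.
print(max(abs(x - q) for x, q in zip(sample, quantized)))
print(len(set(quantized)))  # number of atoms of nu_n: finite
```

Replacing the sample by a countable dense set of cell endpoints with rational weights is exactly the separability argument above.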
Assume now that (i) holds. Then \(\mu _n{\mathop {\Rightarrow }\limits ^{\text {w}}}\mu _0\) by Proposition 20.A.2. Applying (20.1.15) and the triangle inequality, we obtain
\(\lim _{n\rightarrow \infty }\int d^p(x,x_0)\,\mu _n(\mathrm {d}x)=\lim _{n\rightarrow \infty }\mathbf {W}_p^p(\mu _n,\delta _{x_0})=\mathbf {W}_p^p(\mu _0,\delta _{x_0})=\int d^p(x,x_0)\,\mu _0(\mathrm {d}x)\,.\)
Since \(\mu _n{\mathop {\Rightarrow }\limits ^{\text {w}}}\mu _0\), it follows that
\(\lim _{n\rightarrow \infty }\int _{\{d(x,x_0)\le M\}}d^p(x,x_0)\,\mu _n(\mathrm {d}x)=\int _{\{d(x,x_0)\le M\}}d^p(x,x_0)\,\mu _0(\mathrm {d}x)\)
for all M such that \(\mu _0(\{x: d(x,x_0)=M\})=0\). Combined with the convergence of the p-th moments, this yields \(\lim _{M\rightarrow \infty }\limsup _{n\rightarrow \infty }\int _{\{d(x,x_0)>M\}}d^p(x,x_0)\,\mu _n(\mathrm {d}x)=0\). This proves (ii).
Conversely, if (ii) holds, then by Skorokhod’s representation theorem, Theorem B.3.18, there exists a sequence \((X_n)_{n\in \mathbb {N}}\) of random elements defined on a common probability space \((\varOmega ,\mathscr {A},\mathbb {P})\) such that the distribution of \(X_n\) is \(\mu _n\) for all \(n\in \mathbb {N}\) and \(X_n\rightarrow X_0\) \(\mathbb {P}\text {-a.s.}\) This yields, by Lebesgue’s dominated convergence theorem,
\(\lim _{n\rightarrow \infty }\mathbb {E}\left[ M\wedge d^p(X_n,X_0)\right] =0 \quad \text {for all } M>0\,.\)
By (ii), we also have
\(\lim _{M\rightarrow \infty }\limsup _{n\rightarrow \infty }\mathbb {E}\left[ d^p(X_n,X_0)\mathbbm {1}\{d^p(X_n,X_0)>M\}\right] =0\,.\)
Altogether, we have shown that
\(\lim _{n\rightarrow \infty }\mathbb {E}\left[ d^p(X_n,X_0)\right] =0\,,\)
and since \(\mathbf {W}_p^p(\mu _n,\mu _0)\le \mathbb {E}\left[ d^p(X_n,X_0)\right] \), it follows that \(\lim _{n\rightarrow \infty }\mathbf {W}_p(\mu _n,\mu _0)=0\).
This proves (i).
\({\Box }\)
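As a numerical sanity check of the equivalence just proved (an illustrative sketch, again using the sorted-sample formula for \(\mathbf {W}_1\) between empirical measures on \(\mathbb {R}\); the helper name is ours): translating a distribution by 1/n gives a sequence that converges weakly with converging first moments, and indeed \(\mathbf {W}_1\) decreases like 1/n, in line with (i).

```python
def w1_sorted(xs, ys):
    """W_1 between two empirical measures on R with equally many atoms,
    via the quantile (sorted) coupling, which is optimal on the line."""
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

base = [0.3, -1.2, 2.5, 0.0, 1.1]  # atoms of mu_0
for n in (1, 10, 100):
    shifted = [x + 1 / n for x in base]  # mu_n: mu_0 translated by 1/n
    print(n, w1_sorted(base, shifted))  # close to 1/n, so W_1(mu_n, mu_0) -> 0
```

Contrast this with the escaping-mass example after Proposition 20.A.2, where weak convergence holds but condition (ii) fails and \(\mathbf {W}_1\) does not vanish.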
Copyright information
© 2018 Springer Nature Switzerland AG
Douc, R., Moulines, E., Priouret, P., Soulier, P. (2018). Convergence in the Wasserstein Distance. In: Markov Chains. Springer Series in Operations Research and Financial Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-97704-1_20
Print ISBN: 978-3-319-97703-4
Online ISBN: 978-3-319-97704-1