Abstract
It is commonly admitted that non-reversible Markov chain Monte Carlo (MCMC) algorithms usually yield more accurate MCMC estimators than their reversible counterparts. In this note, we show that in addition to their variance reduction effect, some non-reversible MCMC algorithms also have the undesirable property of slowing down the convergence of the Markov chain. This point, which has been overlooked in the literature, has obvious practical implications. We illustrate this phenomenon for different non-reversible versions of the Metropolis-Hastings algorithm on several discrete state space examples and discuss ways to mitigate the risk of a small asymptotic variance/slow convergence scenario.
References
Andrieu C, Durmus A, Nüsken N, Roussel J (2018) Hypocoercivity of piecewise deterministic Markov process-Monte Carlo. arXiv:1808.08592
Andrieu C, Livingstone S (2019) Peskun-Tierney ordering for Markov chain and process Monte Carlo: beyond the reversible scenario. arXiv:1906.06197
Bierkens J (2016) Non-reversible Metropolis-Hastings. Stat Comput 26(6):1213–1228. https://doi.org/10.1007/s11222-015-9598-x
Bierkens J, Fearnhead P, Roberts G (2019) The zig-zag process and super-efficient sampling for Bayesian analysis of big data. Ann Stat 47(3):1288–1320
Bouchard-Côté A, Vollmer SJ, Doucet A (2017) The bouncy particle sampler: a non-reversible rejection-free Markov chain Monte Carlo method. J Am Stat Assoc
Chen F, Lovász L, Pak I (1999) Lifting Markov chains to speed up mixing. In: STOC’99. Citeseer
Chen T-L, Hwang C-R (2013) Accelerating reversible Markov chains. Stat Probabil Lett 83(9):1956–1962
Diaconis P, Holmes S, Neal RM (2000) Analysis of a nonreversible Markov chain sampler. Ann Appl Probab 10(3):726–752. http://www.jstor.org/stable/2667319
Diaconis P, Miclo L (2013) On the spectral analysis of second-order Markov chains. Annales de la Faculté des Sciences de Toulouse: Mathématiques 22:573–621
Diaconis P, Stroock D et al (1991) Geometric bounds for eigenvalues of Markov chains. Ann Appl Probab 1(1):36–61
Duncan A, Nüsken N, Pavliotis G (2017) Using perturbed underdamped langevin dynamics to efficiently sample from probability distributions. J Stat Phys 169 (6):1098–1131
Fill JA et al (1991) Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process. Ann Appl Probab 1(1):62–87
Gadat S, Miclo L (2013) Spectral decompositions and L2-operator norms of toy hypocoercive semi-groups. Kinet Relat Models 6(2):317–372
Gustafson P (1998) A guided walk Metropolis algorithm. Stat Comput 8(4):357–364
Hastings W (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109
Horowitz AM (1991) A generalized guided Monte Carlo algorithm. Phys Lett B 268(2):247–252
Hwang C-R, Hwang-Ma S-Y, Sheu S-J, et al (2005) Accelerating diffusions. Ann Appl Probab 15(2):1433–1444
Hwang C-R, Normand R, Wu S-J (2015) Variance reduction for diffusions. Stoch Process Appl 125(9):3522–3540
Iosifescu M (2014) Finite Markov processes and their applications. Courier Corporation
Łatuszyński K, Miasojedow B, Niemiro W et al (2013) Nonasymptotic bounds on the estimation error of MCMC algorithms. Bernoulli 19(5A):2033–2066
Lelièvre T, Nier F, Pavliotis GA (2013) Optimal non-reversible linear drift for the convergence to equilibrium of a diffusion. J Stat Phys 152(2):237–274
Ma Y-A, Fox EB, Chen T, Wu L (2019) Irreversible samplers from jump and continuous Markov processes. Stat Comput 29(1):177–202
Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092
Meyn SP, Tweedie RL et al (1994) Computable bounds for geometric convergence rates of Markov chains. Ann Appl Probab 4(4):981–1011
Miclo L, Monmarché P (2013) Étude spectrale minutieuse de processus moins indécis que les autres. In: Séminaire de Probabilités XLV. Springer, pp 459–481
Mira A, Geyer CJ (2000) On non-reversible Markov chains. Monte Carlo Methods. Fields Institute/AMS, pp 95–110
Neal RM (2004) Improving asymptotic variance of MCMC estimators: Non-reversible chains are better. arXiv:math/0407281
Plummer M, Best N, Cowles K, Vines K (2006) CODA: Convergence diagnosis and output analysis for MCMC. R news 6(1):7–11
Poncet R (2017) Generalized and hybrid Metropolis-Hastings overdamped Langevin algorithms. arXiv:1701.05833
Ramanan K, Smith A (2018) Bounds on lifting continuous-state Markov chains to speed up mixing. J Theor Probab 31(3):1647–1678
Rosenthal JS (1995) Minorization conditions and convergence rates for Markov chain Monte Carlo. J Am Stat Assoc 90(430):558–566
Rosenthal JS (2003) Asymptotic variance and convergence rates of nearly-periodic Markov chain Monte Carlo algorithms. J Am Stat Assoc 98(461):169–177
Sakai Y, Hukushima K (2016) Eigenvalue analysis of an irreversible random walk with skew detailed balance conditions. Phys Rev E 93(4):043318
Sherlock C, Thiery AH (2017) A discrete bouncy particle sampler. arXiv:1707.05200
Sun Y, Schmidhuber J, Gomez FJ (2010) Improving the asymptotic performance of Markov chain Monte Carlo by inserting vortices. In: Advances in Neural Information Processing Systems. pp 2235–2243
Tierney L (1998) A note on Metropolis-Hastings kernels for general state spaces. Ann Appl Probab 8(1):1–9
Turitsyn KS, Chertkov M, Vucelja M (2011) Irreversible Monte Carlo algorithms for efficient sampling. Physica D 240(4-5):410–414
Vanetti P, Bouchard-Côté A, Deligiannidis G, Doucet A (2018) Piecewise-deterministic Markov Chain Monte Carlo. arXiv:1707.05296
Vucelja M (2016) Lifting – A nonreversible Markov chain Monte Carlo algorithm. Am J Phys 84(958). https://doi.org/10.1119/1.4961596
Yuen WK (2000) Applications of geometric bounds to the convergence rate of Markov chains on \(\mathbb{R}^{n}\). Stoch Process Appl 87:1–23
Acknowledgements
This research work was partially funded by ENSAE ParisTech, the Insight Center for Data Analytics at University College Dublin and NSERC of Canada. The authors thank the editors and two anonymous referees for many constructive comments that improved the article.
Appendices
Appendix A: Lifted non-reversible Markov chain
Appendix B: Marginal non-reversible Markov chain
Appendix C: Proof of Proposition 3
We first need to prove the following lemma.
Lemma 9
The conductance of the MH Markov chain of Example 2 satisfies
Proof
For all \(A\in \mathfrak {S}\), let \(\psi (A):={{\sum }_{x\in A}\pi (x)P(x,\bar {A})}\slash {\pi (A)\wedge (1-\pi (A))}\) be the quantity to minimize. A close analysis of the MH Markov chain displayed in the top panel of Fig. 14 shows that the set A which minimizes ψ(A) has the form A = (a1, a1 + 1, … , a2) for some S ≥ a2 ≥ a1 ≥ 1. Indeed, since the Markov chain moves only to neighbouring states, there are only two ways to exit A at each transition. Since each way to exit A contributes at the same order of magnitude to the numerator, taking contiguous states minimizes it, and in particular
so that for any a1 < a2 satisfying π(A) < 1/2, we have:
since
Fix a1 and treat a2 as a function of a1, constrained so that π(A) < 1/2. On the one hand, note that for all a1 the function mapping a2 to the RHS of Eq. (29) is decreasing. On the other hand, we have that \(\pi (A)<1/2\Leftrightarrow {a_{2}^{\ast }(a_{2}^{\ast }+1)-a_{1}(a_{1}-1)}<S(S+1)/2\), which yields
Hence, for all a1, the RHS of Eq. (29) is lower bounded by
Clearly, the numerator is an increasing function of a1 and is thus minimized for a1 = 1, which gives the lower bound of Eq. (28). Finally, by definition h(P) is upper bounded by ψ(A) for any \(A\in \mathfrak {S}\) satisfying π(A) < 1/2. In particular, taking A = (1, 2, … , (S − 1)/2) gives the upper bound of Eq. (28). □
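The minimizing-arc argument can be checked numerically on small state spaces. The sketch below is a minimal illustration, assuming Example 2 is the Metropolis-Hastings random walk on a ring of S states targeting π(x) ∝ x with symmetric nearest-neighbour proposals (all function names are ours, not the paper's): it enumerates every proper subset for a small S and confirms that the conductance is attained on an arc of contiguous states.

```python
import itertools
import numpy as np

def mh_ring_kernel(S):
    """MH kernel on states {1,...,S} arranged on a ring, targeting
    pi(x) proportional to x, with nearest-neighbour proposals of prob. 1/2."""
    pi = np.arange(1, S + 1, dtype=float)
    pi /= pi.sum()
    P = np.zeros((S, S))
    for i in range(S):
        for j in ((i - 1) % S, (i + 1) % S):
            P[i, j] = 0.5 * min(1.0, pi[j] / pi[i])
        P[i, i] = 1.0 - P[i].sum()   # rejected mass stays at i
    return pi, P

def psi(pi, P, A):
    """psi(A) = sum_{x in A} pi(x) P(x, A^c) / (pi(A) ^ (1 - pi(A)))."""
    Ac = [x for x in range(len(pi)) if x not in A]
    flow = sum(pi[x] * P[x, y] for x in A for y in Ac)
    return flow / min(pi[list(A)].sum(), 1.0 - pi[list(A)].sum())

def conductance(pi, P):
    """Exact conductance h(P) by enumerating all proper subsets (small S only)."""
    S = len(pi)
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(S), r) for r in range(1, S))
    return min(psi(pi, P, A) for A in subsets)
```

For S = 9, the exhaustive minimum coincides with the minimum over contiguous arcs, in line with the argument above.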
Proof
Since PMH is reversible and aperiodic, its spectrum is real, and any eigenvalue different from one, \(\lambda \in {\Lambda }_{|\boldsymbol {1}^{\perp }}:=\text {Sp}(P_{\text {MH}})\backslash \{1\}\), satisfies − 1 < λ < 1. The norm of PMH as an operator on the non-constant functions of L2(π) is \(\gamma :=\max \limits \{\sup {\Lambda }_{|\boldsymbol {1}^{\perp }},|\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\}\). It is well known (see e.g. Yuen 2000) that
It can be readily checked that ∥δ1 − π∥2 corresponds to the first factor on the RHS of Eq. (13). The tedious part of the proof is to bound γ. Using reversibility again, Cheeger's inequality (see e.g. Diaconis et al. (1991) for a proof) writes
where h(P) is the Markov chain conductance defined as
Combining Cheeger’s inequality and Lemma 9 yields
However, to use the above bound to upper bound γ, we need to check that \(\sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\geq |\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\). In general, bounding \(|\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\) proves more challenging than bounding \(\sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\). In the context of this example, however, we can use the bound derived in Proposition 2 of Diaconis et al. (1991). It is based on a geometric interpretation of the Markov chain as a non-bipartite graph with vertices (states) connected by edges (transitions), as illustrated in Fig. 14. More precisely, the result of that work relevant to our purpose states that
with \(\iota (P)=\max \limits _{e_{a,b}\in {\Gamma }}{\sum }_{\sigma _{x}\ni e_{a,b}}|\sigma _{x}|\pi (x)\), where
-
ea, b is the edge corresponding to the transition from state a to b,
-
σx is a path of odd length going from state x to itself: a self-loop provided that P(x, x) > 0, or more generally \(\sigma _{x}=(e_{x,a_{1}},e_{a_{1},a_{2}},\ldots ,e_{a_{\ell },x})\) with ℓ even,
-
Γ is a collection of paths {σ1, … , σS} including exactly one path for each state,
-
|σx| represents the “length” of path σx and is formally defined as
$$ |\sigma_{x}|=\sum\limits_{e_{a,b}\in\sigma_{x}}\frac{1}{\pi(a) P(a,b)} . $$
Let us consider the collection of paths Γ consisting of the self-loops of all states x ≥ 2. It can be readily checked that the length of such paths is
For state x = 1, let us consider the path consisting of the walk around the circle, σ1 : (e1,2, e2,3, … , eS,1). It would have been possible to take the path e1,2, e2,2, e2,1, but it is unclear whether paths using the same edge twice are permitted in the framework of Prop. 2 of Diaconis et al. (1991). The length of path σ1 is
We are now in a position to calculate ι(P). First note that, by construction, each edge belonging to any path σk contained in Γ appears once and only once. Hence, the constant ι(P) simplifies to the maximum of the set {|σx|π(x), σx ∈Γ} that is
since on the one hand \({\sum }_{\ell =1}^{S} {1}\slash {\ell }\leq 1+\log (S)\) and on the other hand S ≥ 5. Combining Eqs. (32) and (33) yields
It follows that if \(\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}\geq 0\), then \(\gamma \leq \sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\); otherwise we have
which combines with Eq. (31) to complete the proof as
since S ≥ 5. □
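The two-sided spectral control used in this proof can be illustrated numerically. The sketch below relies on the same assumed form of Example 2 as before (MH random walk on a ring targeting π(x) ∝ x; function names are ours): it symmetrizes the kernel to obtain a real spectrum, then checks Cheeger's two-sided bound h(P)²/2 ≤ 1 − sup Λ ≤ 2h(P) for a small S.

```python
import itertools
import numpy as np

def mh_ring_kernel(S):
    """MH kernel on a ring of S states targeting pi(x) proportional to x."""
    pi = np.arange(1, S + 1, dtype=float)
    pi /= pi.sum()
    P = np.zeros((S, S))
    for i in range(S):
        for j in ((i - 1) % S, (i + 1) % S):
            P[i, j] = 0.5 * min(1.0, pi[j] / pi[i])
        P[i, i] = 1.0 - P[i].sum()
    return pi, P

def conductance(pi, P):
    """Exact conductance by subset enumeration (tractable for small S only)."""
    S = len(pi)
    best = np.inf
    for r in range(1, S):
        for A in itertools.combinations(range(S), r):
            Ac = [x for x in range(S) if x not in A]
            flow = sum(pi[x] * P[x, y] for x in A for y in Ac)
            best = min(best, flow / min(pi[list(A)].sum(), 1.0 - pi[list(A)].sum()))
    return best

def nontrivial_spectrum(pi, P):
    """Real spectrum of the pi-reversible kernel P without the trivial
    eigenvalue 1, via the similarity transform D^{1/2} P D^{-1/2}, D = diag(pi)."""
    d = np.sqrt(pi)
    M = (d[:, None] * P) / d[None, :]   # symmetric, similar to P
    return np.sort(np.linalg.eigvalsh(M))[:-1]
```

One can then compare sup Λ with |inf Λ| directly on the computed spectrum, which is what the ι(P) bound above controls analytically.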
Appendix D: Proof of Proposition 4
Proof
By straightforward calculation we have:
Using Proposition 3, we have that
Comparing the order of the former bound with Eq. (35), the inequality of Eq. (14) cannot be concluded; we need to refine the bound on the MH convergence rate. Analysing the proof of Lemma 9, the lower bound on the conductance appears rather tight, as it results from taking the real-valued bound on \(a_{2}^{\ast }(a_{1})\) rather than its floor. To illustrate this, we compared the value of the bound to the actual conductance for moderate values of S, the calculation being too costly otherwise. We then computed the numerical value of \(\sup {\Lambda }_{|1^{\perp }}\) for S ≤ 500 and compared it with the lower bound derived from Cheeger's inequality in the proof of Proposition 3. It appears that Cheeger's bound is too loose in this example to justify Eq. (14). However, taking a finer lower bound such as
yields
which concludes the proof. □
Appendix E: Proof of Proposition 6
Proof
First, denote by R the mixture of the two NRMH kernels with weight 1/2. We start by showing that this kernel is π-reversible. Indeed, the subkernel of R satisfies:
Now, note that for all \(x\in \mathcal {S}\) and all \(A\in \mathfrak {S}\),
and since for any two positive numbers a and b, (1 ∧ a) + (1 ∧ b) ≤ 2 ∧ (a + b), we have for all \((x,z)\in \mathcal {S}^{2}\),
since by Assumption 2, π(y)Q(y, x) + Γ(x, y) ≥ 0 for all \((x,y)\in \mathcal {S}^{2}\). This yields a Peskun-Tierney ordering R ≺ PMH, since
and the proof is concluded by applying Theorem 4 of Tierney (1998). □
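The ordering established above can be checked numerically. The sketch below assumes the NRMH kernel of Bierkens (2016), i.e. acceptance probability \(1 \wedge (\pi(y)Q(y,x)+\Gamma(x,y))/(\pi(x)Q(x,y))\); the state space, target and vorticity matrix are illustrative choices of ours. It verifies that each NRMH kernel is π-invariant but non-reversible, while the equal-weight mixture R is π-reversible and dominated off-diagonal by PMH.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
pi = rng.random(n) + 0.1              # target bounded away from zero
pi /= pi.sum()
Q = np.full((n, n), 1.0 / (n - 1))
np.fill_diagonal(Q, 0.0)              # uniform proposal over the other states

# Vorticity matrix from the cycle 0 -> 1 -> ... -> n-1 -> 0:
# skew-symmetric with zero row sums, scaled so that
# pi(y) Q(y, x) + Gamma(x, y) >= 0 everywhere (Assumption 2).
G = np.zeros((n, n))
for i in range(n):
    G[i, (i + 1) % n] += 1.0
    G[(i + 1) % n, i] -= 1.0
G *= 0.5 * pi.min() / (n - 1)

def nrmh_kernel(pi, Q, G):
    """NRMH kernel: accept x -> y w.p. 1 ^ (pi(y)Q(y,x) + G(x,y)) / (pi(x)Q(x,y))."""
    n = len(pi)
    P = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if y != x and Q[x, y] > 0:
                P[x, y] = Q[x, y] * min(
                    1.0, (pi[y] * Q[y, x] + G[x, y]) / (pi[x] * Q[x, y]))
        P[x, x] = 1.0 - P[x].sum()
    return P

P_plus, P_minus = nrmh_kernel(pi, Q, G), nrmh_kernel(pi, Q, -G)
R = 0.5 * (P_plus + P_minus)          # the reversible mixture of the proof
```

Setting G to zero recovers PMH, and the off-diagonal domination R(x, y) ≤ PMH(x, y) is exactly the Peskun-Tierney comparison invoked above.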
Appendix F: Proof of Proposition 7
Proof
Note that if Γ1 satisfies Assumptions 1 and 2 then
Thus, if Γ1 and Γ− 1 satisfy Assumptions 1, 2 and 3 then
Hence, we have
and thus Γ− 1 = −Γ1, which, substituted into Eq. (21), leads to
for all \((x,y)\in \mathcal {S}^{2}\). Conversely, it can be readily checked that if Γ1 satisfies Assumptions 1, 2 and Eq. (36), then setting Γ− 1 = −Γ1 implies that Γ1 and Γ− 1 satisfy Assumptions 1, 2 and the skew-detailed balance equation (Eq. (21)). The proof is concluded by noting that Eq. (36) holds if and only if Γ1 is the null operator on \(\mathcal {S}\times \mathcal {S}\) or Q is π-reversible. □
Appendix G: Proof of Proposition 8
We prove Proposition 8, which states that the transition kernel (24) of the Markov chain generated by Algorithm 4 is \(\tilde {\pi }\)-invariant and is \(\tilde {\pi }\)-reversible if and only if Γ = 0.
Proof
To prove the invariance of Kρ, we need to prove that
for all \((x,\xi )\in \mathcal {S}\times \{-1,1\}\) and ρ ∈ [0, 1].
the second equality coming from the fact that Kρ(y, −ξ; x, ξ) ≠ 0 if and only if x = y and the third from the fact that \(\tilde {\pi }(x,\xi ) = \tilde {\pi }(x,-\xi ) = \pi (x)/2\). Now, let \(A(x,\xi ) := {\sum }_{y \neq x} \tilde {\pi }(y,\xi )Q(y,x)A_{\xi {\Gamma }}(y,x)\) and note that:
Assumption 2 together with the fact that π(x) > 0 for all \(x\in \mathcal {S}\) yields π(y)Q(y, x) > 0 if and only if π(x)Q(x, y) > 0. It can also be noted that the lower-bound condition on Γ implies that Γ(x, y) = 0 if Q(x, y) = 0. This leads to
since for all \(x\in \mathcal {S}\), \({\sum }_{y\in \mathcal {S}}{\Gamma }(x,y)=0\). Similarly, define
Using Lemma 10, we have:
where the penultimate equality follows from AΓ(x, x) = 1 for all \(x\in \mathcal {S}\). Finally, combining Eqs. (37) and (40), we obtain:
since \({\sum }_{y\in \mathcal {S}}Q(x,y)=1\), for all \(x\in \mathcal {S}\). We now study the \(\tilde {\pi }\)-reversibility of Kρ, i.e. conditions on Γξ such that for all \((x,y)\in \mathcal {S}^{2}\) and (ξ, η) ∈{− 1, 1}2 such that (x, ξ)≠(y, η), we have:
First note that if x = y and ξ = −η, then Eq. (42) is equivalent to
which is true from Lemma 10 and the fact that π is non-zero almost everywhere. Second, for x ≠ y and ξ = −η, Eq. (42) is trivially true by definition of Kρ, see (24). Hence, conditions on the vorticity matrix ensuring \(\tilde {\pi }\)-reversibility need only be investigated in the case ξ = η and x ≠ y. In such a case, Eq. (42) is equivalent to
which is equivalent to Γ = 0. Hence Kρ is \(\tilde {\pi }\)-reversible if and only if Γ = 0. □
Lemma 10
Under the Assumptions of Proposition 7, we have for all\(x\in \mathcal {S}\)and ξ ∈ {− 1, 1}
Proof
Using the fact that, for any three real numbers a, b, c, we have a ∧ b = ((a − c) ∧ (b − c)) + c, together with the fact that Γ(x, y) = −Γ(y, x), we have:
The proof follows from combining the skew-detailed balance equation (21) with Eq. (43):
□
Appendix H: Illustration of NRMHAV on Example 2
Appendix I: Generation of vorticity matrices on S × S grids
We detail a method to generate vorticity matrices satisfying Assumption 1 in the context of Example 4. In the general case of a random walk on an S × S grid, Γζ is an S2 × S2 matrix that can be constructed systematically using the properties that Γζ(x, y) = −Γζ(y, x) for all \((x,y)\in \mathcal {S}^{2}\) and Γζ1 = 0. It has a block-diagonal structure:
where each 2S × 2S diagonal block B has the following structure:
where
and
and ζ is such that the MH ratio (22) is always non-negative. The vorticity matrix is of size S2 × S2, so the number of diagonal blocks depends on S:
-
if S is even: \(\exists k \in \mathbb {N} \text { s.t. } S = 2k ~ \Rightarrow ~ S^{2} = 4k^{2}\), and each block B is a square matrix of dimension 4k; there are then exactly k B-blocks in the vorticity matrix Γζ;
-
if S is odd: \(\exists k \in \mathbb {N} \text { s.t. } S = 2k+1 ~ \Rightarrow ~ S^{2} = (2k+1)^{2}\), and each block B is a square matrix of dimension 2(2k + 1); then, since \(\frac {(2k+1)^{2}}{2(2k+1)} = k + \frac {1}{2}\), Γζ is made of k B-blocks and the last diagonal terms are completed with zeros.
For instance, if S = 3 (resp. if S = 4), the vorticity matrix is given by \({\Gamma }_{\zeta }^{(3)}\) (resp. \({\Gamma }_{\zeta }^{(4)}\)) as follows:
where
and 0m stands for the zero-matrix of size m × m.
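As an alternative to the explicit block recipe above, vorticity matrices satisfying the two structural constraints can also be generated programmatically. The sketch below is our own illustrative construction (not the paper's B-blocks): it superposes oriented unit-square cycles of the S × S grid, so skew-symmetry and Γζ1 = 0 hold by design.

```python
import numpy as np

def grid_vorticity(S, zeta=1.0):
    """Vorticity matrix on the S x S grid built from oriented unit-square
    cycles: each cell adds +zeta along its boundary and -zeta in reverse.
    Row sums vanish (each node on a cycle gains one +zeta and one -zeta
    entry) and the matrix is skew-symmetric by construction. With a uniform
    orientation, contributions on interior shared edges cancel, leaving a
    circulation along the outer boundary of the grid."""
    n = S * S
    idx = lambda r, c: r * S + c
    G = np.zeros((n, n))
    for r in range(S - 1):
        for c in range(S - 1):
            cycle = [idx(r, c), idx(r, c + 1), idx(r + 1, c + 1), idx(r + 1, c)]
            for a, b in zip(cycle, cycle[1:] + cycle[:1]):
                G[a, b] += zeta
                G[b, a] -= zeta
    return G
```

As with the block construction, ζ would then be scaled so that the MH ratio (22) stays non-negative for the target at hand.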
Vialaret, M., Maire, F. On the Convergence Time of Some Non-Reversible Markov Chain Monte Carlo Methods. Methodol Comput Appl Probab 22, 1349–1387 (2020). https://doi.org/10.1007/s11009-019-09766-w