
On the Convergence Time of Some Non-Reversible Markov Chain Monte Carlo Methods


Abstract

It is commonly accepted that non-reversible Markov chain Monte Carlo (MCMC) algorithms usually yield more accurate MCMC estimators than their reversible counterparts. In this note, we show that in addition to their variance reduction effect, some non-reversible MCMC algorithms also have the undesirable property of slowing down the convergence of the Markov chain. This point, which has been overlooked in the literature, has obvious practical implications. We illustrate this phenomenon for different non-reversible versions of the Metropolis-Hastings algorithm on several discrete state space examples and discuss ways to mitigate the risk of a small asymptotic variance/slow convergence scenario.


References

  1. Andrieu C, Durmus A, Nüsken N, Roussel J (2018) Hypercoercivity of piecewise deterministic Markov process-Monte Carlo. arXiv:1808.08592

  2. Andrieu C, Livingstone S (2019) Peskun-Tierney ordering for Markov chain and process Monte Carlo: beyond the reversible scenario. arXiv:1906.06197

  3. Bierkens J (2016) Non-reversible Metropolis-Hastings. Stat Comput 26(6):1213–1228. https://doi.org/10.1007/s11222-015-9598-x

  4. Bierkens J, Fearnhead P, Roberts G (2019) The zig-zag process and super-efficient sampling for Bayesian analysis of big data. Ann Stat 47(3):1288–1320

  5. Bouchard-Côté A, Vollmer SJ, Doucet A (2017) The bouncy particle sampler: a non-reversible rejection-free Markov chain Monte Carlo method. J Am Stat Assoc

  6. Chen F, Lovász L, Pak I (1999) Lifting Markov chains to speed up mixing. In: STOC’99. ACM

  7. Chen T-L, Hwang C-R (2013) Accelerating reversible Markov chains. Stat Probabil Lett 83(9):1956–1962

  8. Diaconis P, Holmes S, Neal RM (2000) Analysis of a nonreversible Markov chain sampler. Ann Appl Probab 10(3):726–752. http://www.jstor.org/stable/2667319

  9. Diaconis P, Miclo L (2013) On the spectral analysis of second-order Markov chains. Annales de la Faculté des Sciences de Toulouse: Mathématiques 22:573–621

  10. Diaconis P, Stroock D et al (1991) Geometric bounds for eigenvalues of Markov chains. Ann Appl Probab 1(1):36–61

  11. Duncan A, Nüsken N, Pavliotis G (2017) Using perturbed underdamped langevin dynamics to efficiently sample from probability distributions. J Stat Phys 169 (6):1098–1131

  12. Fill JA et al (1991) Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process. Ann Appl Probab 1(1):62–87

  13. Gadat S, Miclo L (2013) Spectral decompositions and l2-operator norms of toy hypocoercive semi-groups. Kinet Relat Mod 6(2):317–372

  14. Gustafson P (1998) A guided walk Metropolis algorithm. Stat Comput 8(4):357–364

  15. Hastings W (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109

  16. Horowitz AM (1991) A generalized guided Monte Carlo algorithm. Phys Lett B 268(2):247–252

  17. Hwang C-R, Hwang-Ma S-Y, Sheu S-J, et al (2005) Accelerating diffusions. Ann Appl Probab 15(2):1433–1444

  18. Hwang C-R, Normand R, Wu S-J (2015) Variance reduction for diffusions. Stoch Process Appl 125(9):3522–3540

  19. Iosifescu M (2014) Finite Markov processes and their applications. Courier Corporation

  20. Łatuszyński K, Miasojedow B, Niemiro W et al (2013) Nonasymptotic bounds on the estimation error of MCMC algorithms. Bernoulli 19(5A):2033–2066

  21. Lelièvre T, Nier F, Pavliotis GA (2013) Optimal non-reversible linear drift for the convergence to equilibrium of a diffusion. J Stat Phys 152(2):237–274

  22. Ma Y-A, Fox EB, Chen T, Wu L (2019) Irreversible samplers from jump and continuous Markov processes. Stat Comput 29(1):177–202

  23. Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092

  24. Meyn SP, Tweedie RL et al (1994) Computable bounds for geometric convergence rates of Markov chains. Ann Appl Probab 4(4):981–1011

  25. Miclo L, Monmarché P (2013) Étude spectrale minutieuse de processus moins indécis que les autres. In: Séminaire de Probabilités XLV. Springer, pp 459–481

  26. Mira A, Geyer CJ (2000) On non-reversible Markov chains. In: Monte Carlo Methods. Fields Institute/AMS, pp 95–110

  27. Neal RM (2004) Improving asymptotic variance of MCMC estimators: Non-reversible chains are better. arXiv:math/0407281

  28. Plummer M, Best N, Cowles K, Vines K (2006) CODA: Convergence diagnosis and output analysis for MCMC. R news 6(1):7–11

  29. Poncet R (2017) Generalized and hybrid Metropolis-Hastings overdamped Langevin algorithms. arXiv:1701.05833

  30. Ramanan K, Smith A (2018) Bounds on lifting continuous-state Markov chains to speed up mixing. J Theor Probab 31(3):1647–1678

  31. Rosenthal JS (1995) Minorization conditions and convergence rates for Markov chain Monte Carlo. J Am Stat Assoc 90(430):558–566

  32. Rosenthal JS (2003) Asymptotic variance and convergence rates of nearly-periodic Markov chain Monte Carlo algorithms. J Am Stat Assoc 98(461):169–177

  33. Sakai Y, Hukushima K (2016) Eigenvalue analysis of an irreversible random walk with skew detailed balance conditions. Phys Rev E 93(4):043318

  34. Sherlock C, Thiery AH (2017) A discrete bouncy particle sampler. arXiv:1707.05200

  35. Sun Y, Schmidhuber J, Gomez FJ (2010) Improving the asymptotic performance of Markov chain Monte Carlo by inserting vortices. In: Advances in Neural Information Processing Systems. pp 2235–2243

  36. Tierney L (1998) A note on Metropolis-Hastings kernels for general state spaces. Ann Appl Probab 8(1):1–9

  37. Turitsyn KS, Chertkov M, Vucelja M (2011) Irreversible Monte Carlo algorithms for efficient sampling. Physica D 240(4-5):410–414

  38. Vanetti P, Bouchard-Côté A, Deligiannidis G, Doucet A (2018) Piecewise-deterministic Markov Chain Monte Carlo. arXiv:1707.05296

  39. Vucelja M (2016) Lifting – a nonreversible Markov chain Monte Carlo algorithm. Am J Phys 84:958. https://doi.org/10.1119/1.4961596

  40. Yuen WK (2000) Applications of geometric bounds to the convergence rate of Markov chains on \(\mathbb {R}^{n}\). Stoch Process Appl 87:1–23


Acknowledgements

This research work has been partially funded by ENSAE ParisTech, the Insight Centre for Data Analytics at University College Dublin and NSERC of Canada. The authors thank the editors and two anonymous referees for many constructive comments that improved the article.

Author information

Correspondence to Florian Maire.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Lifted non-reversible Markov chain

Fig. 13

(Example 1) GW Markov chain transition. Each circle corresponds to one state with (x, ξ) ∈ {1, … , S} × {− 1, 1}. The top and bottom row correspond respectively to ξ = 1, i.e. counter-clockwise inertia and to ξ = − 1, i.e. clockwise inertia

Fig. 14

(Example 2) MH Markov chain (top) and Guided Walk Markov chain (bottom). For the MH chain, the probability of remaining in each state is implicit. For the GW chain, the second coordinate of each state indicates the value of the auxiliary variable ξ

Fig. 15

(Example 2) Left: comparison of the mixing time for GW and GW-lifted (see Andrieu and Livingstone 2019) as a function of S. Here the mixing time τ is defined as \(\tau (P):=\inf \{t\in \mathbb {N} : \|\delta _{1} P^{t}-\pi \|\leq \epsilon \}\) with \(\epsilon = 10^{-5}\). Right: comparison of GW and GW-lifted asymptotic variances for some test functions
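For small state spaces, the mixing time τ(P) defined in the caption of Fig. 15 can be evaluated by brute force from the transition matrix. Below is a minimal Python/NumPy sketch; the matrix P, the target pi and the use of the total-variation norm are illustrative assumptions, since the norm in the definition above is left unspecified.

```python
import numpy as np

def mixing_time(P, pi, start=0, eps=1e-5, t_max=10**6):
    """Smallest t such that ||delta_start P^t - pi|| <= eps (total-variation norm assumed)."""
    mu = np.zeros(len(pi))
    mu[start] = 1.0                                  # delta_start
    for t in range(1, t_max + 1):
        mu = mu @ P                                  # marginal law of the chain after t steps
        if 0.5 * np.abs(mu - pi).sum() <= eps:
            return t
    return np.inf                                    # not mixed within t_max steps

# Toy usage: lazy random walk on a 5-cycle, whose stationary law is uniform.
S = 5
P = np.zeros((S, S))
for x in range(S):
    P[x, (x - 1) % S] = P[x, (x + 1) % S] = 0.25
    P[x, x] = 0.5
print(mixing_time(P, np.full(S, 1.0 / S)))
```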

Appendix B: Marginal non-reversible Markov chain

Fig. 16

(Example 1) Illustration of MH (Alg. 1) and non-reversible MH (Alg. 3) with S = 50, ρ = 0.1 and \(\zeta =\zeta _{\max \limits }\). First row – Convergence in total variation: the blue plot is MH and the red one is NRMH. The distribution of the Monte Carlo estimate was obtained using L = 1,000 independent chains starting from π and length T = 10,000 for both algorithms. Other test functions of the type for \(i\in \mathcal {S}\) gave similar results. Second row – Illustration of a particular sample path of length T = 10,000 for both Markov chains. For better visibility of the two sample paths, the left and centre plots represent the function \(\{(1+t/T)\cos \limits (2\pi X_{t}/p),(1+t/T)\sin \limits (2\pi X_{t}/p)\}\) for t = 1, … , T for the MH and NRMH Markov chains, respectively. This shows that NRMH does explore the circle more efficiently
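The total-variation convergence reported in the first row of Fig. 16 can be estimated, for any finite-state kernel, by running independent copies of the chain and comparing the empirical law at each time with π. A minimal sketch, assuming a generic transition matrix P (the specific kernels of Algorithms 1 and 3 are not reconstructed here) and a common deterministic starting state:

```python
import numpy as np

def tv_curve(P, pi, start, T, L, seed=0):
    """Monte Carlo estimate of t -> ||delta_start P^t - pi||_TV from L independent chains."""
    rng = np.random.default_rng(seed)
    S = len(pi)
    states = np.full(L, start)                       # all chains start at the same state
    curve = np.empty(T)
    for t in range(T):
        # advance every chain by one step of the kernel P
        states = np.array([rng.choice(S, p=P[x]) for x in states])
        empirical = np.bincount(states, minlength=S) / L
        curve[t] = 0.5 * np.abs(empirical - pi).sum()
    return curve
```

The Monte Carlo error of this estimator at each time is of order \(1/\sqrt {L}\).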

Appendix C: Proof of Proposition 3

We first need to prove the following lemma.

Lemma 9

The conductance of the MH Markov chain of Example 2 satisfies

$$ \frac{1+\sqrt{1+2S(S+1)}}{S(S+1)}\leq h(P)\leq \frac{2}{S+1} . $$
(28)

Proof

Let for all \(A\in \mathfrak {S}\), \(\psi (A):={{\sum }_{x\in A}\pi (x)P(x,\bar {A})}\slash {\pi (A)\wedge (1-\pi (A))}\) be the quantity to minimize. A close analysis of the MH Markov chain displayed in the top panel of Fig. 14 shows that the set A which minimizes ψ(A) has the form A = (a1, a1 + 1, … , a2) for some \(S\geq a_{2}\geq a_{1}\geq 1\). Indeed, since the Markov chain moves to neighbouring states only, there are only two ways to exit A at each transition. Since each way to exit A contributes at the same order of magnitude to the numerator, taking contiguous states minimizes it and in particular

$$ \sum\limits_{x\in A}\pi(x)P(x,\bar{A})=\pi(a_{1})\frac{1\vee (a_{1}-1)}{2a_{1}}+\pi(a_{2})\frac{1}{2}=\frac{1\vee (a_{1}-1)+a_{2}}{S(S+1)} , $$

so that for any a1 < a2 satisfying π(A) < 1/2, we have:

$$ \psi(A)\geq \frac{1\vee (a_{1}-1)+a_{2}}{a_{2}(a_{2}+1)-a_{1}(a_{1}-1)} $$
(29)

since

$$ \pi(A)=\frac{2}{S(S+1)}\sum\limits_{k=a_{1}}^{a_{2}}k=\frac{a_{2}(a_{2}+1)-a_{1}(a_{1}-1)}{S(S+1)} . $$

Fix a1 and treat a2 as a function of a1 satisfying π(A) < 1/2. On the one hand, note that for all a1 the function mapping a2 to the RHS of Eq. (29) is decreasing. On the other hand, we have that \(\pi (A)<1/2\Leftrightarrow {a_{2}^{\ast }(a_{2}^{\ast }+1)-a_{1}(a_{1}-1)}<S(S+1)/2\), which yields

$$ a_{2}\leq a_{2}^{\ast}(a_{1}):=\left\lfloor\frac{-1+V(a_{1},S)}{2}\right\rfloor ,\quad V(a_{1},S):=\sqrt{1+2S(S+1)+4a_{1}(a_{1}-1)} . $$

Hence, for all a1, the RHS of Eq. (29) is lower bounded by

$$ \begin{array}{@{}rcl@{}} \frac{4(1\vee (a_{1}-1))-2+2V(a_{1},S)}{(-1+V(a_{1},S))(1+V(a_{1},S))-4a_{1}(a_{1}-1)} =\frac{4(1\vee (a_{1}-1))-2+2V(a_{1},S)}{V(a_{1},S)^{2}-1-4a_{1}(a_{1}-1)}\\ =\frac{2(1\vee (a_{1}-1))-1+V(a_{1},S)}{S(S+1)} . \end{array} $$

Clearly, the numerator is an increasing function of a1 and is thus minimized for a1 = 1, which gives the lower bound of Eq. (28). Finally, by definition h(P) is upper bounded by ψ(A) for any \(A\in \mathfrak {S}\) satisfying π(A) < 1/2. In particular, taking A = (1, 2, … , (S − 1)/2) gives the upper bound of Eq. (28). □
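The two sides of Eq. (28) can be confronted with the exact conductance for small S by enumerating subsets. A minimal sketch for a generic transition matrix P and target pi (the enumeration is exponential in the number of states, so this is only practical for small chains); the matrix of Example 2 itself is not reconstructed here:

```python
import numpy as np
from itertools import combinations

def conductance(P, pi):
    """h(P): infimum over A with pi(A) < 1/2 of sum_{x in A} pi(x) P(x, A^c) / pi(A)."""
    S = len(pi)
    h = np.inf
    for r in range(1, S):
        for subset in combinations(range(S), r):
            A = list(subset)
            pA = pi[A].sum()
            if pA >= 0.5:
                continue
            Ac = [x for x in range(S) if x not in subset]
            flow = (pi[A, None] * P[np.ix_(A, Ac)]).sum()   # sum_{x in A} pi(x) P(x, A^c)
            h = min(h, flow / pA)
    return h
```

Given the transition matrix of Example 2, the returned value can then be checked to lie between the two bounds of Eq. (28).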

Proof

Since PMH is reversible and aperiodic, its spectrum is real, with any eigenvalue different from one \(\lambda \in {\Lambda }_{|\boldsymbol {1}^{\perp }}:=\text {Sp}(P_{\text {MH}})\backslash \{1\}\) satisfying − 1 < λ < 1. The norm of PMH as an operator on the non-constant functions of L2(π) is \(\gamma :=\max \limits \{\sup {\Lambda }_{|\boldsymbol {1}^{\perp }},|\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\}\). It is well known (see e.g. Yuen 2000) that

$$ \|\delta_{1}P_{\text{MH}}^{t}-\pi\|_{2}\leq \|\delta_{1}-\pi\|_{2}\gamma^{t} . $$

It can be readily checked that \(\|\delta _{1}-\pi \|_{2}\) corresponds to the first factor on the RHS of Eq. (13). The tedious part of the proof is to bound γ. Using reversibility again, Cheeger’s inequality (see e.g. Diaconis et al. 1991 for a proof) reads

$$ 1-2h(P)\leq \sup{\Lambda}\leq 1-h(P)^{2} , $$
(30)

where h(P) is the Markov chain conductance defined as

$$ h(P)=\underset{\pi(A)<1/2}{\underset{A\in\mathfrak{S}}{\inf}} \frac{{\sum}_{x\in A}\pi(x)P(x,\bar{A})}{\pi(A)} . $$

Combining Cheeger’s inequality and Lemma 9 yields

$$ \sup{\Lambda}\leq 1-\frac{2}{S(S+1)} . $$
(31)

However, to use the above bound to upper bound γ, we need to check that \(\sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\geq |\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\). In general, bounding \(|\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\) proves to be more challenging than bounding \(\sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\). However, in the context of this example, we can use the bound derived in Proposition 2 of Diaconis et al. (1991). It is based on a geometric interpretation of the Markov chain as a non-bipartite graph with vertices (states) connected by edges (transitions), as illustrated in Fig. 14. More precisely, the result of this work most relevant to us states that

$$ \inf {\Lambda}_{|\boldsymbol{1}^{\perp}}\geq -1+\frac{2}{\iota(P)} , $$
(32)

with \(\iota (P)=\max \limits _{e_{a,b}\in {\Gamma }}{\sum }_{\sigma _{x}\ni e_{a,b}}|\sigma _{x}|\pi (x)\), where

  • ea, b is the edge corresponding to the transition from state a to b,

  • σx is a path of odd length going from state x to itself, including a self-loop provided that P(x, x) > 0, and more generally \(\sigma _{x}=(e_{x,a_{1}},e_{a_{1},a_{2}},\ldots ,e_{a_{\ell },x})\) with \(\ell \) even,

  • Γ is a collection of paths {σ1, … , σS} including exactly one path for each state,

  • |σx| represents the “length” of path σx and is formally defined as

    $$ |\sigma_{x}|=\sum\limits_{e_{a,b}\in\sigma_{x}}\frac{1}{\pi(a) P(a,b)} . $$

Let us consider the collection of paths Γ consisting of all the self loops for all states x ≥ 2. It can be readily checked that the length of such paths is

$$ |\sigma_{x}|=(\pi(x)P(x,x))^{-1}=\left( \frac{x}{\Delta}\frac{1}{2x}\right)^{-1}=S(S+1) . $$

For state x = 1, let us consider the path consisting of the walk around the circle σ1 : (e1,2, e2,3, … , eS,1). It may have been possible to take the path e1,2, e2,2, e2,1, but it is unclear if paths using the same edge twice are permitted in the framework of Prop. 2 of Diaconis et al. (1991). The length of path σ1 is

$$ \begin{array}{@{}rcl@{}} |\sigma_{1}|=\frac{1}{\pi(1)P(1,2)}+\cdots+\frac{1}{\pi(S)P(S,1)}\\ =S(S+1)+\frac{S(S+1)}{2}+\cdots+\frac{S(S+1)}{S-1}+S(S+1)=S(S+1)\left( 1+\sum\limits_{k=1}^{S-1}\frac{1}{k}\right) . \end{array} $$

We are now in a position to calculate ι(P). First note that, by construction, each edge belonging to any path σk contained in Γ appears once and only once. Hence, the constant ι(P) simplifies to the maximum of the set {|σx|π(x), σx ∈Γ} that is

$$ \max\left\{2\left( 1+\sum\limits_{\ell=1}^{S-1}\frac{1}{\ell}\right),2k : 2\leq k\leq S\right\} =2S , $$
(33)

since on the one hand \({\sum }_{\ell =1}^{S-1} {1}\slash {\ell }\leq 1+\log (S)\) and on the other hand S ≥ 5. Combining Eqs. (32) and (33) yields

$$ \inf {\Lambda}_{|\boldsymbol{1}^{\perp}}\geq -1+\frac{1}{S} . $$
(34)

It follows that if \(\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}\geq 0\), then \(\gamma \leq \sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\) and otherwise we have

$$ 0>\inf {\Lambda}_{|\boldsymbol{1}^{\perp}}\geq-1+\frac{1}{S}\Leftrightarrow 0>\inf {\Lambda}_{|\boldsymbol{1}^{\perp}} \text{and} \left|\inf {\Lambda}_{|\boldsymbol{1}^{\perp}}\right|\leq1-\frac{1}{S} , $$

which combines with Eq. (31) to complete the proof as

$$ \max\{\sup{\Lambda}_{|\boldsymbol{1}^{\perp}},|\inf {\Lambda}_{|\boldsymbol{1}^{\perp}}|\}\leq \left(1-\frac{1}{S}\right)\vee \left(1-\frac{2}{S(S+1)}\right) \leq 1-\frac{2}{S(S+1)} , $$

since S ≥ 5. □
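The spectral quantities used in this proof can also be computed exactly for moderate S. A minimal sketch for a generic π-reversible row-stochastic matrix P: the symmetrized matrix \(D^{1/2}PD^{-1/2}\), with D = diag(π), has the same (real) spectrum as P, and the conductance is recomputed by brute force as in the sketch following Lemma 9.

```python
import numpy as np

def spectrum_and_conductance(P, pi):
    """Spectrum of a pi-reversible P (via the symmetrization D^{1/2} P D^{-1/2}) and h(P),
    so that the two sides of Cheeger's inequality (30) can be compared numerically."""
    d = np.sqrt(pi)
    lam = np.sort(np.linalg.eigvalsh((d[:, None] / d[None, :]) * P))  # same spectrum as P
    sup_lam, inf_lam = lam[-2], lam[0]         # extreme eigenvalues on the orthogonal of constants
    S = len(pi)
    h = np.inf                                 # brute-force conductance (small state spaces only)
    for mask in range(1, 2 ** S - 1):
        A = [x for x in range(S) if mask >> x & 1]
        pA = pi[A].sum()
        if pA < 0.5:
            Ac = [x for x in range(S) if not mask >> x & 1]
            h = min(h, (pi[A, None] * P[np.ix_(A, Ac)]).sum() / pA)
    assert 1 - 2 * h <= sup_lam + 1e-12        # the easy direction of Eq. (30)
    return sup_lam, inf_lam, h
```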

Appendix D: Proof of Proposition 4

Proof

By straightforward calculation we have:

$$ \left\|\delta_{1} P_{\text{GW}}^{(S-1)}(\cdot \times \{-1,1\})-\pi\right\|_{2}^{2}=1-\frac{8}{3S}+o(1/S) . $$
(35)

Using Proposition 3, we have that

$$ \begin{array}{@{}rcl@{}} &&\left\|\delta_{1} P_{\text{MH}}^{(S-1)}-\pi\right\|_{2}^{2}\leq \left\{1-\frac{4}{S(S+1)}+\frac{2(2S+1)}{3S(S+1)}\right\} \left( 1-\frac{2}{S(S+1)}\right)^{S-1}\\ &=&1-\frac{2}{3S}+o(1/S) . \end{array} $$

Comparing this bound with Eq. (35), the inequality of Eq. (14) cannot be concluded. In fact, we need to refine the bound on the MH convergence. Analysing the proof of Lemma 9, the lower bound on the conductance seems rather tight, the only slack coming from taking the real-valued bound on \(a_{2}^{\ast }(a_{1})\) instead of its floor. To illustrate this statement, the value of the bound is compared to the actual conductance for moderate values of S, the calculation being otherwise too costly (Fig. 17, left). We then computed the numerical value of \(\sup {\Lambda }_{|1^{\perp }}\) for S ≤ 500 and compared it with the upper bound derived from Cheeger’s inequality in the proof of Prop. 3 (Fig. 17, right). It appears that Cheeger’s bound is, in this example, too loose to justify Eq. (14). However, taking a finer upper bound such as

$$ \sup{\Lambda}_{|1^{\perp}}\leq 1- 8/S^{2} , $$

yields

$$ \left\|\delta_{1} P_{\text{MH}}^{(S-1)}-\pi\right\|_{2}^{2}\leq1-\frac{20}{3S}+o(1/S) $$

which concludes the proof. □

Fig. 17

(Example 2) Left: conductance h(P) and different approximations, including the lower bound \(\sqrt {2}/S\) derived in Lemma 9. Right: comparison of \(\sup {\Lambda }_{|1^{\perp }}\) with the upper bound of Proposition 3 (approximately 1 − 2/S2) and the estimated finer upper bound 1 − 8/S2. For readability, we represent one minus these quantities, on a log scale, so that upper bounds become lower bounds

Appendix E: Proof of Proposition 6

Proof

First, denote by R the mixture of the two NRMH kernels with weight 1/2. We start by showing that this kernel is π-reversible. Indeed, the subkernel of R satisfies:

$$ \begin{array}{@{}rcl@{}} &&\pi(\mathrm{d} x) Q(x,\mathrm{d} y)(A_{\Gamma}(x,y)+A_{-{\Gamma}}(x,y)) \\ &=& \mathrm{d} x\mathrm{d} y \big(\left[\pi(x)Q(x,y)\wedge\left(\pi(y)Q(y,x)+{\Gamma}(x,y)\right)\right]\\ &&+\left[\pi(x)Q(x,y)\wedge\left(\pi(y)Q(y,x)-{\Gamma}(x,y)\right)\right]\big)\\ &=&\mathrm{d} x\mathrm{d} y \big(\left[\left(\pi(x)Q(x,y)-{\Gamma}(x,y)\right)\wedge\pi(y)Q(y,x)\right]\\ &&+\left[\left(\pi(x)Q(x,y)+{\Gamma}(x,y)\right)\wedge\pi(y)Q(y,x)\right]\big)\\ &=&\pi(\mathrm{d} y)Q(y,\mathrm{d} x) \left( \left[\frac{\pi(x)Q(x,y)-{\Gamma}(x,y)}{\pi(y)Q(y,x)}\wedge 1\right]+ \left[\frac{\pi(x)Q(x,y)+{\Gamma}(x,y)}{\pi(y)Q(y,x)}\wedge1\right]\right) . \end{array} $$

Now, note that for all \(x\in \mathcal {S}\) and all \(A\in \mathfrak {S}\),

$$ R(x,A\backslash\{x\})=\frac{1}{2}{\int}_{A\backslash\{x\}}Q(x,\mathrm{d} z)(A_{\Gamma}(x,z)+A_{-{\Gamma}}(x,z)) $$

and since for any two positive numbers a and b, (1 ∧ a) + (1 ∧ b) ≤ 2 ∧ (a + b), we have for all \((x,z)\in \mathcal {S}^{2}\),

$$ \begin{array}{@{}rcl@{}} A_{\Gamma}(x,z)+A_{-{\Gamma}}(x,z)&=&1\wedge \frac{\pi(z)Q(z,x)+{\Gamma}(x,z)}{\pi(x)Q(x,z)}+ 1\wedge \frac{\pi(z)Q(z,x)-{\Gamma}(x,z)}{\pi(x)Q(x,z)}\\ &\leq& 2\left( 1\wedge\frac{\pi(z)Q(z,x)}{\pi(x)Q(x,z)}\right) \end{array} $$

since by Assumption 2, π(y)Q(y, x) + Γ(x, y) ≥ 0 for all \((x,y)\in \mathcal {S}^{2}\). This yields a Peskun-Tierney ordering \(R\preceq P_{\text {MH}}\), since

$$ R(x,A\backslash\{x\})\leq {\int}_{A\backslash\{x\}} Q(x,\mathrm{d} z)\left( 1\wedge\frac{\pi(z)Q(z,x)}{\pi(x)Q(x,z)}\right)=P_{\text{MH}}(x,A\backslash\{x\}) $$

and the proof is concluded by applying Theorem 4 of Tierney (1998). □
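Proposition 6 can be checked numerically on a small example: build the NRMH kernels associated with Γ and −Γ, average them, and test both the π-reversibility of the mixture and the entrywise Peskun-Tierney ordering against P_MH. The sketch below is an illustration under assumptions that are not part of the original text: a 4-state cycle, a symmetric proposal Q, a target π(x) ∝ x (cf. Example 2) and a cyclic vorticity small enough for Assumption 2 to hold.

```python
import numpy as np

def mh_kernel(pi, Q, Gamma):
    """Kernel with acceptance A_Gamma(x,y) = 1 ∧ (pi(y)Q(y,x) + Gamma(x,y)) / (pi(x)Q(x,y));
    Gamma = 0 gives the usual Metropolis-Hastings kernel."""
    S = len(pi)
    P = np.zeros((S, S))
    for x in range(S):
        for y in range(S):
            if x != y and Q[x, y] > 0:
                A = min(1.0, (pi[y] * Q[y, x] + Gamma[x, y]) / (pi[x] * Q[x, y]))
                P[x, y] = Q[x, y] * A
        P[x, x] = 1.0 - P[x].sum()               # rejection mass
    return P

S, eps = 4, 0.02
pi = np.arange(1, S + 1) / (S * (S + 1) / 2)     # pi(x) proportional to the state label (cf. Example 2)
Q = np.zeros((S, S))
Gamma = np.zeros((S, S))
for x in range(S):
    Q[x, (x - 1) % S] = Q[x, (x + 1) % S] = 0.5  # symmetric proposal on a cycle
    Gamma[x, (x + 1) % S] += eps                 # cyclic vorticity: antisymmetric, zero row sums,
    Gamma[x, (x - 1) % S] -= eps                 # small enough for Assumption 2 to hold
R = 0.5 * (mh_kernel(pi, Q, Gamma) + mh_kernel(pi, Q, -Gamma))
P_mh = mh_kernel(pi, Q, 0 * Gamma)
flux = pi[:, None] * R
assert np.allclose(flux, flux.T)                            # the mixture R is pi-reversible
off = ~np.eye(S, dtype=bool)
assert np.all(R[off] <= P_mh[off] + 1e-12)                  # Peskun-Tierney ordering R ⪯ P_MH
```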

Appendix F: Proof of Proposition 7

Proof

Note that if Γ1 satisfies Assumptions 1 and 2 then

$$ \begin{array}{@{}rcl@{}} &&\pi(x)Q(x,y)\wedge \left( \pi(y)Q(y,x)+{\Gamma}_{1}(x,y)\right)\\ &=& {\Gamma}_{1}(x,y)+\left[\pi(y)Q(y,x)\wedge \left( \pi(x)Q(x,y)+{\Gamma}_{1}(y,x)\right)\right] . \end{array} $$

Thus, if Γ1 and Γ− 1 satisfy Assumptions 1, 2 and 3 then

$$ \begin{array}{@{}rcl@{}} {\Gamma}_{1}(x,y)&=&\left[\pi(y)Q(y,x)\wedge \left( \pi(x)Q(x,y)+{\Gamma}_{-1}(y,x)\right)\right] \\ &&-\left[\pi(y)Q(y,x)\wedge \left( \pi(x)Q(x,y)+{\Gamma}_{1}(y,x)\right)\right] . \end{array} $$

Hence, we have

$$ \begin{array}{@{}rcl@{}} {\Gamma}_{1}(y,x)&=&\left[\pi(x)Q(x,y)\wedge \left( \pi(y)Q(y,x)+{\Gamma}_{-1}(x,y)\right)\right] \\ &&-\left[\pi(x)Q(x,y)\wedge \left( \pi(y)Q(y,x)+{\Gamma}_{1}(x,y)\right)\right] ,\\ &=&\left[\left( \pi(x)Q(x,y)-{\Gamma}_{-1}(x,y)\right)\wedge \pi(y)Q(y,x)\right] \\ &&-\left[\left( \pi(x)Q(x,y)-{\Gamma}_{1}(x,y)\right)\wedge \pi(y)Q(y,x)\right]+{\Gamma}_{-1}(x,y)-{\Gamma}_{1}(x,y)\\ &=&{\Gamma}_{1}(x,y)+{\Gamma}_{-1}(x,y)-{\Gamma}_{1}(x,y) , \end{array} $$

and thus, since Γ1(y, x) = −Γ1(x, y), Γ− 1 = −Γ1, which, substituted into Eq. (21), leads to

$$ \begin{array}{@{}rcl@{}} \pi(x)Q(x,y)\wedge \left( \pi(y)Q(y,x)+{\Gamma}_{1}(x,y)\right)\\ =\left( \pi(x)Q(x,y)+{\Gamma}_{1}(x,y)\right) \wedge \pi(y)Q(y,x) \end{array} $$
(36)

for all \((x,y)\in \mathcal {S}^{2}\). Conversely, it can be readily checked that if Γ1 satisfies Assumptions 1, 2 and Eq. (36), then setting Γ− 1 = −Γ1 implies that Γ1 and Γ− 1 satisfy Assumptions 1, 2 and the skew-detailed balance equation (Eq. (21)). The proof is concluded by noting that Eq. (36) holds if and only if Γ1 is the null operator on \(\mathcal {S}\times \mathcal {S}\) or Q is π-reversible. □

Appendix G: Proof of Proposition 8

We prove Proposition 8, which states that the transition kernel (24) of the Markov chain generated by Algorithm 4 is \(\tilde {\pi }\)-invariant and is \(\tilde {\pi }\)-reversible if and only if Γ = 0.

Proof

To prove the invariance of Kρ, we need to prove that

$$ \sum\limits_{y\in\mathcal{S},\eta\in\{-1,1\}} \tilde{\pi}(y,\eta)K_{\rho}(y,\eta;x,\xi) = \tilde{\pi}(x,\xi) , $$

for all \((x,\xi )\in \mathcal {S}\times \{-1,1\}\) and ρ ∈ [0, 1].

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{y,\eta}\tilde{\pi}(y,\eta)K_{\rho}(y,\eta;x,\xi)\\ &=& \sum\limits_{y} \tilde{\pi}(y,\xi)K_{\rho}(y,\xi;x,\xi) + \sum\limits_{y} \tilde{\pi}(y,-\xi)K_{\rho}(y,-\xi;x,\xi) \\ &=& \tilde{\pi}(x,\xi)K_{\rho}(x,\xi;x,\xi) + \sum\limits_{y \neq x} \tilde{\pi}(y,\xi)K_{\rho}(y,\xi;x,\xi)+ \tilde{\pi}(x,{-\xi})K_{\rho}(x,{-\xi};x,\xi) \\ & =& \tilde{\pi}(x,\xi) \bigg\{ Q(x,x) + (1-\rho) \sum\limits_{z} Q(x,z)(1-A_{\xi{\Gamma}}(x,z)) \\ && + \rho\sum\limits_{z} Q(x,z)(1-A_{-\xi{\Gamma}}(x,z)) \bigg\} + \sum\limits_{y \neq x} \tilde{\pi}(y,\xi)Q(y,x)A_{\xi{\Gamma}}(y,x) \end{array} $$
(37)

the second equality coming from the fact that Kρ(y, −ξ; x, ξ) ≠ 0 if and only if x = y and the third from the fact that \(\tilde {\pi }(x,\xi ) = \tilde {\pi }(x,-\xi ) = \pi (x)/2\). Now, let \(A(x,\xi ) := {\sum }_{y \neq x} \tilde {\pi }(y,\xi )Q(y,x)A_{\xi {\Gamma }}(y,x)\) and note that:

$$ A(x,\xi) = \frac{1}{2}\sum\limits_{y \neq x} \pi(y)Q(y,x)A_{\xi{\Gamma}}(y,x) = \frac{1}{2}\sum\limits_{y \neq x} \pi(y)Q(y,x)\wedge\left(\pi(x)Q(x,y)+\xi{\Gamma}(y,x)\right) . $$
(38)

Assumption 2 together with the fact that π(x) > 0 for all \(x\in \mathcal {S}\) yields π(y)Q(y, x) > 0 if and only if π(x)Q(x, y) > 0. It can also be noted that the lower-bound condition on Γ implies that Γ(x, y) = 0 if Q(x, y) = 0. This leads to

$$ \begin{array}{@{}rcl@{}} A(x,\xi) &=& (1/2) \underset{\pi(x)Q(x,y)>0}{\underset{y \neq x}{\sum}} \pi(x)Q(x,y) A_{\xi{\Gamma}}(x,y)+(\xi/2) \sum\limits_{y \neq x}{\Gamma}(y,x)\\ &=&\tilde{\pi}(x,\xi) \sum\limits_{y \neq x} Q(x,y)A_{\xi{\Gamma}}(x,y) \end{array} $$
(39)

since for all \(x\in \mathcal {S}\), \({\sum }_{y\in \mathcal {S}}{\Gamma }(x,y)=0\). Similarly, define

$$ B(x,\xi):= \tilde{\pi}(x,\xi) \sum\limits_{z} Q(x,z) \left\{(1-\rho)(1-A_{\xi{\Gamma}}(x,z)) + \rho(1-A_{-\xi{\Gamma}}(x,z)) \right\} . $$

Using Lemma 10, we have:

$$ \begin{array}{@{}rcl@{}} B(x,\xi) &&= \tilde{\pi}(x,\xi) \sum\limits_{z\in\mathcal{S}} Q(x,z)(1-A_{\xi{\Gamma}}(x,z)) \end{array} $$
(40)
$$ \begin{array}{@{}rcl@{}} &&= \tilde{\pi}(x,\xi) \sum\limits_{z \neq x} Q(x,z)(1-A_{\xi{\Gamma}}(x,z)) , \\ &&= \tilde{\pi}(x,\xi) \sum\limits_{z \neq x} Q(x,z)-A(x,\xi) , \end{array} $$
(41)

where the penultimate equality follows from AΓ(x, x) = 1 for all \(x\in \mathcal {S}\). Finally, combining Eqs. (37) and (40), we obtain:

$$ \begin{array}{@{}rcl@{}} \sum\limits_{y,\eta}\tilde{\pi}(y,\eta)K_{\rho}(y,\eta;x,\xi)&&=\tilde{\pi}(x,\xi) Q(x,x)A_{\xi{\Gamma}}(x,x)+B(x,\xi)+A(x,\xi) ,\\ &&=\tilde{\pi}(x,\xi) Q(x,x)+\tilde{\pi}(x,\xi) \sum\limits_{z \neq x} Q(x,z) ,\\ &&=\tilde{\pi}(x,\xi) , \end{array} $$

since \({\sum }_{y\in \mathcal {S}}Q(x,y)=1\), for all \(x\in \mathcal {S}\). We now study the \(\tilde {\pi }\)-reversibility of Kρ, i.e. conditions on Γ such that for all \((x,y)\in \mathcal {S}^{2}\) and \((\xi ,\eta )\in \{-1,1\}^{2}\) such that (x, ξ)≠(y, η), we have:

$$ \tilde{\pi}(x,\xi)K_{\rho}(x,\xi;y,\eta)=\tilde{\pi}(y,\eta)K_{\rho}(y,\eta;x,\xi) . $$
(42)

First note that if x = y and ξ = −η, then Eq. (42) is equivalent to

$$ \sum\limits_{z\in\mathcal{S}}Q(x,z)\left( A_{\xi{\Gamma}}(x,z)-A_{-\xi{\Gamma}}(x,z)\right)=0 $$

which holds by Lemma 10 and the fact that π is everywhere positive. Second, for x ≠ y and ξ = −η, Eq. (42) is trivially true by definition of Kρ, see (24). Hence, conditions on the vorticity matrix ensuring \(\tilde {\pi }\)-reversibility need only be investigated in the case ξ = η and x ≠ y. In that case Eq. (42) is equivalent to

$$ \pi(x)Q(x,y)A_{\xi{\Gamma}}(x,y)=\pi(y)Q(y,x)A_{\xi{\Gamma}}(y,x) , $$

which, by Eq. (43), is equivalent to Γ = 0. Hence Kρ is \(\tilde {\pi }\)-reversible if and only if Γ = 0. □

Lemma 10

Under the assumptions of Proposition 7, we have, for all \(x\in \mathcal {S}\) and ξ ∈ {− 1, 1},

$$ \pi(x)\sum\limits_{z\in\mathcal{S}} Q(x,z) \left\{A_{\xi{\Gamma}}(x,z)-A_{-\xi{\Gamma}}(x,z)\right\}=0 . $$

Proof

Using that for three real numbers a, b, c we have \(a\wedge b=\left( (a-c)\wedge (b-c)\right) +c\), together with the fact that Γ(x, y) = −Γ(y, x), we have:

$$ \begin{array}{@{}rcl@{}} \pi(x)Q(x,y)A_{\xi{\Gamma}}(x,y) && = \pi(x)Q(x,y) \left\{ 1 \wedge \frac{\xi{\Gamma}(x,y) + \pi(y)Q(y,x)}{\pi(x)Q(x,y)} \right\} , \\ && = \pi(y)Q(y,x) \left\{ 1 \wedge \frac{\xi{\Gamma}(y,x) + \pi(x)Q(x,y)}{\pi(y)Q(y,x)} \right\} + \xi{\Gamma}(x,y) , \\ && = \pi(y)Q(y,x)A_{\xi{\Gamma}}(y,x) + \xi{\Gamma}(x,y) . \end{array} $$
(43)

The proof follows from combining the skew-detailed balance Eqs. (21) and (43):

$$ \begin{array}{@{}rcl@{}} &&\pi(x)\sum\limits_{z\in\mathcal{S}} Q(x,z)\{A_{\xi{\Gamma}}(x,z)-A_{-\xi{\Gamma}}(x,z)\} \\ &=&\sum\limits_{z\in\mathcal{S}} \left\{\pi(x)Q(x,z)A_{\xi{\Gamma}}(x,z) - \pi(x)Q(x,z)A_{-\xi{\Gamma}}(x,z)\right\} , \\ & =& \sum\limits_{z\in\mathcal{S}} \left\{\pi(x)Q(x,z)A_{\xi{\Gamma}}(x,z) - \pi(z)Q(z,x)A_{\xi{\Gamma}}(z,x) \right\} ,\\ & =& \sum\limits_{z\in\mathcal{S}} \xi{\Gamma}(x,z) ,\\ &=&0 . \end{array} $$

Appendix H: Illustration of NRMHAV on Example 2

Fig. 18

(Example 2) Mixing time of NRMHAV (Alg. 4) as a function of ρ ∈ [0, 1] and for S ∈ {7, 21, 51, 101}. Top: convergence of the lifted Markov chain \(\{(X_{t},\zeta _{t}), t\in \mathbb {N}\}\) to \(\tilde {\pi }\) (right) and convergence of the marginal sequence \(\{X_{t}, t\in \mathbb {N}\}\) to π (left). Bottom: comparison of the convergence of \(\{X_{t}, t\in \mathbb {N}\}\) for MH (plain line), NRMH with Γ (dashed), NRMH with −Γ (dotted) and NRMHAV (dashed with points), for S = 7 (black) and S = 51 (green)

Appendix I: Generation of vorticity matrices on S × S grids

We detail a method to generate vorticity matrices satisfying Assumption 1 in the context of Example 4. In the general case of a random walk on an S × S grid, Γζ is an S2 × S2 matrix that can be constructed systematically using the properties that Γζ(x, y) = −Γζ(y, x) for all \((x,y)\in \mathcal {S}^{2}\) and Γζ1 = 0. It has a block-diagonal structure:

$$ {\Gamma}_{\zeta} = \left( \begin{array}{ccccc} B & 0 & 0 & {\cdots} & 0 \\ 0 & B & 0 & {\cdots} & 0 \\ 0 & 0 & B & & 0 \\ {\vdots} & {\vdots} & & {\ddots} & \vdots \end{array}\right) $$
(44)

where each 2S × 2S diagonal block B has the following structure:

$$ B = \left( \begin{array}{cc} B_{D} & B_{OD} \\ -B_{OD} & -B_{D} \end{array}\right) $$
(45)

where

$$ B_{D} = \left( \begin{array}{ccccccc} 0 & -\zeta & 0 & 0 & {\cdots} & 0 & 0\\ \zeta & 0 & -\zeta & 0 & {\cdots} & 0 & 0 \\ 0 & \zeta & 0 & -\zeta & 0 & {\cdots} & 0 \\ {\vdots} & {\ddots} & {\ddots} & {\ddots} & {\ddots} & {\ddots} & {\vdots} \\ 0 & {\cdots} & 0 & \zeta & 0 & -\zeta & 0 \\ 0 & 0 & {\cdots} & 0 & \zeta & 0 & -\zeta \\ 0 & 0 & {\cdots} & 0 & 0 & \zeta & 0 \end{array}\right) $$

and

$$ B_{OD} = \left( \begin{array}{ccccccc} \zeta & 0 & & & {\cdots} & & 0 \\ 0 & 0 &&& {\cdots} && 0 \\ {\vdots} &&& {\ddots} &&& {\vdots} \\ 0 &&& {\cdots} && 0 & 0 \\ 0 & & & {\cdots} & & 0 & -\zeta \end{array}\right) $$

and ζ is such that the MH ratio (22) is always non-negative. The vorticity matrix is of size S2 × S2, meaning that the number of diagonal blocks varies with S:

  • if S is even: \(\exists k \in \mathbb {N} \text { s.t. } S = 2k ~ \Rightarrow ~ S^{2} = 4k^{2}\) and each block B is a square matrix of dimension 4k, so there are exactly k B-blocks in the vorticity matrix Γζ;

  • if S is odd: \(\exists k \in \mathbb {N} \text { s.t. } S = 2k+1 ~ \Rightarrow ~ S^{2} = (2k+1)^{2}\) and each block B is a square matrix of dimension 2(2k + 1); as \(\frac {(2k+1)^{2}}{2(2k+1)} = k + \frac {1}{2}\), Γζ is made of k B-blocks and the last terms of the diagonal are completed with zeros.

For instance, for S = 4 the vorticity matrix is given by \({\Gamma }_{\zeta }^{(4)}\) as follows:

$$ {\Gamma}_{\zeta}^{(4)} = \left( \begin{array}{cc} B_{4} & \boldsymbol{0}_{8} \\ \boldsymbol{0}_{8} & B_{4} \end{array}\right) $$

where

$$\scriptsize B_{4} = \left( \begin{array}{cccccccc} 0 & -\zeta & 0 & 0 & \zeta & 0 & 0 & 0 \\ \zeta & 0 & -\zeta & 0 & 0 & 0 & 0 & 0 \\ 0 & \zeta & 0 & -\zeta & 0 & 0 & 0 & 0 \\ 0 & 0 & \zeta & 0 & 0 & 0 & 0 & -\zeta \\ -\zeta & 0 & 0 & 0 & 0 & \zeta & 0 & 0 \\ 0 & 0 & 0 & 0 & -\zeta & 0 & \zeta & 0 \\ 0 & 0 & 0 & 0 & 0 & -\zeta & 0 & \zeta \\ 0 & 0 & 0 & \zeta & 0 & 0 & -\zeta & 0 \end{array}\right) $$

and 0m stands for the zero-matrix of size m × m.
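The construction (44)–(45) translates directly into code. A minimal sketch, assuming the value of ζ is given (the admissibility condition that the MH ratio (22) stays non-negative is not checked here):

```python
import numpy as np

def vorticity_matrix(S, zeta):
    """Gamma_zeta of size S^2 x S^2 assembled from the blocks (44)-(45)."""
    BD = np.zeros((S, S))                       # B_D: antisymmetric tridiagonal block
    for i in range(S - 1):
        BD[i + 1, i], BD[i, i + 1] = zeta, -zeta
    BOD = np.zeros((S, S))                      # B_OD: zeta top-left, -zeta bottom-right corner
    BOD[0, 0], BOD[-1, -1] = zeta, -zeta
    B = np.block([[BD, BOD], [-BOD, -BD]])      # the 2S x 2S diagonal block of Eq. (45)
    Gamma = np.zeros((S * S, S * S))
    k = S // 2                                  # number of full B-blocks; for odd S the remainder is zero
    for j in range(k):
        Gamma[2 * S * j:2 * S * (j + 1), 2 * S * j:2 * S * (j + 1)] = B
    return Gamma

Gamma = vorticity_matrix(4, 1.0)
assert np.allclose(Gamma, -Gamma.T)             # Gamma_zeta(x, y) = -Gamma_zeta(y, x)
assert np.allclose(Gamma.sum(axis=1), 0.0)      # Gamma_zeta 1 = 0
```

For S = 4 this reproduces the two diagonal copies of B4 displayed above.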

Fig. 19

Illustration of the generic vorticity matrix specified by the construction above in the case S = 4

Cite this article

Vialaret, M., Maire, F. On the Convergence Time of Some Non-Reversible Markov Chain Monte Carlo Methods. Methodol Comput Appl Probab (2020). https://doi.org/10.1007/s11009-019-09766-w


Keywords

  • MCMC algorithms
  • Non-reversible Markov chain
  • Variance reduction
  • Convergence rate

Mathematics Subject Classification (2010)

  • 60J22
  • 60J10
  • 60J20
  • 65C05