
On the Convergence Time of Some Non-Reversible Markov Chain Monte Carlo Methods


Abstract

It is commonly admitted that non-reversible Markov chain Monte Carlo (MCMC) algorithms usually yield more accurate MCMC estimators than their reversible counterparts. In this note, we show that in addition to their variance reduction effect, some non-reversible MCMC algorithms also have the undesirable property of slowing down the convergence of the Markov chain. This point, which has been overlooked in the literature, has obvious practical implications. We illustrate this phenomenon for different non-reversible versions of the Metropolis-Hastings algorithm on several discrete state space examples and discuss ways to mitigate the risk of a small asymptotic variance/slow convergence scenario.


References

  • Andrieu C, Durmus A, Nüsken N, Roussel J (2018) Hypercoercivity of piecewise deterministic Markov process Monte Carlo. arXiv:1808.08592

  • Andrieu C, Livingstone S (2019) Peskun-Tierney ordering for Markov chain and process Monte Carlo: beyond the reversible scenario. arXiv:1906.06197

  • Bierkens J (2016) Non-reversible Metropolis-Hastings. Stat Comput 26(6):1213–1228. https://doi.org/10.1007/s11222-015-9598-x

  • Bierkens J, Fearnhead P, Roberts G (2019) The zig-zag process and super-efficient sampling for Bayesian analysis of big data. Ann Stat 47(3):1288–1320

  • Bouchard-Côté A, Vollmer SJ, Doucet A (2017) The bouncy particle sampler: a non-reversible rejection-free Markov chain Monte Carlo method. J Am Stat Assoc

  • Chen F, Lovász L, Pak I (1999) Lifting Markov chains to speed up mixing. In: Proceedings of STOC'99. ACM

  • Chen T-L, Hwang C-R (2013) Accelerating reversible Markov chains. Stat Probabil Lett 83(9):1956–1962

  • Diaconis P, Holmes S, Neal RM (2000) Analysis of a nonreversible Markov chain sampler. Ann Appl Probab 10(3):726–752. http://www.jstor.org/stable/2667319

  • Diaconis P, Miclo L (2013) On the spectral analysis of second-order Markov chains. Annales de la Faculté des Sciences de Toulouse: Mathématiques 22:573–621

  • Diaconis P, Stroock D (1991) Geometric bounds for eigenvalues of Markov chains. Ann Appl Probab 1(1):36–61

  • Duncan A, Nüsken N, Pavliotis G (2017) Using perturbed underdamped Langevin dynamics to efficiently sample from probability distributions. J Stat Phys 169(6):1098–1131

  • Fill JA (1991) Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process. Ann Appl Probab 1(1):62–87

  • Gadat S, Miclo L (2013) Spectral decompositions and L2-operator norms of toy hypocoercive semi-groups. Kinet Relat Mod 6(2):317–372

  • Gustafson P (1998) A guided walk Metropolis algorithm. Stat Comput 8(4):357–364

  • Hastings W (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109

  • Horowitz AM (1991) A generalized guided Monte Carlo algorithm. Phys Lett B 268(2):247–252

  • Hwang C-R, Hwang-Ma S-Y, Sheu S-J (2005) Accelerating diffusions. Ann Appl Probab 15(2):1433–1444

  • Hwang C-R, Normand R, Wu S-J (2015) Variance reduction for diffusions. Stoch Process Appl 125(9):3522–3540

  • Iosifescu M (2014) Finite Markov processes and their applications. Courier Corporation

  • Łatuszyński K, Miasojedow B, Niemiro W (2013) Nonasymptotic bounds on the estimation error of MCMC algorithms. Bernoulli 19(5A):2033–2066

  • Lelièvre T, Nier F, Pavliotis GA (2013) Optimal non-reversible linear drift for the convergence to equilibrium of a diffusion. J Stat Phys 152(2):237–274

  • Ma Y-A, Fox EB, Chen T, Wu L (2019) Irreversible samplers from jump and continuous Markov processes. Stat Comput 29(1):177–202

  • Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092

  • Meyn SP, Tweedie RL (1994) Computable bounds for geometric convergence rates of Markov chains. Ann Appl Probab 4(4):981–1011

  • Miclo L, Monmarché P (2013) Étude spectrale minutieuse de processus moins indécis que les autres. In: Séminaire de Probabilités XLV. Springer, pp 459–481

  • Mira A, Geyer CJ (2000) On non-reversible Markov chains. In: Monte Carlo Methods. Fields Institute/AMS, pp 95–110

  • Neal RM (2004) Improving asymptotic variance of MCMC estimators: Non-reversible chains are better. arXiv:math/0407281

  • Plummer M, Best N, Cowles K, Vines K (2006) CODA: Convergence diagnosis and output analysis for MCMC. R news 6(1):7–11

  • Poncet R (2017) Generalized and hybrid Metropolis-Hastings overdamped Langevin algorithms. arXiv:1701.05833

  • Ramanan K, Smith A (2018) Bounds on lifting continuous-state Markov chains to speed up mixing. J Theor Probab 31(3):1647–1678

  • Rosenthal JS (1995) Minorization conditions and convergence rates for Markov chain Monte Carlo. J Am Stat Assoc 90(430):558–566

  • Rosenthal JS (2003) Asymptotic variance and convergence rates of nearly-periodic Markov chain Monte Carlo algorithms. J Am Stat Assoc 98(461):169–177

  • Sakai Y, Hukushima K (2016) Eigenvalue analysis of an irreversible random walk with skew detailed balance conditions. Phys Rev E 93(4):043318

  • Sherlock C, Thiery AH (2017) A discrete bouncy particle sampler. arXiv:1707.05200

  • Sun Y, Schmidhuber J, Gomez FJ (2010) Improving the asymptotic performance of Markov chain Monte Carlo by inserting vortices. In: Advances in Neural Information Processing Systems. pp 2235–2243

  • Tierney L (1998) A note on Metropolis-Hastings kernels for general state spaces. Ann Appl Probab 8(1):1–9

  • Turitsyn KS, Chertkov M, Vucelja M (2011) Irreversible Monte Carlo algorithms for efficient sampling. Physica D 240(4-5):410–414

  • Vanetti P, Bouchard-Côté A, Deligiannidis G, Doucet A (2018) Piecewise-deterministic Markov Chain Monte Carlo. arXiv:1707.05296

  • Vucelja M (2016) Lifting – a nonreversible Markov chain Monte Carlo algorithm. Am J Phys 84:958. https://doi.org/10.1119/1.4961596

  • Yuen WK (2000) Applications of geometric bounds to the convergence rate of Markov chains on \(\mathbb {R}^{n}\). Stoch Process Appl 87(1):1–23


Acknowledgements

This research work was partially funded by ENSAE ParisTech, the Insight Centre for Data Analytics at University College Dublin and NSERC of Canada. The authors thank the editors and two anonymous referees for many constructive comments that improved the article.

Author information
Correspondence to Florian Maire.


Appendices

Appendix A: Lifted non-reversible Markov chain

Fig. 13

(Example 1) GW Markov chain transition. Each circle corresponds to one state with (x, ξ) ∈ {1, … , S} × {− 1, 1}. The top and bottom rows correspond to ξ = 1, i.e. counter-clockwise inertia, and ξ = − 1, i.e. clockwise inertia, respectively

Fig. 14

(Example 2) MH Markov chain (top) and Guided Walk Markov chain (bottom). For the MH chain, the probability of remaining in each state is implicit. For the GW chain, the second coordinate of each state indicates the value of the auxiliary variable ξ

Fig. 15

(Example 2) Left: comparison of the mixing times of GW and GW-lifted (see Andrieu and Livingstone 2019) as a function of S. Here the mixing time τ is defined as \(\tau (P):=\inf \{t\in \mathbb {N} : \|\delta _{1} P^{t}-\pi \|\leq \epsilon \}\) with \(\epsilon = 10^{-5}\). Right: comparison of the GW and GW-lifted asymptotic variances for some test functions
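This mixing time can be computed numerically by propagating the point mass δ1 through the transition kernel. A minimal sketch, assuming the norm above is total variation and using a lazy random walk on a circle as an illustrative stand-in for the GW and GW-lifted kernels:

```python
# A minimal sketch of the mixing-time computation, assuming the norm is
# total variation; the lazy random walk on a circle is an illustrative
# stand-in, not the paper's GW or GW-lifted kernel.
import numpy as np

def mixing_time(P, pi, eps=1e-5, start=0, t_max=10**6):
    """Smallest t such that ||delta_start P^t - pi||_TV <= eps."""
    mu = np.zeros(len(pi))
    mu[start] = 1.0
    for t in range(1, t_max + 1):
        mu = mu @ P                             # one-step update of delta P^t
        if 0.5 * np.abs(mu - pi).sum() <= eps:  # total variation distance
            return t
    return None                                 # not mixed within t_max steps

S = 21
P = np.zeros((S, S))
for x in range(S):
    P[x, x] = 0.5                               # laziness avoids periodicity
    P[x, (x - 1) % S] += 0.25
    P[x, (x + 1) % S] += 0.25
pi = np.full(S, 1.0 / S)                        # uniform stationary law
print(mixing_time(P, pi))
```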

Appendix B: Marginal non-reversible Markov chain

Fig. 16

(Example 1) Illustration of MH (Alg. 1) and non-reversible MH (Alg. 3) with S = 50, ρ = 0.1 and \(\zeta =\zeta _{\max }\). First row – convergence in total variation: the blue plot is MH and the red one is NRMH. The distribution of the Monte Carlo estimate was obtained using L = 1,000 independent chains of length T = 10,000 started from π, for both algorithms. Other test functions of the same type, for \(i\in \mathcal {S}\), gave similar results. Second row – illustration of a particular sample path of length T = 10,000 for both Markov chains. For better visibility of the two sample paths, the left and centre plots represent the function \(\{(1+t/T)\cos (2\pi X_{t}/p),(1+t/T)\sin (2\pi X_{t}/p)\}\) for t = 1, … , T for the MH and NRMH Markov chains, respectively. This shows that NRMH does explore the circle more efficiently

Appendix C: Proof of Proposition 3

We first need to prove the following lemma.

Lemma 9

The conductance of the MH Markov chain of Example 2 satisfies

$$ \frac{1+\sqrt{1+2S(S+1)}}{S(S+1)}\leq h(P)\leq \frac{2}{S+1} . $$
(28)

Proof

Let, for all \(A\in \mathfrak {S}\), \(\psi (A):={{\sum }_{x\in A}\pi (x)P(x,\bar {A})}\slash {\pi (A)\wedge (1-\pi (A))}\) be the quantity to minimize. A close analysis of the MH Markov chain displayed in the top panel of Fig. 14 shows that the set A which minimizes ψ(A) has the form A = (a1, a1 + 1, … , a2) for some 1 ≤ a1 ≤ a2 ≤ S. Indeed, since the Markov chain moves to neighbouring states only, there are only two ways to exit A at each transition. Since each way to exit A contributes at the same order of magnitude to the numerator, taking contiguous states minimizes it, and in particular

$$ \sum\limits_{x\in A}\pi(x)P(x,\bar{A})=\pi(a_{1})\frac{1\vee (a_{1}-1)}{2a_{1}}+\pi(a_{2})\frac{1}{2}=\frac{1\vee (a_{1}-1)+a_{2}}{S(S+1)} , $$

so that for any a1 < a2 satisfying π(A) < 1/2, we have:

$$ \psi(A)\geq \frac{1\vee (a_{1}-1)+a_{2}}{a_{2}(a_{2}+1)-a_{1}(a_{1}-1)} $$
(29)

since

$$ \pi(A)=\frac{2}{S(S+1)}\sum\limits_{k=a_{1}}^{a_{2}}k=\frac{a_{2}(a_{2}+1)-a_{1}(a_{1}-1)}{S(S+1)} . $$

Fix a1 and treat a2 as a function of a1 satisfying π(A) < 1/2. On the one hand, note that for all a1 the function mapping a2 to the RHS of Eq. (29) is decreasing. On the other hand, we have that \(\pi (A)<1/2\Leftrightarrow a_{2}(a_{2}+1)-a_{1}(a_{1}-1)<S(S+1)/2\), which yields

$$ a_{2}\leq a_{2}^{\ast}(a_{1}):=\left\lfloor\frac{-1+V(a_{1},S)}{2}\right\rfloor ,\quad V(a_{1},S):=\sqrt{1+2S(S+1)+4a_{1}(a_{1}-1)} . $$

Hence, for all a1, the RHS of Eq. (29) is lower bounded by

$$ \begin{array}{@{}rcl@{}} \frac{4(1\vee (a_{1}-1))-2+2V(a_{1},S)}{(-1+V(a_{1},S))(1+V(a_{1},S))-4a_{1}(a_{1}-1)} =\frac{4(1\vee (a_{1}-1))-2+2V(a_{1},S)}{V(a_{1},S)^{2}-1-4a_{1}(a_{1}-1)}\\ =\frac{2(1\vee (a_{1}-1))-1+V(a_{1},S)}{S(S+1)} . \end{array} $$

Clearly, the numerator is an increasing function of a1 and is thus minimized for a1 = 1, which gives the lower bound of Eq. (28). Finally, by definition h(P) is upper bounded by ψ(A) for any \(A\in \mathfrak {S}\) satisfying π(A) < 1/2. In particular, taking A = (1, 2, … , (S + 1)/2) gives the upper bound of Eq. (28). □
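For small S, the bounds of Lemma 9 can be sanity-checked by enumerating all sets A. Below is a minimal sketch, assuming a reconstruction of the Example 2 chain as the MH sampler with target π(x) = 2x/(S(S + 1)) on the circle {1, … , S} and a symmetric ±1 proposal; this reconstruction is ours, inferred from the quantities appearing in the proof above.

```python
# Brute-force check of the conductance bounds (28) for small S, under our
# hypothetical reconstruction of the Example 2 MH chain.
import itertools
import numpy as np

S = 9
pi = 2.0 * np.arange(1, S + 1) / (S * (S + 1))   # pi(x) proportional to x
P = np.zeros((S, S))
for x in range(S):
    for y in ((x - 1) % S, (x + 1) % S):
        P[x, y] = 0.5 * min(1.0, pi[y] / pi[x])  # MH acceptance probability
    P[x, x] = 1.0 - P[x].sum()                   # rejection mass stays put

def conductance(P, pi):
    """h(P) by enumeration of all A with pi(A) < 1/2 (exponential in S)."""
    h = np.inf
    for r in range(1, S):
        for A in itertools.combinations(range(S), r):
            pA = pi[list(A)].sum()
            if pA >= 0.5:
                continue
            flow = sum(pi[a] * P[a, b] for a in A for b in range(S) if b not in A)
            h = min(h, flow / pA)
    return h

h = conductance(P, pi)
lower = (1 + np.sqrt(1 + 2 * S * (S + 1))) / (S * (S + 1))
upper = 2.0 / (S + 1)
print(lower <= h <= upper, (lower, h, upper))
```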

Proof

Since PMH is reversible and aperiodic, its spectrum is real and any eigenvalue different from one, \(\lambda \in {\Lambda }_{|\boldsymbol {1}^{\perp }}:=\text {Sp}(P_{\text {MH}})\backslash \{1\}\), satisfies − 1 < λ < 1. The norm of PMH as an operator on the non-constant functions of L2(π) is \(\gamma :=\max \{\sup {\Lambda }_{|\boldsymbol {1}^{\perp }},|\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\}\). It is well known (see e.g. Yuen 2000) that

$$ \|\delta_{1}P_{\text{MH}}^{t}-\pi\|_{2}\leq \|\delta_{1}-\pi\|_{2}\gamma^{t} . $$

It can be readily checked that ∥δ1 − π∥2 corresponds to the first factor on the RHS of Eq. (13). The tedious part of the proof is to bound γ. Using reversibility again, Cheeger's inequality (see e.g. Diaconis and Stroock (1991) for a proof) reads

$$ 1-2h(P)\leq \sup{\Lambda}\leq 1-h(P)^{2} , $$
(30)

where h(P) is the Markov chain conductance defined as

$$ h(P)=\underset{\pi(A)<1/2}{\underset{A\in\mathfrak{S}}{\inf}} \frac{{\sum}_{x\in A}\pi(x)P(x,\bar{A})}{\pi(A)} . $$

Combining Cheeger’s inequality and Lemma 9 yields

$$ \sup{\Lambda}\leq 1-\frac{2}{S(S+1)} . $$
(31)

However, to use the above bound to upper bound γ, we need to check that \(\sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\geq |\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\). In general, bounding \(|\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\) proves to be more challenging than bounding \(\sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\). However, in the context of this example, we can use the bound derived in Proposition 2 of Diaconis and Stroock (1991). It is based on a geometric interpretation of the Markov chain as a non-bipartite graph with vertices (states) connected by edges (transitions), as illustrated in Fig. 14. More precisely, the result of that work relevant to our purposes states that

$$ \inf {\Lambda}_{|\boldsymbol{1}^{\perp}}\geq -1+\frac{2}{\iota(P)} , $$
(32)

with \(\iota (P)=\max \limits _{e_{a,b}\in {\Gamma }}{\sum }_{\sigma _{x}\ni e_{a,b}}|\sigma _{x}|\pi (x)\), where

  • ea, b is the edge corresponding to the transition from state a to b,

  • σx is a path of odd length going from state x to itself, including a self-loop provided that P(x, x) > 0, and more generally \(\sigma _{x}=(e_{x,a_{1}},e_{a_{1},a_{2}},\ldots ,e_{a_{\ell },x})\) with ℓ even,

  • Γ is a collection of paths {σ1, … , σS} including exactly one path for each state,

  • |σx| represents the “length” of path σx and is formally defined as

    $$ |\sigma_{x}|=\sum\limits_{e_{a,b}\in\sigma_{x}}\frac{1}{\pi(a) P(a,b)} . $$

Let us consider the collection of paths Γ consisting of all the self loops for all states x ≥ 2. It can be readily checked that the length of such paths is

$$ |\sigma_{x}|=(\pi(x)P(x,x))^{-1}=\left( \frac{x}{\Delta}\frac{1}{2x}\right)^{-1}=S(S+1) . $$

For state x = 1, let us consider the path consisting of the walk around the circle, σ1 : (e1,2, e2,3, … , eS,1). It would have been possible to take the path (e1,2, e2,2, e2,1) instead, but it is unclear whether paths using the same edge twice are permitted in the framework of Prop. 2 of Diaconis and Stroock (1991). The length of the path σ1 is

$$ \begin{array}{@{}rcl@{}} |\sigma_{1}|=\frac{1}{\pi(1)P(1,2)}+\cdots+\frac{1}{\pi(S)P(S,1)}\\ =S(S+1)+\frac{S(S+1)}{2}+\cdots+\frac{S(S+1)}{S-1}+S(S+1)=S(S+1)\left( 1+\sum\limits_{k=1}^{S-1}\frac{1}{k}\right) . \end{array} $$

We are now in a position to calculate ι(P). First note that, by construction, each edge belonging to any path σk contained in Γ appears once and only once. Hence, the constant ι(P) simplifies to the maximum of the set {|σx|π(x) : σx ∈ Γ}, that is

$$ \max\left\{2\left( 1+\sum\limits_{\ell=1}^{S-1}\frac{1}{\ell}\right),2k : 2\leq k\leq S\right\} =2S , $$
(33)

since on the one hand \({\sum }_{\ell =1}^{S-1} {1}\slash {\ell }\leq 1+\log (S)\) and on the other hand S ≥ 5. Combining Eqs. (32) and (33) yields

$$ \inf {\Lambda}_{|\boldsymbol{1}^{\perp}}\geq -1+\frac{1}{S} . $$
(34)

It follows that if \(\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}\geq 0\), then \(\gamma \leq \sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\) and otherwise we have

$$ 0>\inf {\Lambda}_{|\boldsymbol{1}^{\perp}}\geq-1+\frac{1}{S}\Leftrightarrow 0>\inf {\Lambda}_{|\boldsymbol{1}^{\perp}} \quad\text{and}\quad \left|\inf {\Lambda}_{|\boldsymbol{1}^{\perp}}\right|\leq1-\frac{1}{S} , $$

which combines with Eq. (31) to complete the proof as

$$ \max\{\sup{\Lambda}_{|\boldsymbol{1}^{\perp}},|\inf {\Lambda}_{|\boldsymbol{1}^{\perp}}|\}\leq \left(1-\frac{1}{S}\right)\vee \left(1-\frac{2}{S(S+1)}\right) \leq 1-\frac{2}{S(S+1)} , $$

since S ≥ 5. □
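Both spectral bounds can be verified numerically for moderate S. A minimal sketch, reusing the same hypothetical reconstruction of the Example 2 chain as in the conductance check above, and exploiting reversibility to diagonalize a symmetrized matrix:

```python
# Numerical check of the bounds (31) and (34) on the spectrum of P_MH,
# under our hypothetical reconstruction of the Example 2 chain
# (pi(x) proportional to x on the circle, symmetric +/-1 proposal).
import numpy as np

def mh_circle(S):
    pi = 2.0 * np.arange(1, S + 1) / (S * (S + 1))
    P = np.zeros((S, S))
    for x in range(S):
        for y in ((x - 1) % S, (x + 1) % S):
            P[x, y] = 0.5 * min(1.0, pi[y] / pi[x])
        P[x, x] = 1.0 - P[x].sum()
    return P, pi

for S in (5, 21, 101):
    P, pi = mh_circle(S)
    d = np.sqrt(pi)
    M = P * d[:, None] / d[None, :]            # symmetric since P is reversible
    lam = np.sort(np.linalg.eigvalsh(M))[:-1]  # drop the eigenvalue 1
    print(S, lam[-1] <= 1 - 2 / (S * (S + 1)), lam[0] >= -1 + 1 / S)
```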

Appendix D: Proof of Proposition 4

Proof

By straightforward calculation we have:

$$ \left\|\delta_{1} P_{\text{GW}}^{(S-1)}(\cdot \times \{-1,1\})-\pi\right\|_{2}^{2}=1-\frac{8}{3S}+o(1/S) . $$
(35)

Using Proposition 3, we have that

$$ \begin{array}{@{}rcl@{}} &&\left\|\delta_{1} P_{\text{MH}}^{(S-1)}-\pi\right\|_{2}^{2}\leq \left\{1-\frac{4}{S(S+1)}+\frac{2(2S+1)}{3S(S+1)}\right\} \left( 1-\frac{2}{S(S+1)}\right)^{S-1}\\ &=&1-\frac{2}{3S}+o(1/S) . \end{array} $$

Comparing this bound with Eq. (35), the inequality of Eq. (14) cannot be concluded. In fact, we need to refine the bound on the MH convergence rate. Analysing the proof of Lemma 9, the lower bound on the conductance appears rather tight, as it results from taking the real-valued bound on \(a_{2}^{\ast }(a_{1})\) as opposed to its floor. To illustrate this statement, the value of the bound is compared to the actual conductance for some moderate sizes of S, the calculation being otherwise too costly. We then calculated the numerical value of \(\sup {\Lambda }_{|1^{\perp }}\) for S ≤ 500 and compared it with the upper bound derived from Cheeger's inequality in the proof of Prop. 3. It appears that the Cheeger bound is, in this example, too loose to justify Eq. (14). However, taking a finer upper bound such as

$$ \sup{\Lambda}_{|1^{\perp}}\leq 1- 8/S^{2} , $$

yields

$$ \left\|\delta_{1} P_{\text{MH}}^{(S-1)}-\pi\right\|_{2}^{2}\leq1-\frac{20}{3S}+o(1/S) $$

which concludes the proof. □

Fig. 17

(Example 2) Conductance h(P) and different approximations, including the lower bound \(\sqrt {2}/S\) derived in Lemma 9 (left), and comparison of \(\sup {\Lambda }_{|1^{\perp }}\) with the upper bound 1 − 2/S2 derived in Proposition 3 and an estimated finer upper bound 1 − 8/S2 (right). For readability, we represent one minus these quantities, on a log scale, so that upper bounds become lower bounds

Appendix E: Proof of Proposition 6

Proof

First, denote by R the equal-weight mixture of the two NRMH kernels. We start by showing that this kernel is π-reversible. Indeed, the sub-kernel of R satisfies:

$$ \begin{array}{@{}rcl@{}} &&\pi(\mathrm{d} x) Q(x,\mathrm{d} y)(A_{\Gamma}(x,y)+A_{-{\Gamma}}(x,y)) \\ &=& \mathrm{d} x\mathrm{d} y \big(\left[\pi(x)Q(x,y)\wedge\left(\pi(y)Q(y,x)+{\Gamma}(x,y)\right)\right]\\ &&+\left[\pi(x)Q(x,y)\wedge\left(\pi(y)Q(y,x)-{\Gamma}(x,y)\right)\right]\big)\\ &=&\mathrm{d} x\mathrm{d} y \big(\left[\left(\pi(x)Q(x,y)-{\Gamma}(x,y)\right)\wedge\pi(y)Q(y,x)\right]\\ &&+\left[\left(\pi(x)Q(x,y)+{\Gamma}(x,y)\right)\wedge\pi(y)Q(y,x)\right]\big)\\ &=&\pi(\mathrm{d} y)Q(y,\mathrm{d} x) \left( \left[\frac{\pi(x)Q(x,y)-{\Gamma}(x,y)}{\pi(y)Q(y,x)}\wedge 1\right]+ \left[\frac{\pi(x)Q(x,y)+{\Gamma}(x,y)}{\pi(y)Q(y,x)}\wedge1\right]\right) . \end{array} $$

Now, note that for all \(x\in \mathcal {S}\) and all \(A\in \mathfrak {S}\),

$$ R(x,A\backslash\{x\})=\frac{1}{2}{\int}_{A\backslash\{x\}}Q(x,\mathrm{d} z)(A_{\Gamma}(x,z)+A_{-{\Gamma}}(x,z)) $$

and since for any two positive numbers a and b, (1 ∧ a) + (1 ∧ b) ≤ 2 ∧ (a + b), we have, for all \((x,z)\in \mathcal {S}^{2}\),

$$ \begin{array}{@{}rcl@{}} A_{\Gamma}(x,z)+A_{-{\Gamma}}(x,z)&=&1\wedge \frac{\pi(z)Q(z,x)+{\Gamma}(x,z)}{\pi(x)Q(x,z)}+ 1\wedge \frac{\pi(z)Q(z,x)-{\Gamma}(x,z)}{\pi(x)Q(x,z)}\\ &\leq& 2\left( 1\wedge\frac{\pi(z)Q(z,x)}{\pi(x)Q(x,z)}\right) \end{array} $$

since by Assumption 2, π(y)Q(y, x) + Γ(x, y) ≥ 0 for all \((x,y)\in \mathcal {S}^{2}\). This yields a Peskun–Tierney ordering R ⪯ PMH, since

$$ R(x,A\backslash\{x\})\leq {\int}_{A\backslash\{x\}} Q(x,\mathrm{d} z)\left( 1\wedge\frac{\pi(z)Q(z,x)}{\pi(x)Q(x,z)}\right)=P_{\text{MH}}(x,A\backslash\{x\}) $$

and the proof is concluded by applying Theorem 4 of Tierney (1998). □
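Both facts used in this proof — the π-reversibility of the mixture R and its Peskun–Tierney domination by PMH — can be checked numerically on a small chain. A minimal sketch, with an illustrative cyclic target, proposal and vorticity of our own choosing:

```python
# Check that R = (P_Gamma + P_{-Gamma})/2 is pi-reversible and dominated
# off-diagonal by P_MH. Target, proposal and vorticity are illustrative.
import numpy as np

S = 6
pi = np.arange(1, S + 1) / (S * (S + 1) / 2.0)   # pi(x) proportional to x
Q = np.zeros((S, S))
for x in range(S):                               # symmetric +/-1 proposal
    Q[x, (x - 1) % S] = Q[x, (x + 1) % S] = 0.5
g = 0.01                                         # small enough for Assumption 2
Gam = np.zeros((S, S))
for x in range(S):                               # constant cyclic vorticity
    Gam[x, (x + 1) % S] += g
    Gam[(x + 1) % S, x] -= g
assert np.allclose(Gam, -Gam.T) and np.allclose(Gam.sum(axis=1), 0)

def nrmh(G):
    """NRMH kernel with acceptance A_Gamma; rejection mass on the diagonal."""
    P = np.zeros((S, S))
    for x in range(S):
        for y in range(S):
            if y != x and Q[x, y] > 0:
                P[x, y] = Q[x, y] * min(1.0, (pi[y] * Q[y, x] + G[x, y])
                                              / (pi[x] * Q[x, y]))
        P[x, x] = 1.0 - P[x].sum()
    return P

P_mh = nrmh(np.zeros((S, S)))                    # Gamma = 0 recovers MH
R = 0.5 * (nrmh(Gam) + nrmh(-Gam))
off = ~np.eye(S, dtype=bool)
F = pi[:, None] * R                              # flow matrix pi(x)R(x,y)
print(np.allclose(F, F.T))                       # pi-reversibility of R
print(np.all(R[off] <= P_mh[off] + 1e-12))       # Peskun-Tierney ordering
```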

Appendix F: Proof of Proposition 7

Proof

Note that if Γ1 satisfies Assumptions 1 and 2 then

$$ \begin{array}{@{}rcl@{}} &&\pi(x)Q(x,y)\wedge \left( \pi(y)Q(y,x)+{\Gamma}_{1}(x,y)\right)\\ &=& {\Gamma}_{1}(x,y)+\left[\pi(y)Q(y,x)\wedge \left( \pi(x)Q(x,y)+{\Gamma}_{1}(y,x)\right)\right] . \end{array} $$

Thus, if Γ1 and Γ− 1 satisfy Assumptions 1, 2 and 3 then

$$ \begin{array}{@{}rcl@{}} {\Gamma}_{1}(x,y)&=&\left[\pi(y)Q(y,x)\wedge \left( \pi(x)Q(x,y)+{\Gamma}_{-1}(y,x)\right)\right] \\ &&-\left[\pi(y)Q(y,x)\wedge \left( \pi(x)Q(x,y)+{\Gamma}_{1}(y,x)\right)\right] . \end{array} $$

Hence, we have

$$ \begin{array}{@{}rcl@{}} {\Gamma}_{1}(y,x)&=&\left[\pi(x)Q(x,y)\wedge \left( \pi(y)Q(y,x)+{\Gamma}_{-1}(x,y)\right)\right] \\ &&-\left[\pi(x)Q(x,y)\wedge \left( \pi(y)Q(y,x)+{\Gamma}_{1}(x,y)\right)\right] ,\\ &=&\left[\left( \pi(x)Q(x,y)-{\Gamma}_{-1}(x,y)\right)\wedge \pi(y)Q(y,x)\right] \\ &&-\left[\left( \pi(x)Q(x,y)-{\Gamma}_{1}(x,y)\right)\wedge \pi(y)Q(y,x)\right]+{\Gamma}_{-1}(x,y)-{\Gamma}_{1}(x,y)\\ &=&{\Gamma}_{1}(x,y)+{\Gamma}_{-1}(x,y)-{\Gamma}_{1}(x,y) , \end{array} $$

and thus Γ− 1 = −Γ1, which, substituted into Eq. (21), leads to

$$ \begin{array}{@{}rcl@{}} \pi(x)Q(x,y)\wedge \left( \pi(y)Q(y,x)+{\Gamma}_{1}(x,y)\right)\\ =\left( \pi(x)Q(x,y)+{\Gamma}_{1}(x,y)\right) \wedge \pi(y)Q(y,x) \end{array} $$
(36)

for all \((x,y)\in \mathcal {S}^{2}\). Conversely, it can be readily checked that if Γ1 satisfies Assumptions 1 and 2 and Eq. (36), then setting Γ− 1 = −Γ1 implies that Γ1 and Γ− 1 satisfy Assumptions 1, 2 and the skew-detailed balance equation (Eq. (21)). The proof is concluded by noting that Eq. (36) holds if and only if Γ1 is the null operator on \(\mathcal {S}\times \mathcal {S}\) or Q is π-reversible. □
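The dichotomy can be observed numerically: Eq. (36) holds for a π-reversible Q whatever the admissible Γ1, and fails for a generic non-reversible Q as soon as Γ1 ≠ 0. A minimal sketch with illustrative 3-state kernels of our own choosing:

```python
# Numerical illustration of the dichotomy in Proposition 7.
import numpy as np

def eq36_holds(pi, Q, G):
    """Entrywise test of Eq. (36) for the pair (pi, Q) and vorticity G."""
    lhs = np.minimum(pi[:, None] * Q, pi[None, :] * Q.T + G)
    rhs = np.minimum(pi[:, None] * Q + G, pi[None, :] * Q.T)
    return np.allclose(lhs, rhs)

pi = np.array([0.2, 0.3, 0.5])
G = 0.01 * np.array([[0.0, 1.0, -1.0],          # antisymmetric, zero row sums
                     [-1.0, 0.0, 1.0],
                     [1.0, -1.0, 0.0]])

Q_rev = np.tile(pi, (3, 1))                     # independence sampler, so
                                                # pi(x)Q(x,y) is symmetric
Q_nonrev = np.array([[0.1, 0.6, 0.3],           # cyclic, not pi-reversible
                     [0.3, 0.1, 0.6],
                     [0.6, 0.3, 0.1]])
print(eq36_holds(pi, Q_rev, G))                 # True: Q is pi-reversible
print(eq36_holds(pi, Q_nonrev, G))              # False: Gamma_1 must vanish
```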

Appendix G: Proof of Proposition 8

We prove Proposition 8, which states that the transition kernel (24) of the Markov chain generated by Algorithm 4 is \(\tilde {\pi }\)-invariant and is reversible if and only if Γ = 0.

Proof

To prove the invariance of Kρ, we need to show that

$$ \sum\limits_{y\in\mathcal{S},\eta\in\{-1,1\}} \tilde{\pi}(y,\eta)K_{\rho}(y,\eta;x,\xi) = \tilde{\pi}(x,\xi) , $$

for all \((x,\xi )\in \mathcal {S}\times \{-1,1\}\) and ρ ∈ [0, 1]. We have:

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{y,\eta}\tilde{\pi}(y,\eta)K_{\rho}(y,\eta;x,\xi)\\ &=& \sum\limits_{y} \tilde{\pi}(y,\xi)K_{\rho}(y,\xi;x,\xi) + \sum\limits_{y} \tilde{\pi}(y,-\xi)K_{\rho}(y,-\xi;x,\xi) \\ &=& \tilde{\pi}(x,\xi)K_{\rho}(x,\xi;x,\xi) + \sum\limits_{y \neq x} \tilde{\pi}(y,\xi)K_{\rho}(y,\xi;x,\xi)+ \tilde{\pi}(x,{-\xi})K_{\rho}(x,{-\xi};x,\xi) \\ & =& \tilde{\pi}(x,\xi) \bigg\{ Q(x,x) + (1-\rho) \sum\limits_{z} Q(x,z)(1-A_{\xi{\Gamma}}(x,z)) \\ && + \rho\sum\limits_{z} Q(x,z)(1-A_{-\xi{\Gamma}}(x,z)) \bigg\} + \sum\limits_{y \neq x} \tilde{\pi}(y,\xi)Q(y,x)A_{\xi{\Gamma}}(y,x) \end{array} $$
(37)

the second equality coming from the fact that Kρ(y, −ξ; x, ξ) ≠ 0 if and only if x = y, and the third from the fact that \(\tilde {\pi }(x,\xi ) = \tilde {\pi }(x,-\xi ) = \pi (x)/2\). Now, let \(A(x,\xi ) := {\sum }_{y \neq x} \tilde {\pi }(y,\xi )Q(y,x)A_{\xi {\Gamma }}(y,x)\) and note that, by Eq. (43):

$$ A(x,\xi) = \frac{1}{2}\sum\limits_{y \neq x} \pi(y)Q(y,x)A_{\xi{\Gamma}}(y,x) = \frac{1}{2}\sum\limits_{y \neq x} \left\{\pi(x)Q(x,y)A_{\xi{\Gamma}}(x,y)+\xi{\Gamma}(y,x)\right\} . $$
(38)

Assumption 2, together with the fact that π(x) > 0 for all \(x\in \mathcal {S}\), yields that π(y)Q(y, x) > 0 if and only if π(x)Q(x, y) > 0. It can also be noted that the lower-bound condition on Γ implies that Γ(x, y) = 0 whenever Q(x, y) = 0. This leads to

$$ \begin{array}{@{}rcl@{}} A(x,\xi) &=& (1/2) \underset{\pi(x)Q(x,y)>0}{\underset{y \neq x}{\sum}} \pi(x)Q(x,y) A_{\xi{\Gamma}}(x,y)+(\xi/2) \sum\limits_{y \neq x}{\Gamma}(y,x)\\ &=&\tilde{\pi}(x,\xi) \sum\limits_{y \neq x} Q(x,y)A_{\xi{\Gamma}}(x,y) \end{array} $$
(39)

since for all \(x\in \mathcal {S}\), \({\sum }_{y\in \mathcal {S}}{\Gamma }(x,y)=0\). Similarly, define

$$ B(x,\xi):= \tilde{\pi}(x,\xi) \sum\limits_{z} Q(x,z) \left\{(1-\rho)(1-A_{\xi{\Gamma}}(x,z)) + \rho(1-A_{-\xi{\Gamma}}(x,z)) \right\} . $$

Using Lemma 10, we have:

$$ \begin{array}{@{}rcl@{}} B(x,\xi) &&= \tilde{\pi}(x,\xi) \sum\limits_{z\in\mathcal{S}} Q(x,z)(1-A_{\xi{\Gamma}}(x,z)) \end{array} $$
(40)
$$ \begin{array}{@{}rcl@{}} &&= \tilde{\pi}(x,\xi) \sum\limits_{z \neq x} Q(x,z)(1-A_{\xi{\Gamma}}(x,z)) , \\ &&= \tilde{\pi}(x,\xi) \sum\limits_{z \neq x} Q(x,z)-A(x,\xi) , \end{array} $$
(41)

where the penultimate equality follows from AΓ(x, x) = 1 for all \(x\in \mathcal {S}\). Finally, combining Eqs. (37) and (40), we obtain:

$$ \begin{array}{@{}rcl@{}} \sum\limits_{y,\eta}\tilde{\pi}(y,\eta)K_{\rho}(y,\eta;x,\xi)&&=\tilde{\pi}(x,\xi) Q(x,x)A_{\xi{\Gamma}}(x,x)+B(x,\xi)+A(x,\xi) ,\\ &&=\tilde{\pi}(x,\xi) Q(x,x)+\tilde{\pi}(x,\xi) \sum\limits_{z \neq x} Q(x,z) ,\\ &&=\tilde{\pi}(x,\xi) , \end{array} $$

since \({\sum }_{y\in \mathcal {S}}Q(x,y)=1\) for all \(x\in \mathcal {S}\). We now study the \(\tilde {\pi }\)-reversibility of Kρ, i.e. conditions on Γ such that for all \((x,y)\in \mathcal {S}^{2}\) and (ξ, η) ∈{− 1, 1}2 with (x, ξ)≠(y, η), we have:

$$ \tilde{\pi}(x,\xi)K_{\rho}(x,\xi;y,\eta)=\tilde{\pi}(y,\eta)K_{\rho}(y,\eta;x,\xi) . $$
(42)

First note that if x = y and ξ = −η, then Eq. (42) is equivalent to

$$ \sum\limits_{z\in\mathcal{S}}Q(x,z)\left( A_{\xi{\Gamma}}(x,z)-A_{-\xi{\Gamma}}(x,z)\right)=0 $$

which holds by Lemma 10 and the fact that π is positive everywhere. Second, for x≠y and ξ = −η, Eq. (42) is trivially true by definition of Kρ, see (24), since both sides vanish. Hence, conditions on the vorticity matrix ensuring \(\tilde {\pi }\)-reversibility need only be investigated in the case ξ = η and x≠y. In this case, Eq. (42) is equivalent to

$$ \pi(x)Q(x,y)A_{\xi{\Gamma}}(x,y)=\pi(y)Q(y,x)A_{\xi{\Gamma}}(y,x) , $$

which, by Eq. (43), is equivalent to Γ(x, y) = 0. Hence Kρ is \(\tilde {\pi }\)-reversible if and only if Γ = 0. □
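The invariance can be checked numerically by assembling the lifted kernel from the decomposition in Eq. (37): accepted moves keep the direction ξ, and rejections flip it with probability ρ. A minimal sketch, assuming a uniform target with a symmetric proposal (so that Q is π-reversible) and a cyclic vorticity, all illustrative choices of ours:

```python
# Numerical check that K_rho is pi-tilde-invariant, with the lifted kernel
# assembled from Eq. (37). All modelling choices below are illustrative.
import numpy as np

S, rho, g = 7, 0.3, 0.02
pi = np.full(S, 1.0 / S)                       # uniform target
Q = np.zeros((S, S))
for x in range(S):                             # symmetric +/-1 proposal
    Q[x, (x - 1) % S] = Q[x, (x + 1) % S] = 0.5
Gam = np.zeros((S, S))
for x in range(S):                             # constant cyclic vorticity
    Gam[x, (x + 1) % S] += g
    Gam[(x + 1) % S, x] -= g

def acc(xi):
    """Acceptance matrix A_{xi Gamma}; entries with Q(x,y) = 0 set to 1."""
    A = np.ones((S, S))
    nz = Q > 0
    num = pi[None, :] * Q.T + xi * Gam         # pi(y)Q(y,x) + xi*Gamma(x,y)
    den = pi[:, None] * Q                      # pi(x)Q(x,y)
    A[nz] = np.minimum(1.0, num[nz] / den[nz])
    return A

K = np.zeros((2 * S, 2 * S))                   # state (x, xi) -> index 2x+s
for s, xi in enumerate((+1, -1)):
    A = acc(xi)
    for x in range(S):
        i = 2 * x + s
        rej = (Q[x] * (1.0 - A[x])).sum()      # total rejection probability
        for y in range(S):
            if y != x:
                K[i, 2 * y + s] = Q[x, y] * A[x, y]   # accepted move
        K[i, i] += Q[x, x] + (1 - rho) * rej   # stay, keep direction xi
        K[i, 2 * x + (1 - s)] += rho * rej     # stay, flip direction
pit = np.repeat(pi / 2.0, 2)                   # pi_tilde(x, xi) = pi(x)/2
print(np.allclose(pit @ K, pit), np.allclose(K.sum(axis=1), 1.0))
```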

Lemma 10

Under the assumptions of Proposition 7, we have, for all \(x\in \mathcal {S}\) and ξ ∈ {− 1, 1},

$$ \pi(x)\sum\limits_{z\in\mathcal{S}} Q(x,z) \left\{A_{\xi{\Gamma}}(x,z)-A_{-\xi{\Gamma}}(x,z)\right\}=0 . $$

Proof

Using the fact that for three real numbers a, b and c we have a ∧ b = ((a − c) ∧ (b − c)) + c, together with the fact that Γ(x, y) = −Γ(y, x), we have:

$$ \begin{array}{@{}rcl@{}} \pi(x)Q(x,y)A_{\xi{\Gamma}}(x,y) && = \pi(x)Q(x,y) \left\{ 1 \wedge \frac{\xi{\Gamma}(x,y) + \pi(y)Q(y,x)}{\pi(x)Q(x,y)} \right\} , \\ && = \pi(y)Q(y,x) \left\{ 1 \wedge \frac{\xi{\Gamma}(y,x) + \pi(x)Q(x,y)}{\pi(y)Q(y,x)} \right\} + \xi{\Gamma}(x,y) , \\ && = \pi(y)Q(y,x)A_{\xi{\Gamma}}(y,x) + \xi{\Gamma}(x,y) . \end{array} $$
(43)

The proof follows from combining the skew-detailed balance Eqs. (21) and (43):

$$ \begin{array}{@{}rcl@{}} &&\pi(x)\sum\limits_{z\in\mathcal{S}} Q(x,z)\{A_{\xi{\Gamma}}(x,z)-A_{-\xi{\Gamma}}(x,z)\} \\ &=&\sum\limits_{z\in\mathcal{S}} \left\{\pi(x)Q(x,z)A_{\xi{\Gamma}}(x,z) - \pi(x)Q(x,z)A_{-\xi{\Gamma}}(x,z)\right\} , \\ & =& \sum\limits_{z\in\mathcal{S}} \left\{\pi(x)Q(x,z)A_{\xi{\Gamma}}(x,z) - \pi(z)Q(z,x)A_{\xi{\Gamma}}(z,x) \right\} ,\\ & =& \sum\limits_{z\in\mathcal{S}} \xi{\Gamma}(x,z) ,\\ &=&0 . \end{array} $$

□

Appendix H: Illustration of NRMHAV on Example 2

Fig. 18

(Example 2) Mixing time of NRMHAV (Alg. 4) as a function of ρ ∈ [0, 1] and for S ∈ {7, 21, 51, 101}. Top: convergence of the lifted Markov chain \(\{(X_{t},\zeta _{t}), t\in \mathbb {N}\}\) to \(\tilde {\pi }\) (right) and convergence of the marginal sequence \(\{X_{t}, t\in \mathbb {N}\}\) to π (left). Bottom: comparison of the convergence of \(\{X_{t}, t\in \mathbb {N}\}\) for MH (plain line), NRMH with Γ (dashed), NRMH with −Γ (dotted) and NRMHAV (dashed with points), for S = 7 (black) and S = 51 (green)

Appendix I: Generation of vorticity matrices on S × S grids

We detail a method to generate vorticity matrices satisfying Assumption 1 in the context of Example 4. In the general case of a random walk on an S × S grid, Γζ is an S2 × S2 matrix that can be constructed systematically using the properties that Γζ(x, y) = −Γζ(y, x) for all \((x,y)\in \mathcal {S}^{2}\) and Γζ1 = 0. It has a block-diagonal structure:

$$ {\Gamma}_{\zeta} = \left( \begin{array}{ccccc} B & 0 & 0 & {\cdots} & 0 \\ 0 & B & 0 & {\cdots} & 0 \\ 0 & 0 & B & & 0 \\ {\vdots} & {\vdots} & & {\ddots} & {\vdots} \\ 0 & 0 & 0 & {\cdots} & B \end{array}\right) $$
(44)

where each 2S × 2S diagonal block B has the following structure:

$$ B = \left( \begin{array}{cc} B_{D} & B_{OD} \\ -B_{OD} & -B_{D} \end{array}\right) $$
(45)

where

$$ B_{D} = \left( \begin{array}{ccccccc} 0 & -\zeta & 0 & 0 & {\cdots} & 0 & 0\\ \zeta & 0 & -\zeta & 0 & {\cdots} & 0 & 0 \\ 0 & \zeta & 0 & -\zeta & 0 & {\cdots} & 0 \\ {\vdots} & {\ddots} & {\ddots} & {\ddots} & {\ddots} & {\ddots} & {\vdots} \\ 0 & {\cdots} & 0 & \zeta & 0 & -\zeta & 0 \\ 0 & 0 & {\cdots} & 0 & \zeta & 0 & -\zeta \\ 0 & 0 & {\cdots} & 0 & 0 & \zeta & 0 \end{array}\right) $$

and

$$ B_{OD} = \left( \begin{array}{ccccccc} \zeta & 0 & & & {\cdots} & & 0 \\ 0 & 0 &&& {\cdots} && 0 \\ {\vdots} &&& {\ddots} &&& {\vdots} \\ 0 &&& {\cdots} && 0 & 0 \\ 0 & & & {\cdots} & & 0 & -\zeta \end{array}\right) $$

and ζ is such that the MH ratio (22) is always non-negative. The vorticity matrix is of size S2 × S2, so the number of diagonal blocks depends on S:

  • if S is even: \(\exists k \in \mathbb {N} \text { s.t. } S = 2k ~ \Rightarrow ~ S^{2} = 4k^{2}\), and each block B is a square matrix of dimension 2S = 4k; there are then exactly k blocks B in the vorticity matrix Γζ;

  • if S is odd: \(\exists k \in \mathbb {N} \text { s.t. } S = 2k+1 ~ \Rightarrow ~ S^{2} = (2k+1)^{2}\), and each block B is a square matrix of dimension 2(2k + 1); since \(\frac {(2k+1)^{2}}{2(2k+1)} = k + \frac {1}{2}\), Γζ is made of k blocks B and the last terms of the diagonal are completed with zeros.

For instance, if S = 3, \({\Gamma }_{\zeta }^{(3)}\) consists of a single 6 × 6 block B, the remaining three diagonal terms being completed with zeros; if S = 4, the vorticity matrix is given by

$$ {\Gamma}_{\zeta}^{(4)} = \left( \begin{array}{cc} B_{4} & \boldsymbol{0}_{8} \\ \boldsymbol{0}_{8} & B_{4} \end{array}\right) $$

where

$$\scriptsize B_{4} = \left( \begin{array}{cccccccc} 0 & -\zeta & 0 & 0 & \zeta & 0 & 0 & 0 \\ \zeta & 0 & -\zeta & 0 & 0 & 0 & 0 & 0 \\ 0 & \zeta & 0 & -\zeta & 0 & 0 & 0 & 0 \\ 0 & 0 & \zeta & 0 & 0 & 0 & 0 & -\zeta \\ -\zeta & 0 & 0 & 0 & 0 & \zeta & 0 & 0 \\ 0 & 0 & 0 & 0 & -\zeta & 0 & \zeta & 0 \\ 0 & 0 & 0 & 0 & 0 & -\zeta & 0 & \zeta \\ 0 & 0 & 0 & \zeta & 0 & 0 & -\zeta & 0 \end{array}\right) $$

and 0m stands for the zero-matrix of size m × m.
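A minimal sketch of this block construction, with checks of the antisymmetry and zero-row-sum properties of Assumption 1 (the function name and defaults are ours):

```python
# Systematic generation of Gamma_zeta for a random walk on an S x S grid,
# following the block structure (44)-(45) described above.
import numpy as np

def vorticity(S, zeta=1.0):
    n = S * S
    BD = np.zeros((S, S))
    for i in range(S - 1):                     # skew tridiagonal band
        BD[i, i + 1] = -zeta
        BD[i + 1, i] = zeta
    BOD = np.zeros((S, S))
    BOD[0, 0] = zeta                           # corner entries only
    BOD[-1, -1] = -zeta
    B = np.block([[BD, BOD], [-BOD, -BD]])     # 2S x 2S diagonal block
    G = np.zeros((n, n))
    for k in range(n // (2 * S)):              # k blocks; zeros pad odd S
        G[2 * S * k:2 * S * (k + 1), 2 * S * k:2 * S * (k + 1)] = B
    return G

G = vorticity(4)
assert np.allclose(G, -G.T)                    # Assumption 1: antisymmetry
assert np.allclose(G.sum(axis=1), 0)           # zero row sums
print(G[:8, :8])                               # reproduces the block B_4 above
```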

Fig. 19

Illustration of the generic vorticity matrix specified by the construction above in the case S = 4

Cite this article

Vialaret, M., Maire, F. On the Convergence Time of Some Non-Reversible Markov Chain Monte Carlo Methods. Methodol Comput Appl Probab 22, 1349–1387 (2020). https://doi.org/10.1007/s11009-019-09766-w
