Abstract
It is commonly admitted that non-reversible Markov chain Monte Carlo (MCMC) algorithms usually yield more accurate MCMC estimators than their reversible counterparts. In this note, we show that in addition to their variance reduction effect, some non-reversible MCMC algorithms also have the undesirable property of slowing down the convergence of the Markov chain. This point, which has been overlooked in the literature, has obvious practical implications. We illustrate this phenomenon for different non-reversible versions of the Metropolis-Hastings algorithm on several discrete state space examples and discuss ways to mitigate the risk of a small asymptotic variance/slow convergence scenario.
References
Andrieu C, Durmus A, Nüsken N, Roussel J (2018) Hypocoercivity of piecewise deterministic Markov process-Monte Carlo. arXiv:1808.08592
Andrieu C, Livingstone S (2019) Peskun-Tierney ordering for Markov chain and process Monte Carlo: beyond the reversible scenario. arXiv:1906.06197
Bierkens J (2016) Non-reversible Metropolis-Hastings. Stat Comput 26(6):1213–1228. https://doi.org/10.1007/s11222-015-9598-x
Bierkens J, Fearnhead P, Roberts G (2019) The zig-zag process and super-efficient sampling for Bayesian analysis of big data. Ann Stat 47(3):1288–1320
Bouchard-Côté A, Vollmer SJ, Doucet A (2017) The bouncy particle sampler: a non-reversible rejection-free Markov chain Monte Carlo method. J Am Stat Assoc
Chen F, Lovász L, Pak I (1999) Lifting Markov chains to speed up mixing. In: STOC’99. Citeseer
Chen T-L, Hwang C-R (2013) Accelerating reversible Markov chains. Stat Probabil Lett 83(9):1956–1962
Diaconis P, Holmes S, Neal RM (2000) Analysis of a nonreversible Markov chain sampler. Ann Appl Probab 10(3):726–752. http://www.jstor.org/stable/2667319
Diaconis P, Miclo L (2013) On the spectral analysis of second-order Markov chains. Annales de la Faculté des Sciences de Toulouse: Mathématiques 22:573–621
Diaconis P, Stroock D et al (1991) Geometric bounds for eigenvalues of Markov chains. Ann Appl Probab 1(1):36–61
Duncan A, Nüsken N, Pavliotis G (2017) Using perturbed underdamped langevin dynamics to efficiently sample from probability distributions. J Stat Phys 169 (6):1098–1131
Fill JA et al (1991) Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process. Ann Appl Probab 1(1):62–87
Gadat S, Miclo L (2013) Spectral decompositions and L2-operator norms of toy hypocoercive semi-groups. Kinet Relat Models 6(2):317–372
Gustafson P (1998) A guided walk Metropolis algorithm. Stat Comput 8(4):357–364
Hastings W (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109
Horowitz AM (1991) A generalized guided Monte Carlo algorithm. Phys Lett B 268(2):247–252
Hwang C-R, Hwang-Ma S-Y, Sheu S-J, et al (2005) Accelerating diffusions. Ann Appl Probab 15(2):1433–1444
Hwang C-R, Normand R, Wu S-J (2015) Variance reduction for diffusions. Stoch Process Appl 125(9):3522–3540
Iosifescu M (2014) Finite Markov processes and their applications. Courier Corporation
Łatuszyński K, Miasojedow B, Niemiro W et al (2013) Nonasymptotic bounds on the estimation error of MCMC algorithms. Bernoulli 19(5A):2033–2066
Lelièvre T, Nier F, Pavliotis GA (2013) Optimal non-reversible linear drift for the convergence to equilibrium of a diffusion. J Stat Phys 152(2):237–274
Ma Y-A, Fox EB, Chen T, Wu L (2019) Irreversible samplers from jump and continuous Markov processes. Stat Comput 29(1):177–202
Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092
Meyn SP, Tweedie RL et al (1994) Computable bounds for geometric convergence rates of Markov chains. Ann Appl Probab 4(4):981–1011
Miclo L, Monmarché P (2013) Étude spectrale minutieuse de processus moins indécis que les autres. In: Séminaire de Probabilités XLV. Springer, pp 459–481
Mira A, Geyer CJ (2000) On non-reversible Markov chains. Monte Carlo Methods. Fields Institute/AMS, pp 95–110
Neal RM (2004) Improving asymptotic variance of MCMC estimators: Non-reversible chains are better. arXiv:math/0407281
Plummer M, Best N, Cowles K, Vines K (2006) CODA: Convergence diagnosis and output analysis for MCMC. R news 6(1):7–11
Poncet R (2017) Generalized and hybrid Metropolis-Hastings overdamped Langevin algorithms. arXiv:1701.05833
Ramanan K, Smith A (2018) Bounds on lifting continuous-state Markov chains to speed up mixing. J Theor Probab 31(3):1647–1678
Rosenthal JS (1995) Minorization conditions and convergence rates for Markov chain Monte Carlo. J Am Stat Assoc 90(430):558–566
Rosenthal JS (2003) Asymptotic variance and convergence rates of nearly-periodic Markov chain Monte Carlo algorithms. J Am Stat Assoc 98(461):169–177
Sakai Y, Hukushima K (2016) Eigenvalue analysis of an irreversible random walk with skew detailed balance conditions. Phys Rev E 93(4):043318
Sherlock C, Thiery AH (2017) A discrete bouncy particle sampler. arXiv:1707.05200
Sun Y, Schmidhuber J, Gomez FJ (2010) Improving the asymptotic performance of Markov chain Monte Carlo by inserting vortices. In: Advances in Neural Information Processing Systems. pp 2235–2243
Tierney L (1998) A note on Metropolis-Hastings kernels for general state spaces. Ann Appl Probab 8(1):1–9
Turitsyn KS, Chertkov M, Vucelja M (2011) Irreversible Monte Carlo algorithms for efficient sampling. Physica D 240(4-5):410–414
Vanetti P, Bouchard-Côté A, Deligiannidis G, Doucet A (2018) Piecewise-deterministic Markov Chain Monte Carlo. arXiv:1707.05296
Vucelja M (2016) Lifting – A nonreversible Markov chain Monte Carlo algorithm. Am J Phys 84(958). https://doi.org/10.1119/1.4961596
Yuen WK (2000) Applications of geometric bounds to the convergence rate of Markov chains on \(\mathbb{R}^{n}\). Stoch Process Appl 87:1–23
Acknowledgements
This research work was partially funded by ENSAE ParisTech, the Insight Center for Data Analytics at University College Dublin and NSERC of Canada. The authors thank the editors and two anonymous referees for many constructive comments that improved the article.
Appendices
Appendix A: Lifted non-reversible Markov chain
Appendix B: Marginal non-reversible Markov chain
Appendix C: Proof of Proposition 3
We first need to prove the following lemma.
Lemma 9
The conductance of the MH Markov chain of Example 2 satisfies
Proof
For all \(A\in \mathfrak {S}\), let \(\psi (A):={{\sum }_{x\in A}\pi (x)P(x,\bar {A})}\slash {\pi (A)\wedge (1-\pi (A))}\) be the quantity to minimize. A close analysis of the MH Markov chain displayed in the top panel of Fig. 14 shows that the set A which minimizes ψ(A) has the form A = (a1, a1 + 1, … , a2) for some S ≥ a2 ≥ a1 ≥ 1. Indeed, since the Markov chain moves only to neighbouring states, there are only two ways to exit A at each transition. Since each way to exit A contributes at the same order of magnitude to the numerator, taking contiguous states minimizes it, and in particular
so that for any a1 < a2 satisfying π(A) < 1/2, we have:
since
Fix a1 and treat a2 as a function of a1, constrained so that π(A) < 1/2. On the one hand, note that for all a1 the function mapping a2 to the RHS of Eq. (29) is decreasing. On the other hand, we have that \(\pi (A)<1/2\Leftrightarrow {a_{2}^{\ast }(a_{2}^{\ast }+1)-a_{1}(a_{1}-1)}<S(S+1)/2\), which yields
Hence, for all a1, the RHS of Eq. (29) is lower bounded by
Clearly, the numerator is an increasing function of a1 and is thus minimized for a1 = 1, which gives the lower bound of Eq. (28). Finally, by definition h(P) is upper bounded by ψ(A) for any \(A\in \mathfrak {S}\) satisfying π(A) < 1/2. In particular, taking A = (1, 2, … , (S − 1)/2) gives the upper bound of Eq. (28). □
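The minimizing-arc argument can be checked numerically on small state spaces. The sketch below is a minimal illustration, assuming Example 2 is the Metropolis-Hastings random walk on a ring of S states targeting π(x) ∝ x with symmetric nearest-neighbour proposals (all function names are ours, not the paper's): it enumerates every proper subset for a small S and confirms that the conductance is attained on an arc of contiguous states.

```python
import itertools
import numpy as np

def mh_ring_kernel(S):
    """MH kernel on states {1,...,S} arranged on a ring, targeting
    pi(x) proportional to x, with nearest-neighbour proposals of prob. 1/2."""
    pi = np.arange(1, S + 1, dtype=float)
    pi /= pi.sum()
    P = np.zeros((S, S))
    for i in range(S):
        for j in ((i - 1) % S, (i + 1) % S):
            P[i, j] = 0.5 * min(1.0, pi[j] / pi[i])
        P[i, i] = 1.0 - P[i].sum()   # rejected mass stays at i
    return pi, P

def psi(pi, P, A):
    """psi(A) = sum_{x in A} pi(x) P(x, A^c) / (pi(A) ^ (1 - pi(A)))."""
    Ac = [x for x in range(len(pi)) if x not in A]
    flow = sum(pi[x] * P[x, y] for x in A for y in Ac)
    return flow / min(pi[list(A)].sum(), 1.0 - pi[list(A)].sum())

def conductance(pi, P):
    """Exact conductance h(P) by enumerating all proper subsets (small S only)."""
    S = len(pi)
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(S), r) for r in range(1, S))
    return min(psi(pi, P, A) for A in subsets)
```

For S = 9, the exhaustive minimum coincides with the minimum over contiguous arcs, in line with the argument above.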
Proof
Since PMH is reversible and aperiodic, its spectrum is real, and any eigenvalue different from one, \(\lambda \in {\Lambda }_{|\boldsymbol {1}^{\perp }}:=\text {Sp}(P_{\text {MH}})\backslash \{1\}\), satisfies − 1 < λ < 1. The norm of PMH as an operator on the non-constant functions of L2(π) is \(\gamma :=\max \limits \{\sup {\Lambda }_{|\boldsymbol {1}^{\perp }},|\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\}\). It is well known (see e.g. Yuen 2000) that
It can be readily checked that ∥δ1 − π∥2 corresponds to the first factor on the RHS of Eq. (13). The tedious part of the proof is to bound γ. Using reversibility again, Cheeger's inequality (see e.g. Diaconis et al. (1991) for a proof) writes
where h(P) is the Markov chain conductance defined as
Combining Cheeger’s inequality and Lemma 9 yields
However, to use the above bound to upper bound γ, we need to check that \(\sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\geq |\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\). In general, bounding \(|\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}|\) proves more challenging than bounding \(\sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\). In the context of this example, however, we can use the bound derived in Proposition 2 of Diaconis et al. (1991). It is based on a geometric interpretation of the Markov chain as a non-bipartite graph with vertices (states) connected by edges (transitions), as illustrated in Fig. 14. More precisely, the result of that work relevant to our purpose states that
with \(\iota (P)=\max \limits _{e_{a,b}\in {\Gamma }}{\sum }_{\sigma _{x}\ni e_{a,b}}|\sigma _{x}|\pi (x)\), where
-
ea, b is the edge corresponding to the transition from state a to b,
-
σx is a path of odd length going from state x to itself: a self-loop provided that P(x, x) > 0, or more generally \(\sigma _{x}=(e_{x,a_{1}},e_{a_{1},a_{2}},\ldots ,e_{a_{\ell },x})\) with ℓ even,
-
Γ is a collection of paths {σ1, … , σS} including exactly one path for each state,
-
|σx| represents the “length” of path σx and is formally defined as
$$ |\sigma_{x}|=\sum\limits_{e_{a,b}\in\sigma_{x}}\frac{1}{\pi(a) P(a,b)} . $$
Let us consider the collection of paths Γ consisting of the self-loops of all states x ≥ 2. It can be readily checked that the length of such paths is
For state x = 1, let us consider the path consisting of the walk around the circle, σ1 : (e1,2, e2,3, … , eS,1). It would have been possible to take the path e1,2, e2,2, e2,1, but it is unclear whether paths using the same edge twice are permitted in the framework of Prop. 2 of Diaconis et al. (1991). The length of path σ1 is
We are now in a position to calculate ι(P). First note that, by construction, each edge belonging to any path σk contained in Γ appears once and only once. Hence, the constant ι(P) simplifies to the maximum of the set {|σx|π(x), σx ∈Γ} that is
since on the one hand \({\sum }_{\ell =1}^{S} {1}\slash {\ell }\leq 1+\log (S)\) and on the other hand S ≥ 5. Combining Eqs. (32) and (33) yields
It follows that if \(\inf {\Lambda }_{|\boldsymbol {1}^{\perp }}\geq 0\), then \(\gamma \leq \sup {\Lambda }_{|\boldsymbol {1}^{\perp }}\); otherwise we have
which combines with Eq. (31) to complete the proof as
since S ≥ 5. □
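The two-sided spectral control used in this proof can be illustrated numerically. The sketch below relies on the same assumed form of Example 2 as before (MH random walk on a ring targeting π(x) ∝ x; function names are ours): it symmetrizes the kernel to obtain a real spectrum, then checks Cheeger's two-sided bound h(P)²/2 ≤ 1 − sup Λ ≤ 2h(P) for a small S.

```python
import itertools
import numpy as np

def mh_ring_kernel(S):
    """MH kernel on a ring of S states targeting pi(x) proportional to x."""
    pi = np.arange(1, S + 1, dtype=float)
    pi /= pi.sum()
    P = np.zeros((S, S))
    for i in range(S):
        for j in ((i - 1) % S, (i + 1) % S):
            P[i, j] = 0.5 * min(1.0, pi[j] / pi[i])
        P[i, i] = 1.0 - P[i].sum()
    return pi, P

def conductance(pi, P):
    """Exact conductance by subset enumeration (tractable for small S only)."""
    S = len(pi)
    best = np.inf
    for r in range(1, S):
        for A in itertools.combinations(range(S), r):
            Ac = [x for x in range(S) if x not in A]
            flow = sum(pi[x] * P[x, y] for x in A for y in Ac)
            best = min(best, flow / min(pi[list(A)].sum(), 1.0 - pi[list(A)].sum()))
    return best

def nontrivial_spectrum(pi, P):
    """Real spectrum of the pi-reversible kernel P without the trivial
    eigenvalue 1, via the similarity transform D^{1/2} P D^{-1/2}, D = diag(pi)."""
    d = np.sqrt(pi)
    M = (d[:, None] * P) / d[None, :]   # symmetric, similar to P
    return np.sort(np.linalg.eigvalsh(M))[:-1]
```

One can then compare sup Λ with |inf Λ| directly on the computed spectrum, which is what the ι(P) bound above controls analytically.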
Appendix D: Proof of Proposition 4
Proof
By straightforward calculation we have:
Using Proposition 3, we have that
Comparing the order of the former bound with Eq. (35), the inequality of Eq. (14) cannot be concluded; we need to refine the bound on the MH convergence rate. Analysing the proof of Lemma 9, the lower bound on the conductance appears rather tight, as it results from taking the real-valued bound on \(a_{2}^{\ast }(a_{1})\) rather than its floor. To illustrate this, we compared the value of the bound to the actual conductance for moderate values of S, the calculation being too costly otherwise. We then computed the numerical value of \(\sup {\Lambda }_{|1^{\perp }}\) for S ≤ 500 and compared it with the lower bound derived from Cheeger's inequality in the proof of Proposition 3. It appears that Cheeger's bound is too loose in this example to justify Eq. (14). However, taking a finer lower bound such as
yields
which concludes the proof. □
Appendix E: Proof of Proposition 6
Proof
First, denote by R the mixture of the two NRMH kernels with weight 1/2. We start by showing that this kernel is π-reversible. Indeed, the subkernel of R satisfies:
Now, note that for all \(x\in \mathcal {S}\) and all \(A\in \mathfrak {S}\),
and since for any two positive numbers a and b, (1 ∧ a) + (1 ∧ b) ≤ 2 ∧ (a + b), we have for all \((x,z)\in \mathcal {S}^{2}\),
since by Assumption 2, π(y)Q(y, x) + Γ(x, y) ≥ 0 for all \((x,y)\in \mathcal {S}^{2}\). This yields a Peskun-Tierney ordering R ≺ PMH, since
and the proof is concluded by applying Theorem 4 of Tierney (1998). □
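The ordering established above can be checked numerically. The sketch below assumes the NRMH kernel of Bierkens (2016), i.e. acceptance probability \(1 \wedge (\pi(y)Q(y,x)+\Gamma(x,y))/(\pi(x)Q(x,y))\); the state space, target and vorticity matrix are illustrative choices of ours. It verifies that each NRMH kernel is π-invariant but non-reversible, while the equal-weight mixture R is π-reversible and dominated off-diagonal by PMH.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
pi = rng.random(n) + 0.1              # target bounded away from zero
pi /= pi.sum()
Q = np.full((n, n), 1.0 / (n - 1))
np.fill_diagonal(Q, 0.0)              # uniform proposal over the other states

# Vorticity matrix from the cycle 0 -> 1 -> ... -> n-1 -> 0:
# skew-symmetric with zero row sums, scaled so that
# pi(y) Q(y, x) + Gamma(x, y) >= 0 everywhere (Assumption 2).
G = np.zeros((n, n))
for i in range(n):
    G[i, (i + 1) % n] += 1.0
    G[(i + 1) % n, i] -= 1.0
G *= 0.5 * pi.min() / (n - 1)

def nrmh_kernel(pi, Q, G):
    """NRMH kernel: accept x -> y w.p. 1 ^ (pi(y)Q(y,x) + G(x,y)) / (pi(x)Q(x,y))."""
    n = len(pi)
    P = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if y != x and Q[x, y] > 0:
                P[x, y] = Q[x, y] * min(
                    1.0, (pi[y] * Q[y, x] + G[x, y]) / (pi[x] * Q[x, y]))
        P[x, x] = 1.0 - P[x].sum()
    return P

P_plus, P_minus = nrmh_kernel(pi, Q, G), nrmh_kernel(pi, Q, -G)
R = 0.5 * (P_plus + P_minus)          # the reversible mixture of the proof
```

Setting G to zero recovers PMH, and the off-diagonal domination R(x, y) ≤ PMH(x, y) is exactly the Peskun-Tierney comparison invoked above.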
Appendix F: Proof of Proposition 7
Proof
Note that if Γ1 satisfies Assumptions 1 and 2 then
Thus, if Γ1 and Γ− 1 satisfy Assumptions 1, 2 and 3 then
Hence, we have
and thus Γ− 1 = −Γ1, which, substituted into Eq. (21), leads to
for all \((x,y)\in \mathcal {S}^{2}\). Conversely, it can be readily checked that if Γ1 satisfies Assumptions 1, 2 and Eq. (36), then setting Γ− 1 = −Γ1 implies that Γ1 and Γ− 1 satisfy Assumptions 1, 2 and the skew-detailed balance equation (Eq. (21)). The proof is concluded by noting that Eq. (36) holds if and only if Γ1 is the null operator on \(\mathcal {S}\times \mathcal {S}\) or Q is π-reversible. □
Appendix G: Proof of Proposition 8
We prove Proposition 8, which states that the transition kernel (24) of the Markov chain generated by Algorithm 4 is \(\tilde {\pi }\)-invariant and is \(\tilde {\pi }\)-reversible if and only if Γ = 0.
Proof
To prove the invariance of Kρ, we need to prove that
for all \((x,\xi )\in \mathcal {S}\times \{-1,1\}\) and ρ ∈ [0, 1].
the second equality coming from the fact that Kρ(y, −ξ; x, ξ) ≠ 0 if and only if x = y and the third from the fact that \(\tilde {\pi }(x,\xi ) = \tilde {\pi }(x,-\xi ) = \pi (x)/2\). Now, let \(A(x,\xi ) := {\sum }_{y \neq x} \tilde {\pi }(y,\xi )Q(y,x)A_{\xi {\Gamma }}(y,x)\) and note that:
Assumption 2 together with the fact that π(x) > 0 for all \(x\in \mathcal {S}\) yields π(y)Q(y, x) > 0 if and only if π(x)Q(x, y) > 0. It can also be noted that the lower-bound condition on Γ implies that Γ(x, y) = 0 if Q(x, y) = 0. This leads to
since for all \(x\in \mathcal {S}\), \({\sum }_{y\in \mathcal {S}}{\Gamma }(x,y)=0\). Similarly, define
Using Lemma 10, we have:
where the penultimate equality follows from AΓ(x, x) = 1 for all \(x\in \mathcal {S}\). Finally, combining Eqs. (37) and (40), we obtain:
since \({\sum }_{y\in \mathcal {S}}Q(x,y)=1\), for all \(x\in \mathcal {S}\). We now study the \(\tilde {\pi }\)-reversibility of Kρ, i.e. conditions on Γξ such that for all \((x,y)\in \mathcal {S}^{2}\) and (ξ, η) ∈{− 1, 1}2 such that (x, ξ)≠(y, η), we have:
First note that if x = y and ξ = −η, then Eq. (42) is equivalent to
which is true from Lemma 10 and the fact that π is non-zero almost everywhere. Second, for x ≠ y and ξ = −η, Eq. (42) is trivially true by definition of Kρ, see (24). Hence, conditions on the vorticity matrix ensuring \(\tilde {\pi }\)-reversibility need only be investigated in the case ξ = η and x ≠ y. In such a case, Eq. (42) is equivalent to
which is equivalent to Γ = 0. Hence Kρ is \(\tilde {\pi }\)-reversible if and only if Γ = 0. □
Lemma 10
Under the Assumptions of Proposition 7, we have for all\(x\in \mathcal {S}\)and ξ ∈ {− 1, 1}
Proof
Using the fact that, for any three real numbers a, b, c, we have a ∧ b = ((a − c) ∧ (b − c)) + c, together with the fact that Γ(x, y) = −Γ(y, x), we have:
The proof follows from combining the skew-detailed balance equation (21) with Eq. (43):
□
Appendix H: Illustration of NRMHAV on Example 2
Appendix I: Generation of vorticity matrices on S × S grids
We detail a method to generate vorticity matrices satisfying Assumption 1 in the context of Example 4. In the general case of a random walk on an S × S grid, Γζ is an S2 × S2 matrix that can be constructed systematically using the properties that Γζ(x, y) = −Γζ(y, x) for all \((x,y)\in \mathcal {S}^{2}\) and Γζ1 = 0. It has a block-diagonal structure:
where each 2S × 2S diagonal block B has the following structure:
where
and
and ζ is such that the MH ratio (22) is always non-negative. The vorticity matrix is of size S2 × S2, so the number of diagonal blocks depends on S:
-
if S is even: \(\exists k \in \mathbb {N} \text { s.t. } S = 2k ~ \Rightarrow ~ S^{2} = 4k^{2}\), and each block B is a square matrix of dimension 4k; there are then exactly k B-blocks in the vorticity matrix Γζ;
-
if S is odd: \(\exists k \in \mathbb {N} \text { s.t. } S = 2k+1 ~ \Rightarrow ~ S^{2} = (2k+1)^{2}\), and each block B is a square matrix of dimension 2(2k + 1); then, since \(\frac {(2k+1)^{2}}{2(2k+1)} = k + \frac {1}{2}\), Γζ is made of k B-blocks and the last diagonal terms are completed with zeros.
For instance, if S = 3 (resp. if S = 4), the vorticity matrix is given by \({\Gamma }_{\zeta }^{(3)}\) (resp. \({\Gamma }_{\zeta }^{(4)}\)) as follows:
where
and 0m stands for the zero-matrix of size m × m.
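As an alternative to the explicit block recipe above, vorticity matrices satisfying the two structural constraints can also be generated programmatically. The sketch below is our own illustrative construction (not the paper's B-blocks): it superposes oriented unit-square cycles of the S × S grid, so skew-symmetry and Γζ1 = 0 hold by design.

```python
import numpy as np

def grid_vorticity(S, zeta=1.0):
    """Vorticity matrix on the S x S grid built from oriented unit-square
    cycles: each cell adds +zeta along its boundary and -zeta in reverse.
    Row sums vanish (each node on a cycle gains one +zeta and one -zeta
    entry) and the matrix is skew-symmetric by construction. With a uniform
    orientation, contributions on interior shared edges cancel, leaving a
    circulation along the outer boundary of the grid."""
    n = S * S
    idx = lambda r, c: r * S + c
    G = np.zeros((n, n))
    for r in range(S - 1):
        for c in range(S - 1):
            cycle = [idx(r, c), idx(r, c + 1), idx(r + 1, c + 1), idx(r + 1, c)]
            for a, b in zip(cycle, cycle[1:] + cycle[:1]):
                G[a, b] += zeta
                G[b, a] -= zeta
    return G
```

As with the block construction, ζ would then be scaled so that the MH ratio (22) stays non-negative for the target at hand.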
Vialaret, M., Maire, F. On the Convergence Time of Some Non-Reversible Markov Chain Monte Carlo Methods. Methodol Comput Appl Probab 22, 1349–1387 (2020). https://doi.org/10.1007/s11009-019-09766-w