
Bounds on Lifting Continuous-State Markov Chains to Speed Up Mixing


Abstract

It is often possible to speed up the mixing of a Markov chain \(\{ X_{t} \}_{t \in \mathbb {N}}\) on a state space \(\Omega \) by lifting, that is, running a more efficient Markov chain \(\{ \widehat{X}_{t} \}_{t \in \mathbb {N}}\) on a larger state space \(\hat{\Omega } \supset \Omega \) that projects to \(\{ X_{t} \}_{t \in \mathbb {N}}\) in a certain sense. Chen et al. (Proceedings of the 31st annual ACM symposium on theory of computing. ACM, 1999) prove that for Markov chains on finite state spaces, the mixing time of any lift of a Markov chain is at least the square root of the mixing time of the original chain, up to a factor that depends on the stationary measure of \(\{X_t\}_{t \in \mathbb {N}}\). Unfortunately, this extra factor makes the bound in Chen et al. (1999) very loose for Markov chains on large state spaces and useless for Markov chains on continuous state spaces. In this paper, we develop an extension of the evolving set method that allows us to refine this extra factor and find bounds for Markov chains on continuous state spaces that are analogous to the bounds in Chen et al. (1999). These bounds also allow us to improve on the bounds in Chen et al. (1999) for some chains on finite state spaces.


Notes

  1. The constant \(\frac{1}{2}\) here is arbitrary; replacing it with any constant \(0 < c < 1\) would yield similar bounds, though perhaps with slightly different constants.

  2. This proof is given for Markov chains on discrete state spaces, but the same bound holds for continuous-state Markov chains with minor changes in notation.

  3. Again, the result is stated for chains on discrete state spaces but applies more generally to continuous-state Markov chains.

  4. We point out that such a function \(x^{*}\) always exists. Note that the containment condition actually determines the value of \(x^{*}\) except on the set \(A = \{x \in \mathbb {R}^{d} \, : \, \exists \, 1 \le i \le d \, \text { s.t. } x_{i} \in \frac{1}{N} \mathbb {Z} \}\). Furthermore, \(x^{*}\) is continuous on \(A^{c}\) and A has measure 0. It is clear that the domain of \(x^{*}\) can be extended from \(A^{c}\) to \(\mathbb {R}^{d}\) in a measurable way, e.g., by choosing the smallest allowed value in the usual lexicographic order on \(\mathbb {R}^{d}\).

References

  1. Chen, F., Lovász, L., Pak, I.: Lifting Markov chains to speed up mixing. In: Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pp. 275–281. ACM (1999)

  2. Diaconis, P.: Group Representations in Probability and Statistics, volume 11 of IMS Lecture Notes—Monograph Series. Institute of Mathematical Statistics, Hayward (1988)

  3. Diaconis, P., Holmes, S., Neal, R.M.: Analysis of a nonreversible Markov chain sampler. Ann. Appl. Probab. 10(3), 726–752 (2000)


  4. Diaconis, P.: The Markov chain Monte Carlo revolution. Bull. Am. Math. Soc. 46(1), 179–205 (2008)


  5. Diaconis, P., Miclo, L.: On the spectral analysis of second-order Markov chains. Ann. Fac. Sci. Toulouse Math. 22, 573–621 (2012)


  6. Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002)


  7. Gomez, F., Schmidhuber, J., Sun, Y.: Improving the asymptotic performance of Markov chain Monte-Carlo by inserting vortices. In: Advances in Neural Information Processing Systems (2010)

  8. Levin, D., Peres, Y., Wilmer, E.: Markov Chains and Mixing Times. American Mathematical Society, Providence (2009)


  9. Morris, B., Peres, Y.: Evolving sets, mixing and heat kernel bounds. Probab. Theory Relat. Fields 133, 245–266 (2006)


  10. Neal, R.: Improving asymptotic variance of MCMC estimators: Non-reversible chains are better. Technical Report No. 0406, Department of Statistics, University of Toronto (2004)

  11. Peres, Y., Sousi, P.: Mixing times are hitting times of large sets. J. Theor. Probab. 28(2), 488–519 (2015)


  12. Rosenthal, J.: Random rotations: characters and random walks on \(\text{ SO }(n)\). Ann. Probab. 22, 398–423 (1994)



Acknowledgements

The first author, KR, is partially supported by NSF Grant DMS-1407504 and the second author, AMS, is partially supported by an NSERC Grant. This project was started while AMS was visiting ICERM, and he thanks ICERM for its generous support.


Corresponding author

Correspondence to Aaron Smith.

Appendices

Appendix 1: Comparison to Previous Results on Discrete Spaces

We point out that Theorem 2 can give better bounds than Theorem 3.1 of [1] in some cases. Let \(\{ X_{t} \}_{t \in \mathbb {N}}\) be a Markov chain on a finite state space that satisfies Assumptions 2.7, with associated conductance \(\varPhi \), mixing time \(\tau \), and constants \(\beta , \gamma > 0\) satisfying Eq. (2.13). Then let \(\{ \widehat{X}_{t} \}_{t \in \mathbb {N}}\) be a lift of \(\{ X_{t} \}_{t \in \mathbb {N}}\) with mixing time \(\widehat{\tau }\) and conductance \(\widehat{\varPhi }\). By the same calculation as in inequality (4.11) (but omitting the lines involving superscript N's), we have:

$$\begin{aligned} \widehat{\tau } \ge \frac{\sqrt{\gamma }}{32 \sqrt{ \log \left( \frac{4}{\pi _{*}} \right) \log \left( \frac{2}{\sqrt{\beta }} \right) }} \sqrt{\tau } \end{aligned}$$
(6.1)

We give an example for which this bound is better than Theorem 3.1 of [1]. We fix \(k, n \in \mathbb {N}\) and consider a random walk \(\{ X_{t} \}_{t \in \mathbb {N}}\) on the cycle \(\mathbb {Z}_{kn} = \{1,2,\ldots , kn \}\). Recall that the usual graph distance on \(\mathbb {Z}_{kn}\) is given by

$$\begin{aligned} d(x,y) = \min (|x-y|, kn - |x-y|). \end{aligned}$$

We then define the transition kernel K by:

$$\begin{aligned} \begin{aligned} K(i,i)&= \frac{1}{2} \\ K(i,j)&= \frac{1}{4k}, \qquad 0 < d(i,j) \le k \\ K(i,j)&= 0, \qquad d(i,j) > k. \\ \end{aligned} \end{aligned}$$
(6.2)
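
To make the kernel concrete, here is a minimal Python sketch (ours, not from the paper) that builds \(K\) as a matrix for small \(k\) and \(n\) and checks that each row sums to one and that the uniform distribution is stationary; the helper names cycle_distance and cycle_kernel are our own.

```python
import numpy as np

def cycle_distance(x, y, m):
    """Graph distance on the cycle Z_m."""
    d = abs(x - y) % m
    return min(d, m - d)

def cycle_kernel(k, n):
    """Transition matrix of the lazy walk (6.2) on Z_{kn}: hold with
    probability 1/2, else move to one of the 2k points at distance
    at most k, each with probability 1/(4k)."""
    m = k * n
    K = np.zeros((m, m))
    for i in range(m):
        K[i, i] = 0.5
        for j in range(m):
            if 0 < cycle_distance(i, j, m) <= k:
                K[i, j] = 1.0 / (4 * k)
    return K

K = cycle_kernel(k=3, n=5)
pi = np.full(15, 1.0 / 15)                  # uniform distribution on Z_{15}
assert np.allclose(K.sum(axis=1), 1.0)      # K is stochastic
assert np.allclose(pi @ K, pi)              # pi is stationary
```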

This walk has stationary distribution \(\pi (x) = \frac{1}{kn}\) for all \(x \in \mathbb {Z}_{kn}\). It can be represented in the weighted form required by Assumptions 2.7 using the weights

$$\begin{aligned} \begin{aligned} K_{x,\{x\}}&= \frac{1}{2}, \\ K_{x, \{x-k,\ldots ,x-1,x+1,\ldots ,x+k\}}&= \frac{1}{2}, \\ K_{x,U}&= 0 \qquad \text {otherwise.} \end{aligned} \end{aligned}$$
(6.3)

Using arguments identical to those in Theorem 2 of Chapter 3 of [2], one can show that the mixing time of this walk satisfies \(\tau = \varTheta (n^{2})\). Theorem 3.1 of [1] (combined with Remark 2.1) implies that any lift of this chain must have mixing time at least \(\widehat{\tau } = \varOmega \left( \frac{n}{\sqrt{\log (kn)}} \right) \).
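
On small instances, the \(\varTheta (n^{2})\) scaling can be checked numerically: \(K\) is a circulant matrix, so its eigenvalues are available in closed form, and for this lazy reversible walk the relaxation time \(1/(1-\lambda _{2})\) has the same order as \(\tau \). A minimal sketch (ours):

```python
import numpy as np

# Eigenvalues of the circulant kernel (6.2):
#   lambda_m = 1/2 + (1/(4k)) * sum_{j=1}^{k} 2 cos(2 pi m j / (kn)).
# The relaxation time 1/(1 - lambda_2) grows like n^2 for fixed k.
k = 4
for n in (8, 16, 32, 64):
    M = k * n
    m = np.arange(M)
    j = np.arange(1, k + 1)
    lam = 0.5 + (2 * np.cos(2 * np.pi * np.outer(m, j) / M)).sum(axis=1) / (4 * k)
    gap = 1.0 - np.sort(lam)[-2]        # spectral gap of K
    print(n, 1.0 / gap)                 # roughly quadruples as n doubles
```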

In the other direction, set \(\beta = \frac{1}{12n}\) and \(\gamma = \frac{3}{4}\). Any set \(S \subset \mathbb {Z}_{kn}\) with \(\pi (S) \le \frac{1}{12n}\) has \(|S| = kn \, \pi (S) \le \frac{k}{12}\), so for any \(y\),

$$\begin{aligned} \frac{Q(S,y)}{\pi (y)} \le \frac{1}{2} + \frac{1}{2k} | \{ x \in S \, : \, 1 \le d(x,y) \le k\} | \le \frac{1}{2} + \frac{1}{2k} \cdot \frac{k}{12} \le \frac{3}{4}. \end{aligned}$$

Thus, part (1) of Assumptions 2.7 is satisfied with \(\beta = \frac{1}{12n}\) and \(\gamma = \frac{3}{4}\). Using the representation of K given by Eq. (6.3), we have \(\pi _{*} = \frac{2}{n}\). By inequality (6.1), then, we find \(\widehat{\tau } = \varOmega \left( \frac{n}{\log (n)} \right) \).
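
The displayed bound is also easy to test by brute force; the following sketch (ours, rebuilding the same kernel) draws random sets \(S\) with \(\pi (S) \le \frac{1}{12n}\) and confirms \(Q(S,y)/\pi (y) \le \frac{3}{4}\) for every \(y\):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 24, 10
M = k * n
# Kernel (6.2): hold w.p. 1/2, else uniform over the 2k nearest neighbours.
K = np.zeros((M, M))
for i in range(M):
    K[i, i] = 0.5
    for step in range(1, k + 1):
        K[i, (i + step) % M] = 1.0 / (4 * k)
        K[i, (i - step) % M] = 1.0 / (4 * k)

max_size = M // (12 * n)                    # pi(S) <= 1/(12n)  <=>  |S| <= k/12
for _ in range(1000):
    S = rng.choice(M, size=rng.integers(1, max_size + 1), replace=False)
    # pi is uniform, so Q(S, y)/pi(y) = sum over x in S of K(x, y).
    assert K[S, :].sum(axis=0).max() <= 0.75
print("Q(S, y)/pi(y) <= 3/4 on 1000 random small sets")
```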

Let \(\{ (k(\lambda ), n(\lambda )) \}_{ \lambda \in \mathbb {N}}\) be sequences with the property \(k(\lambda ) \gg n(\lambda )^{\log (n(\lambda ))}\). Then let \(\{X_{t}^{(\lambda )}\}_{t \ge 0}\) be the Markov chain with kernel given by Eq. (6.2), with \(k = k(\lambda )\) and \(n = n(\lambda )\). In this regime, the bound from inequality (6.1) is tighter than that of Theorem 3.1 of [1]. More generally, we expect inequality (6.1) to be tighter than Theorem 3.1 of [1] when the discrete state space \(\varOmega \) of the Markov chain \(\{X_{t}\}_{t \ge 0}\) is extremely large compared to the support of \(K(x,\cdot )\) for all x.
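
For a rough numerical illustration of this regime (ours; the constants in both bounds are ignored), one can compare \(n/\sqrt{\log (kn)}\) with \(n/\log n\) at, say, \(k = n^{2 \log n}\), which lies well inside the regime \(k \gg n^{\log n}\):

```python
import math

# Orders of the two lower bounds on the lifted mixing time, up to constants:
# Theorem 3.1 of [1] gives n / sqrt(log(kn)); inequality (6.1) gives n / log n.
for n in (10, 20, 40, 80):
    k = n ** (2 * math.log(n))          # k >> n^{log n}
    print(n, n / math.sqrt(math.log(k * n)), n / math.log(n))
    # (6.1) yields the larger, i.e. tighter, lower bound as n grows
```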

Appendix 2: Mixing Bound for the Continuous Cycle Walk

In this section, we show that the kernel K from Example 1 has mixing time \(\tau = \varTheta (c^{-2})\). Let \(\mu \) be the uniform measure on \([-c,c]\), viewed as a subset of the torus [0, 1]. It is well known that the characters of the torus [0, 1], as a compact abelian Lie group, are given by \(\chi _{n}(x) = e^{2 \pi i n x}\), \(n \in \mathbb {Z}\). We calculate

$$\begin{aligned} \int _{[0,1]} \chi _{n}(x) \mathrm{d}\mu (x)&= \frac{\sin (2 \pi n c)}{2 \pi n c}. \end{aligned}$$
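
This coefficient is easy to confirm numerically; a quick quadrature check (ours):

```python
import numpy as np

# The n-th Fourier coefficient of the uniform measure on [-c, c] in the
# torus [0, 1] should equal sin(2 pi n c) / (2 pi n c).
c, n = 0.05, 3
x = np.linspace(-c, c, 200001)
coeff = np.trapz(np.exp(2j * np.pi * n * x) / (2 * c), x)   # d mu = dx / (2c)
print(coeff.real, np.sinc(2 * n * c))   # np.sinc(t) = sin(pi t)/(pi t); these agree
```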

By Lemma 4.3 of [12], this means that for \(A > 1\), \(T > A c^{-2}\), and measurable \(S \subseteq [0,1]\), we have for all sufficiently small \(c < C_{0}\) that

$$\begin{aligned} \vert K^{T}(x,S) - \pi (S) \vert&\le \frac{1}{4} \sum _{n \ge 1} \left( \frac{\sin (2 \pi n c)}{2 \pi n c}\right) ^{T} \\&\le \frac{1}{4} \sum _{n=1}^{\lfloor \frac{1}{20 \pi \, c} \rfloor } \left( \frac{\sin (2 \pi n c)}{2 \pi n c}\right) ^{T} + \frac{1}{4} \sum _{n=\lfloor \frac{1}{20 \pi \, c} \rfloor }^{\lceil \frac{1}{4 c} \rceil } \left( \frac{\sin (2 \pi n c)}{2 \pi n c}\right) ^{T}\\&\quad + \frac{1}{4}\sum _{n> \lceil \frac{1}{4c} \rceil } \left( \frac{\sin (2 \pi n c)}{2 \pi n c}\right) ^{T} \\&\le \frac{1}{2} \sum _{n=1}^{\lfloor \frac{1}{20 \pi \, c} \rfloor } \left( \frac{\sin (2 \pi n c)}{2 \pi n c}\right) ^{T} + \frac{1}{2} \sum _{n=\lfloor \frac{1}{20 \pi \, c} \rfloor }^{\lceil \frac{1}{4 c} \rceil } \left( \frac{\sin (2 \pi n c)}{2 \pi n c}\right) ^{T} \\&\quad + \frac{1}{4}\sum _{n> \lceil \frac{1}{2c} \rceil } \left( \frac{\sin (2 \pi n c)}{2 \pi n c}\right) ^{T} \\&\le \frac{1}{2} \sum _{n=1}^{\lfloor \frac{1}{20 \pi \, c} \rfloor } \left( 1 - \frac{9}{10} \frac{(2 \pi n c)^{2}}{6} \right) ^{T} \\&\quad + \frac{1}{2} \sum _{n=\lfloor \frac{1}{20 \pi \, c} \rfloor }^{\lceil \frac{1}{4 c} \rceil } \left( 0.999 \right) ^{T} + \frac{1}{4}\sum _{n > \lceil \frac{1}{2c} \rceil } \left( \frac{1}{2 \pi n c}\right) ^{T} \\&\le \frac{1}{2} \sum _{n=1}^{\infty } e^{- \frac{36 \pi ^{2}}{60} A \, n^{2}} + c^{-1} \left( 0.999 \right) ^{A c^{-2}} + \frac{1}{4} \int _{\frac{1}{2c} -2}^{\infty } \left( \frac{1}{2 \pi x c}\right) ^{T} \mathrm{d}x \\&\le e^{- \frac{36 \pi ^{2}}{60} A } + c^{-1} \left( 0.999 \right) ^{A c^{-2}} + \frac{1}{4} \left( \frac{1}{2c} - 2 \right) \frac{1}{A c^{-2} - 1} \, \pi ^{-A c^{-2}}. \end{aligned}$$

Since each of the three terms in the final line can be made arbitrarily small (uniformly in \(c < C_{0}\)) by choosing A large, this implies \(\tau = O(c^{-2})\).
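
The decay in \(A\) can be seen numerically by evaluating the truncated character sum at \(T = \lceil A c^{-2} \rceil \); a sketch (ours):

```python
import numpy as np

# Evaluate (1/4) * sum_{n >= 1} |sin(2 pi n c)/(2 pi n c)|^T at T ~ A c^{-2};
# for this c and T the tail beyond the truncation point N is negligible.
c, N = 0.02, 200000
ns = np.arange(1, N + 1)
coeffs = np.abs(np.sin(2 * np.pi * ns * c) / (2 * np.pi * ns * c))
for A in (1, 2, 4, 8):
    T = int(np.ceil(A / c ** 2))
    print(A, 0.25 * np.sum(coeffs ** T))   # drops rapidly as A grows
```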

To show the reverse inequality, let \(T(c) = A_{c} c^{-2}\) for some sequence \(A_{c} \rightarrow 0\) as \(c \rightarrow 0\). For a copy of the chain started at \(X_{0}=0\), we have by Bernstein’s inequality \(P[ \vert X_{T(c)} \vert > \frac{1}{10}] \le 2 e^{- \frac{3}{2 A_{c}} \frac{1}{30 + c} } \rightarrow 0\). Thus, \(\limsup _{c \rightarrow 0} \vert \vert \mathcal {L}(X_{T(c)}) - U \vert \vert _{TV} \ge \frac{1}{2}\), and so \(\tau = \varOmega (c^{-2})\).
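
This lower-bound argument can also be illustrated by simulation; in the sketch below (ours), the choice \(A_{c} = c/3\) is one arbitrary sequence with \(A_{c} \rightarrow 0\):

```python
import numpy as np

# Run the walk from X_0 = 0 for T(c) = A_c * c^{-2} steps with A_c -> 0 as
# c -> 0, and record the fraction of walkers farther than 1/10 from 0 on the
# torus.  Under the uniform distribution this fraction would be 4/5.
rng = np.random.default_rng(1)
for c in (0.1, 0.05, 0.02, 0.01):
    A_c = c / 3.0                            # arbitrary sequence with A_c -> 0
    T = int(A_c / c ** 2)
    X = rng.uniform(-c, c, size=(5000, T)).sum(axis=1) % 1.0
    far = np.minimum(X, 1.0 - X) > 0.1       # distance to 0 on the torus
    print(c, T, far.mean())                  # tends to 0, far below 4/5
```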



Cite this article

Ramanan, K., Smith, A. Bounds on Lifting Continuous-State Markov Chains to Speed Up Mixing. J Theor Probab 31, 1647–1678 (2018). https://doi.org/10.1007/s10959-017-0745-5

