
Statistics and Computing, Volume 29, Issue 1, pp 177–202

Irreversible samplers from jump and continuous Markov processes

  • Yi-An Ma
  • Emily B. Fox
  • Tianqi Chen
  • Lei Wu

Abstract

In this paper, we propose irreversible versions of the Metropolis–Hastings (MH) and Metropolis-adjusted Langevin algorithm (MALA) with a main focus on the latter. For the former, we show how one can simply switch between different proposal and acceptance distributions upon rejection to obtain an irreversible jump sampler (I-Jump). The resulting algorithm has a simple implementation akin to MH, but with the demonstrated benefits of irreversibility. We then show how the previously proposed MALA method can also be extended to exploit irreversible stochastic dynamics as proposal distributions in the I-Jump sampler. Our experiments explore how irreversibility can increase the efficiency of the samplers in different situations.
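The core idea described above, namely changing the proposal direction upon rejection rather than proposing symmetrically, can be illustrated with a minimal sketch in the spirit of the lifted/guided-walk construction (Gustafson 1998), a simple special case of such irreversible jump schemes. This is not the paper's I-Jump acceptance rule; the target, step sizes, and function names below are illustrative assumptions.

```python
import numpy as np

def guided_walk(log_target, x0, step=0.5, noise=0.1, n_iter=10000, seed=0):
    """Lifted MH sketch on a 1D target: a direction variable d in {-1, +1}
    biases the proposal, and d is flipped only upon rejection.

    The proposal x' = x + d*step + noise*z satisfies the skew symmetry
    q_d(x' | x) = q_{-d}(x | x'), so accepting with the plain Metropolis
    ratio pi(x')/pi(x) and flipping d on rejection leaves pi invariant.
    """
    rng = np.random.default_rng(seed)
    x, d = x0, 1
    samples = np.empty(n_iter)
    for i in range(n_iter):
        # Directional proposal: persistent drift d*step plus Gaussian jitter.
        x_prop = x + d * step + noise * rng.normal()
        log_alpha = log_target(x_prop) - log_target(x)
        if np.log(rng.uniform()) < log_alpha:
            x = x_prop   # accept: keep moving in the same direction
        else:
            d = -d       # reject: reverse direction (the irreversible move)
        samples[i] = x
    return samples

# Standard normal target, log pi(x) = -x^2/2 up to a constant.
samples = guided_walk(lambda x: -0.5 * x**2, x0=0.0)
print(samples.mean(), samples.std())
```

The rejection-triggered direction flip is what breaks detailed balance in the lifted state space (x, d), which suppresses the diffusive back-and-forth of an ordinary random-walk MH chain.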

Keywords

Bayesian inference · Hamiltonian Monte Carlo · Irreversible samplers · Jump processes · Markov chain Monte Carlo · Metropolis–Hastings

Acknowledgements

This work was supported in part by ONR Grant N00014-15-1-2380, NSF CAREER Award IIS-1350133 and the TerraSwarm Research Center sponsored by MARCO and DARPA. We thank Samuel Livingstone, Paul Fearnhead, Galin Jones, Hong Qian and Michael I. Jordan for helpful suggestions and discussions. Y.-A. Ma would like to thank Sebastian J. Vollmer for directing him to the reference (Komorowski et al. 2012). We also thank the reviewers for their thoughtful comments and suggestions.

References

  1. Andrieu, C., Thoms, J.: A tutorial on adaptive MCMC. Stat. Comput. 18, 343–373 (2008)
  2. Bardenet, R., Doucet, A., Holmes, C.: On Markov chain Monte Carlo methods for tall data. arXiv:1505.02827 (2015)
  3. Bardenet, R., Doucet, A., Holmes, C.: Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In: Proceedings of the 31st International Conference on Machine Learning (ICML'14) (2014)
  4. Barp, A., Briol, F.-X., Kennedy, A.D., Girolami, M.: Geometry and dynamics for Markov chain Monte Carlo. arXiv:1705.02891 (2017)
  5. Bartlett, M.S.: Smoothing periodograms from time-series with continuous spectra. Nature 161, 686–687 (1948)
  6. Bierkens, J., Fearnhead, P., Roberts, G.: The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data. arXiv:1607.03188 (2016)
  7. Bierkens, J., Roberts, G.: A piecewise deterministic scaling limit of lifted Metropolis–Hastings in the Curie–Weiss model. arXiv:1509.00302 (2016)
  8. Bierkens, J.: Non-reversible Metropolis–Hastings. Stat. Comput. 26, 1–16 (2015)
  9. Bouchard-Côté, A., Vollmer, S.J., Doucet, A.: The bouncy particle sampler: a non-reversible rejection-free Markov chain Monte Carlo method. arXiv:1510.02451 (2016)
  10. Bou-Rabee, N., Owhadi, H.: Long-run accuracy of variational integrators in the stochastic context. SIAM J. Numer. Anal. 48, 278–297 (2010)
  11. Chen, C., Ding, N., Carin, L.: On the convergence of stochastic gradient MCMC algorithms with high-order integrators. In: Advances in Neural Information Processing Systems 28 (NIPS'15), pp. 2278–2286 (2015)
  12. Chen, T., Fox, E.B., Guestrin, C.: Stochastic gradient Hamiltonian Monte Carlo. In: Proceedings of the 31st International Conference on Machine Learning (ICML'14) (2014)
  13. Chen, F., Lovász, L., Pak, I.: Lifting Markov chains to speed up mixing. In: Proceedings of the 31st Annual ACM STOC, pp. 275–281 (1999)
  14. Chen, T.-L., Hwang, C.-R.: Accelerating reversible Markov chains. Stat. Probab. Lett. 83(9), 1956–1962 (2013)
  15. Chib, S., Greenberg, E.: Understanding the Metropolis–Hastings algorithm. Am. Stat. 49(4), 327–335 (1995)
  16. Crooks, G.: Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E 60, 2721–2726 (1999)
  17. Dembo, A., Deuschel, J.-D.: Markovian perturbation, response and fluctuation dissipation theorem. Ann. Inst. H. Poincaré Probab. Stat. 46, 822–852 (2010)
  18. Deuschel, J.D., Stroock, D.W.: Large Deviations. American Mathematical Society, Providence (2001)
  19. Diaconis, P., Holmes, S., Neal, R.M.: Analysis of a nonreversible Markov chain sampler. Ann. Appl. Probab. 10, 726–752 (2000)
  20. Ding, N., Fang, Y., Babbush, R., Chen, C., Skeel, R.D., Neven, H.: Bayesian sampling using stochastic gradient thermostats. In: Advances in Neural Information Processing Systems 27 (NIPS'14) (2014)
  21. Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D.: Hybrid Monte Carlo. Phys. Lett. B 195(2), 216–222 (1987)
  22. Duncan, A.B., Lelièvre, T., Pavliotis, G.A.: Variance reduction using nonreversible Langevin samplers. J. Stat. Phys. 163(3), 457–491 (2016)
  23. Flegal, J.M., Vats, D., Jones, G.L.: Strong consistency of multivariate spectral variance estimators in Markov chain Monte Carlo. arXiv:1507.08266 (2016)
  24. Flegal, J.M., Vats, D., Jones, G.L.: Multivariate output analysis for Markov chain Monte Carlo (2017)
  25. Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis. Chapman and Hall, Boca Raton (2004)
  26. Geyer, C.J.: Practical Markov chain Monte Carlo. Stat. Sci. 7, 473–483 (1992)
  27. Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73(2), 123–214 (2011)
  28. Gustafson, P.: A guided walk Metropolis algorithm. Stat. Comput. 8(4), 357–364 (1998)
  29. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970)
  30. Hatano, T., Sasa, S.-I.: Steady-state thermodynamics of Langevin systems. Phys. Rev. Lett. 86, 3463–3466 (2001)
  31. Horowitz, A.M.: A generalized guided Monte Carlo algorithm. Phys. Lett. B 268(2), 247–252 (1991)
  32. Hwang, C.-R., Hwang-Ma, S.-Y., Sheu, S.-J.: Accelerating Gaussian diffusions. Ann. Appl. Probab. 3(3), 897–913 (1993)
  33. Hwang, C.-R., Hwang-Ma, S.-Y., Sheu, S.-J.: Accelerating diffusions. Ann. Appl. Probab. 15(2), 1433–1444 (2005)
  34. Jansen, S., Kurt, N.: On the notion(s) of duality for Markov processes. Probab. Surv. 11, 59–120 (2014)
  35. Jarner, S.F., Roberts, G.O.: Convergence of heavy-tailed Monte Carlo Markov chain algorithms. Scand. J. Stat. 34(4), 781–815 (2007)
  36. Kaiser, M., Jack, R.L., Zimmer, J.: Acceleration of convergence to equilibrium in Markov chains by breaking detailed balance. J. Stat. Phys. 168(2), 259–287 (2017)
  37. Kim, S., Shephard, N., Chib, S.: Stochastic volatility: likelihood inference and comparison with ARCH models. Rev. Econ. Stud. 65, 361–393 (1998)
  38. Komorowski, T., Landim, C., Olla, S.: Fluctuations in Markov Processes—Time Symmetry and Martingale Approximation. Springer, Berlin (2012)
  39. Korattikara, A., Chen, Y., Welling, M.: Austerity in MCMC land: cutting the Metropolis–Hastings budget. In: Proceedings of the 31st International Conference on Machine Learning (ICML'14) (2014)
  40. Kou, S.C., Zhou, Q., Wong, W.H.: Discussion paper: equi-energy sampler with applications in statistical inference and statistical mechanics. Ann. Stat. 34(4), 1581–1619 (2006)
  41. Kwon, C., Ao, P., Thouless, D.J.: Structure of stochastic dynamics near fixed points. Proc. Natl. Acad. Sci. 102(37), 13029–13033 (2005)
  42. Leimkuhler, B., Shang, X.: Adaptive thermostats for noisy gradient systems. SIAM J. Sci. Comput. 38(2), A712–A736 (2016)
  43. Leimkuhler, B., Matthews, C., Tretyakov, M.: On the long-time integration of stochastic gradient systems. Proc. R. Soc. A 470, 20140120 (2014)
  44. Lelièvre, T., Nier, F., Pavliotis, G.A.: Optimal non-reversible linear drift for the convergence to equilibrium of a diffusion. J. Stat. Phys. 152, 237–274 (2013)
  45. Liu, C., Zhu, J., Song, Y.: Stochastic gradient geodesic MCMC methods. In: Advances in Neural Information Processing Systems 29 (NIPS'16), pp. 3009–3017 (2016)
  46. Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, Berlin (2001)
  47. Lu, X., Perrone, V., Hasenclever, L., Teh, Y.W., Vollmer, S.J.: Relativistic Monte Carlo. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS'17) (2017)
  48. Ma, Y.-A., Chen, T., Fox, E.: A complete recipe for stochastic gradient MCMC. In: Advances in Neural Information Processing Systems 28 (NIPS'15), pp. 2899–2907 (2015)
  49. Ma, Y.-A., Qian, H.: Universal ideal behavior and macroscopic work relation of linear irreversible stochastic thermodynamics. New J. Phys. 17(6), 065013 (2015)
  50. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953)
  51. Neal, R.M.: Improving asymptotic variance of MCMC estimators: non-reversible chains are better. arXiv:math/0407281 (2004)
  52. Neal, R.M.: Bayesian Learning for Neural Networks. Springer, Berlin (1996)
  53. Neal, R.M.: MCMC using Hamiltonian dynamics. Handb. Markov Chain Monte Carlo 54, 113–162 (2010)
  54. Ottobre, M., Pillai, N.S., Pinski, F.J., Stuart, A.M.: A function space HMC algorithm with second order Langevin diffusion limit. Bernoulli 22(1), 60–106 (2016)
  55. Patterson, S., Teh, Y.W.: Stochastic gradient Riemannian Langevin dynamics on the probability simplex. In: Advances in Neural Information Processing Systems 26 (NIPS'13) (2013)
  56. Pavliotis, G.A.: Stochastic Processes and Applications. Springer, Berlin (2014)
  57. Pazy, A.: Semigroups of Linear Operators and Applications to Partial Differential Equations. Springer, Berlin (1983)
  58. Poncet, R.: Generalized and hybrid Metropolis–Hastings overdamped Langevin algorithms. arXiv:1701.05833 (2017)
  59. Priestley, M.B.: Spectral Analysis and Time Series. Academic, San Diego (1981)
  60. Qian, H.: A decomposition of irreversible diffusion processes without detailed balance. J. Math. Phys. 54, 053302 (2013)
  61. Qian, H., Qian, M., Tang, X.: Thermodynamics of the general diffusion process: time-reversibility and entropy production. J. Stat. Phys. 107, 1129 (2002)
  62. Rey-Bellet, L., Spiliopoulos, K.: Irreversible Langevin samplers and variance reduction: a large deviations approach. Nonlinearity 28(7), 2081 (2015)
  63. Rey-Bellet, L., Spiliopoulos, K.: Improving the convergence of reversible samplers. J. Stat. Phys. 164(3), 472–494 (2016)
  64. Robert, C., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer, Berlin (2004)
  65. Roberts, G.O., Stramer, O.: Langevin diffusions and Metropolis–Hastings algorithms. Methodol. Comput. Appl. Probab. 4, 337–357 (2002)
  66. Shang, X., Zhu, Z., Leimkuhler, B., Storkey, A.: Covariance-controlled adaptive Langevin thermostat for large-scale Bayesian sampling. In: Advances in Neural Information Processing Systems 28 (NIPS'15) (2015)
  67. Shi, J., Chen, T., Yuan, R., Yuan, B., Ao, P.: Relation of a new interpretation of stochastic differential equations to Itô process. J. Stat. Phys. 148(3), 579–590 (2012)
  68. Tak, H., Meng, X.-L., van Dyk, D.A.: A repulsive-attractive Metropolis algorithm for multimodality. arXiv:1601.05633 (2016)
  69. Turitsyn, K.S., Chertkov, M., Vucelja, M.: Irreversible Monte Carlo algorithms for efficient sampling. Physica D 240(4–5), 410–414 (2011)
  70. Villani, C.: Hypocoercivity. American Mathematical Society, Providence (2009)
  71. Vucelja, M.: Lifting—a nonreversible Markov chain Monte Carlo algorithm. arXiv:1412.8762 (2015)
  72. Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning (ICML'11), pp. 681–688 (2011)
  73. Wu, S.-J., Hwang, C.-R., Chu, M.T.: Attaining the optimal Gaussian diffusion acceleration. J. Stat. Phys. 155(3), 571–590 (2014)
  74. Xifara, T., Sherlock, C., Livingstone, S., Byrne, S., Girolami, M.: Langevin diffusions and the Metropolis-adjusted Langevin algorithm. Stat. Probab. Lett. 91, 14–19 (2014)
  75. Yin, L., Ao, P.: Existence and construction of dynamical potential in nonequilibrium processes without detailed balance. J. Phys. A 39(27), 8593 (2006)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
  2. Paul G. Allen School of Computer Science and Engineering and Department of Statistics, University of Washington, Seattle, USA
  3. Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
  4. School of Mathematical Sciences, Peking University, Beijing, China
