Abstract
The rapid development of computing power and efficient Markov Chain Monte Carlo (MCMC) simulation algorithms has revolutionized Bayesian statistics, making it a highly practical inference method in applied work. However, MCMC algorithms tend to be computationally demanding, and are particularly slow for large datasets. Data subsampling has recently been suggested as a way to make MCMC methods scale to very large datasets, utilizing efficient sampling schemes and estimators from the survey sampling literature. These developments tend to be unknown to many survey statisticians, who traditionally work with non-Bayesian methods and rarely use MCMC. Our article explains the idea of data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a so-called pseudo-marginal MCMC approach to speeding up MCMC through data subsampling. The review is written for a survey statistician without previous knowledge of MCMC methods, as our aim is to motivate survey sampling experts to contribute to the growing Subsampling MCMC literature.
References
Alquier, P., Friel, N., Everitt, R. and Boland, A. (2016). Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels. Stat. Comput. 26, 1-2, 29–47.
Andrieu, C. and Roberts, G.O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Stat. 37, 2, 697–725.
Bardenet, R., Doucet, A. and Holmes, C. (2014). Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In International Conference on Machine Learning, pp. 405–413.
Bardenet, R., Doucet, A. and Holmes, C. (2017). On Markov chain Monte Carlo methods for tall data. J. Mach. Learn. Res. 18, 47, 1–43.
Beaumont, M.A. (2003). Estimation of population growth or decline in genetically monitored populations. Genetics 164, 3, 1139–1160.
Betancourt, M. (2015). The fundamental incompatibility of scalable Hamiltonian Monte Carlo and naive data subsampling. In International Conference on Machine Learning, pp. 533–540.
Betancourt, M. (2017). A conceptual introduction to Hamiltonian Monte Carlo. arXiv:1701.02434.
Bierkens, J., Fearnhead, P. and Roberts, G. (2018). The zig-zag process and super-efficient sampling for Bayesian analysis of big data. Annals of Statistics, forthcoming.
Blei, D.M., Kucukelbir, A. and McAuliffe, J.D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association 112, 518, 859–877.
Bouchard-Côté, A., Vollmer, S.J. and Doucet, A. (2018). The bouncy particle sampler: a nonreversible rejection-free Markov chain Monte Carlo method. J. Am. Stat. Assoc. 113, 522, 855–867.
Brooks, S., Gelman, A., Jones, G. and Meng, X.-L. (2011). Handbook of Markov chain Monte Carlo. CRC Press, Boca Raton.
Ceperley, D. and Dewing, M. (1999). The penalty method for random walks with uncertain energies. J. Chem. Phys. 110, 20, 9812–9820.
Chen, C.-F. (1985). On asymptotic normality of limiting density functions with Bayesian implications. J. R. Stat. Soc. Ser. B Stat. Methodol. 47, 3, 540–546.
Chen, T., Fox, E. and Guestrin, C. (2014). Stochastic gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning, pp. 1683–1691.
Dang, K.-D., Quiroz, M., Kohn, R., Tran, M.-N. and Villani, M. (2017). Hamiltonian Monte Carlo with energy conserving subsampling. arXiv:1708.00955.
Dean, J. and Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1, 107–113.
Del Moral, P. (2004). Feynman-Kac formulae: genealogical and interacting particle systems with applications. Springer, Berlin.
Deligiannidis, G., Doucet, A. and Pitt, M.K. (2018). The correlated pseudo-marginal method. Journal of the Royal Statistical Society B, forthcoming.
Deville, J.-C. and Särndal, C.-E. (1992). Calibration estimators in survey sampling. J. Am. Stat. Assoc. 87, 418, 376–382.
Doucet, A., De Freitas, N. and Gordon, N. (2001). An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo methods in practice, pp. 3–14. Springer.
Doucet, A., Pitt, M., Deligiannidis, G. and Kohn, R. (2015). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika 102, 2, 295–313.
Duane, S., Kennedy, A.D., Pendleton, B.J. and Roweth, D. (1987). Hybrid Monte Carlo. Phys. Lett. B 195, 2, 216–222.
Flury, T. and Shephard, N. (2011). Bayesian inference based only on simulated likelihood: particle filter analysis of dynamic economic models. Econometric Theory 27, 5, 933–956.
Gelman, A., Vehtari, A., Jylänki, P., Sivula, T., Tran, D., Sahai, S., Blomstedt, P., Cunningham, J.P., Schiminovich, D. and Robert, C. (2017). Expectation Propagation as a way of life: A framework for Bayesian inference on partitioned data. arXiv:1412.4869.
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 6, 721–741.
Gunawan, D., Kohn, R., Quiroz, M., Dang, K.-D. and Tran, M.-N. (2018). Subsampling sequential Monte Carlo for static Bayesian models. arXiv:1805.03317.
Hammersley, J.M. and Handscomb, D.C. (1964). Monte Carlo methods. Chapman and Hall, London.
Hastings, W.K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 1, 97–109.
Hoffman, M.D. and Gelman, A. (2014). The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1, 1593–1623.
Joe, H. (2014). Dependence modeling with copulas. CRC Press, Boca Raton.
Jordan, M.I., Ghahramani, Z., Jaakkola, T.S. and Saul, L.K. (1999). An introduction to variational methods for graphical models. Mach. Learn. 37, 2, 183–233.
Korattikara, A., Chen, Y. and Welling, M. (2014). Austerity in MCMC land: Cutting the Metropolis-Hastings budget. In International Conference on Machine Learning, pp. 181–189.
Lin, L., Liu, K. and Sloan, J. (2000). A noisy Monte Carlo algorithm. Phys. Rev. D 61, 7, 074505.
Lyne, A.-M., Girolami, M., Atchade, Y., Strathmann, H. and Simpson, D. (2015). On Russian roulette estimates for Bayesian inference with doubly-intractable likelihoods. Stat. Sci. 30, 4, 443–467.
Maclaurin, D. and Adams, R.P. (2014). Firefly Monte Carlo: Exact MCMC with subsets of data. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI 2014).
Maire, F., Friel, N. and Alquier, P. (2018). Informed sub-sampling MCMC: Approximate Bayesian inference for large datasets. Statistics and Computing, forthcoming.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 6, 1087–1092.
Minka, T.P. (2001). Expectation Propagation for approximate Bayesian inference. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, pp. 362–369. Morgan Kaufmann Publishers Inc.
Minsker, S., Srivastava, S., Lin, L. and Dunson, D. (2014). Scalable and robust Bayesian inference via the median posterior. In International Conference on Machine Learning, pp. 1656–1664.
Neal, R.M. (2011). MCMC using Hamiltonian dynamics. In Brooks, S., Gelman, A., Jones, G. and Meng, X.-L. (eds.), Handbook of Markov Chain Monte Carlo, pp. 113–162. CRC Press, Boca Raton.
Neiswanger, W., Wang, C. and Xing, E. (2013). Asymptotically exact, embarrassingly parallel MCMC. arXiv:1311.4780.
Nemeth, C. and Sherlock, C. (2018). Merging MCMC subposteriors through Gaussian-process approximations. Bayesian Anal. 13, 2, 507–530.
Nicholls, G.K., Fox, C. and Watt, A.M. (2012). Coupled MCMC with a randomized acceptance probability. arXiv:1205.6857.
Papaspiliopoulos, O. (2009). A methodological framework for Monte Carlo probabilistic inference for diffusion processes. Manuscript. Available at http://wrap.warwick.ac.uk/35220/1/WRAP_Papaspiliopoulos_09-31w.pdf.
Pitt, M.K., dos Santos Silva, R., Giordani, P. and Kohn, R. (2012). On some properties of Markov Chain Monte Carlo simulation methods based on the particle filter. J. Econom. 171, 2, 134–151.
Plummer, M., Best, N., Cowles, K. and Vines, K. (2006). Coda: Convergence diagnosis and output analysis for MCMC. R News 6, 1, 7–11.
Quiroz, M., Kohn, R., Villani, M. and Tran, M.-N. (2018a). Speeding up MCMC by efficient data subsampling. Journal of the American Statistical Association, forthcoming.
Quiroz, M., Tran, M.-N., Villani, M. and Kohn, R. (2018b). Speeding up MCMC by delayed acceptance and data subsampling. J. Comput. Graph. Stat. 27, 12–22.
Quiroz, M., Tran, M.-N., Villani, M., Kohn, R. and Dang, K.-D. (2018c). The block-Poisson estimator for optimally tuned exact Subsampling MCMC. arXiv:1603.08232.
Quiroz, M., Villani, M. and Kohn, R. (2014). Speeding up MCMC by efficient data subsampling. arXiv:1603.08232v1.
Rhee, C. and Glynn, P.W. (2015). Unbiased estimation with square root convergence for SDE models. Oper. Res. 63, 5, 1026–1043.
Roberts, G.O., Gelman, A. and Gilks, W.R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7, 1, 110–120.
Rue, H., Martino, S. and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B Stat. Methodol. 71, 2, 319–392.
Särndal, C.-E., Swensson, B. and Wretman, J. (2003). Model assisted survey sampling. Springer Science & Business Media, Berlin.
Scott, S.L., Blocker, A.W., Bonassi, F.V., Chipman, H.A., George, E.I. and McCulloch, R.E. (2016). Bayes and big data: The consensus Monte Carlo algorithm. Int. J. Manag. Sci. Eng. Manag. 11, 2, 78–88.
Sherlock, C., Thiery, A.H., Roberts, G.O. and Rosenthal, J.S. (2015). On the efficiency of pseudo-marginal random walk Metropolis algorithms. Ann. Stat. 43, 1, 238–275.
Steel, D. and McLaren, C. (2009). Design and analysis of surveys repeated over time. Handbook of Statist. 29, 289–313.
Van der Vaart, A.W. (1998). Asymptotic Statistics, vol. 3. Cambridge University Press, Cambridge.
Wagner, W. (1988). Unbiased multi-step estimators for the Monte Carlo evaluation of certain functional integrals. J. Comput. Phys. 79, 2, 336–352.
Wang, X. and Dunson, D.B. (2014). Parallel MCMC via Weierstrass sampler. arXiv:1312.4605v2.
Acknowledgements
Matias Quiroz and Robert Kohn were partially supported by Australian Research Council Centre of Excellence grant CE140100049.
Appendices
Appendix A: Algorithms
This appendix contains the main sampling algorithms discussed in the paper.
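For readers who prefer code to pseudo-code, the following is a minimal Python sketch of the pseudo-marginal random walk Metropolis scheme that underlies Subsampling MCMC. The function log_post_hat is a placeholder of our own for the log prior plus an estimated log-likelihood (in Subsampling MCMC, the bias-corrected subsampling estimator); the sketch is illustrative and not a reproduction of the paper's algorithm listings.

import numpy as np

def pseudo_marginal_mh(log_post_hat, theta0, n_iter, step_sd, rng):
    # Pseudo-marginal random walk Metropolis. log_post_hat(theta, rng)
    # returns the log prior plus an *estimated* log-likelihood. The
    # estimate for the current draw is stored and reused, never
    # recomputed: this is what makes the chain target the intended
    # posterior (perturbed only by any bias in the estimator).
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post_hat(theta, rng)
    draws = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        prop = theta + step_sd * rng.standard_normal(theta.size)
        lp_prop = log_post_hat(prop, rng)
        # Standard Metropolis-Hastings acceptance, with estimated log
        # posteriors in place of exact ones.
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws[t] = theta
    return draws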
Appendix B: Details for the Poisson Regression Example
This appendix gives the details for the control variates in our illustrative Poisson regression example. Quiroz et al. (2018a) give general expressions for the gradients and hessians in the GLM class, and provide a general compact expression that reduces the computational complexity of the control variates.
The Poisson Regression Model
The Poisson regression is of the form
$$y_{i} \mid \mathbf{x}_{i} \sim \mathrm{Poisson}\left(\exp\left(\alpha + \mathbf{x}_{i}^{T}\beta\right)\right), \quad i = 1, \ldots, n,$$
where \(\theta = (\alpha, \beta)^{T}\).
Parameter-Expanded Control Variates
Let \(\mathbf {w}_{i} = (1,\mathbf {x}_{i}^{T})^{T}\). The log-likelihood contribution from the i-th observation is
$$\ell_{i}(\theta) = y_{i}\,\mathbf{w}_{i}^{T}\theta - \exp\left(\mathbf{w}_{i}^{T}\theta\right) - \log{\Gamma}(y_{i}+1),$$
with gradient and hessian
$$\nabla_{\theta}\ell_{i}(\theta) = \left(y_{i} - \exp(\mathbf{w}_{i}^{T}\theta)\right)\mathbf{w}_{i}, \qquad \nabla_{\theta}^{2}\ell_{i}(\theta) = -\exp\left(\mathbf{w}_{i}^{T}\theta\right)\mathbf{w}_{i}\mathbf{w}_{i}^{T}.$$
Let \(\mu(\theta, \mathbf{x}) = \alpha + \mathbf{x}^{T}\beta = \mathbf{w}^{T}\theta\). The parameter-expanded control variate in (3.4), i.e. the second-order Taylor expansion of \(\ell_{i}\) around a central value \(\theta^{\star}\), is then
$$q_{i}(\theta) = \ell_{i}(\theta^{\star}) + \left(y_{i} - e^{\mu(\theta^{\star},\mathbf{x}_{i})}\right)\mathbf{w}_{i}^{T}(\theta - \theta^{\star}) - \frac{1}{2}\,e^{\mu(\theta^{\star},\mathbf{x}_{i})}\left(\mathbf{w}_{i}^{T}(\theta - \theta^{\star})\right)^{2}.$$
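As a concrete illustration of these formulas, here is a short NumPy sketch of the parameter-expanded control variates and the resulting difference estimator of the log-likelihood. The function names are ours, and evaluating all q_i in O(n) is for exposition only; Quiroz et al. (2018a) show how to obtain the sum of the control variates at much lower cost.

import numpy as np

def loglik_terms(theta, W, y):
    # Per-observation Poisson log-likelihood l_i(theta); the constant
    # -log Gamma(y_i + 1) is dropped here and in the control variate
    # below, so it cancels in their difference and in MH ratios.
    eta = W @ theta
    return y * eta - np.exp(eta)

def param_expanded_cv(theta, theta_star, W, y):
    # Second-order Taylor expansion q_i(theta) of l_i around theta_star.
    eta_star = W @ theta_star
    lam_star = np.exp(eta_star)
    d = W @ (theta - theta_star)          # w_i^T (theta - theta_star)
    return (y * eta_star - lam_star
            + (y - lam_star) * d - 0.5 * lam_star * d ** 2)

def diff_estimator(theta, theta_star, W, y, m, rng):
    # Unbiased estimate of sum_i l_i(theta): the sum of all control
    # variates plus n times the mean difference l_i - q_i over a
    # subsample of m indices drawn with replacement.
    n = len(y)
    q = param_expanded_cv(theta, theta_star, W, y)
    idx = rng.integers(0, n, size=m)
    d = loglik_terms(theta, W[idx], y[idx]) - q[idx]
    return q.sum() + n * d.mean()

# Example usage on simulated data; in practice theta_star would come
# from a pilot run (e.g. an approximate posterior mode).
rng = np.random.default_rng(1)
n = 100000
W = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
theta_star = np.array([0.5, 0.2, -0.1])
y = rng.poisson(np.exp(W @ theta_star))
print(diff_estimator(theta_star + 0.01, theta_star, W, y, m=500, rng=rng))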
Data-Expanded Control Variates
With \(\mu(\theta, \mathbf{x}) = \alpha + \mathbf{x}^{T}\beta\) as above, the log-likelihood contribution from the i-th observation is
$$\ell(\theta; y_{i}, \mathbf{x}_{i}) = y_{i}\,\mu(\theta,\mathbf{x}_{i}) - \exp\left(\mu(\theta,\mathbf{x}_{i})\right) - \log{\Gamma}(y_{i}+1),$$
with gradient and hessian with respect to the data
$$\nabla_{y}\ell = \mu(\theta,\mathbf{x}) - \psi_{1}(y+1), \qquad \nabla_{\mathbf{x}}\ell = \left(y - \exp(\mu(\theta,\mathbf{x}))\right)\beta,$$
$$\nabla_{y}^{2}\ell = -\psi_{2}(y+1), \qquad \nabla_{\mathbf{x}}\nabla_{y}\ell = \beta, \qquad \nabla_{\mathbf{x}}^{2}\ell = -\exp\left(\mu(\theta,\mathbf{x})\right)\beta\beta^{T},$$
where \(\psi _{k}(z) = {\nabla _{z}^{k}} \log {\Gamma }(z)\) is the polygamma function of order k. We can write the gradients and hessian compactly by defining \(\mathbf {z}_{i}=(y_{i},\mathbf {x}_{i}^{T})^{T}\),
$$\nabla_{\mathbf{z}}\ell(\theta;\mathbf{z}) = \begin{pmatrix} \mu(\theta,\mathbf{x}) - \psi_{1}(y+1) \\ \left(y - \exp(\mu(\theta,\mathbf{x}))\right)\beta \end{pmatrix}, \qquad \nabla_{\mathbf{z}}^{2}\ell(\theta;\mathbf{z}) = \begin{pmatrix} -\psi_{2}(y+1) & \beta^{T} \\ \beta & -\exp(\mu(\theta,\mathbf{x}))\,\beta\beta^{T} \end{pmatrix}.$$
The data-expanded control variate in Eq. 3.5 is then the second-order expansion around the centroid \(\mathbf{z}_{c}\) of the data cluster to which observation i belongs,
$$q_{i}(\theta) = \ell(\theta;\mathbf{z}_{c}) + \nabla_{\mathbf{z}}\ell(\theta;\mathbf{z}_{c})^{T}(\mathbf{z}_{i}-\mathbf{z}_{c}) + \frac{1}{2}(\mathbf{z}_{i}-\mathbf{z}_{c})^{T}\,\nabla_{\mathbf{z}}^{2}\ell(\theta;\mathbf{z}_{c})\,(\mathbf{z}_{i}-\mathbf{z}_{c}).$$
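A corresponding sketch of the data-expanded control variate follows, again with our own function names and under the assumption that each observation is expanded around its cluster's centroid. Note the index offset: with the paper's convention \(\psi_{k} = \nabla_{z}^{k}\log\Gamma(z)\), \(\psi_{1}\) (the digamma) and \(\psi_{2}\) (the trigamma) correspond to SciPy's polygamma(0, .) and polygamma(1, .).

import numpy as np
from scipy.special import gammaln, polygamma

def data_expanded_cv(theta, z_c, Z):
    # Second-order Taylor expansion of l(theta; z) in the data
    # z = (y, x^T)^T around a centroid z_c; the rows of Z are the
    # observations z_i assigned to that centroid.
    alpha, beta = theta[0], theta[1:]
    y_c, x_c = z_c[0], z_c[1:]
    mu_c = alpha + x_c @ beta
    lam_c = np.exp(mu_c)
    # l(theta; z_c): log(y!) = log Gamma(y + 1), differentiable in y.
    l_c = y_c * mu_c - lam_c - gammaln(y_c + 1.0)
    # Gradient in z: the paper's psi_1 is SciPy's polygamma(0, .).
    grad = np.concatenate(([mu_c - polygamma(0, y_c + 1.0)],
                           (y_c - lam_c) * beta))
    # Hessian in z: the paper's psi_2 is SciPy's polygamma(1, .).
    k = beta.size
    hess = np.zeros((k + 1, k + 1))
    hess[0, 0] = -polygamma(1, y_c + 1.0)
    hess[0, 1:] = beta
    hess[1:, 0] = beta
    hess[1:, 1:] = -lam_c * np.outer(beta, beta)
    D = Z - z_c                            # deviations z_i - z_c
    return l_c + D @ grad + 0.5 * np.einsum('ij,jk,ik->i', D, hess, D)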