Abstract
The rapid development of computing power and efficient Markov Chain Monte Carlo (MCMC) simulation algorithms has revolutionized Bayesian statistics, making it a highly practical inference method in applied work. However, MCMC algorithms tend to be computationally demanding, and are particularly slow for large datasets. Data subsampling has recently been suggested as a way to make MCMC methods scale to very large datasets, utilizing efficient sampling schemes and estimators from the survey sampling literature. These developments tend to be unknown to many survey statisticians, who traditionally work with non-Bayesian methods and rarely use MCMC. Our article explains the idea of data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a so-called pseudo-marginal MCMC approach to speeding up MCMC through data subsampling. The review is written for a survey statistician without previous knowledge of MCMC methods, as our aim is to motivate survey sampling experts to contribute to the growing Subsampling MCMC literature.
References
Alquier, P., Friel, N., Everitt, R. and Boland, A. (2016). Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels. Stat. Comput. 26, 1-2, 29–47.
Andrieu, C. and Roberts, G.O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Stat. 37, 2, 697–725.
Bardenet, R., Doucet, A. and Holmes, C. (2014). Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In International Conference on Machine Learning, pp. 405–413.
Bardenet, R., Doucet, A. and Holmes, C. (2017). On Markov chain Monte Carlo methods for tall data. J. Mach. Learn. Res. 18, 47, 1–43.
Beaumont, M.A. (2003). Estimation of population growth or decline in genetically monitored populations. Genetics 164, 3, 1139–1160.
Betancourt, M. (2015). The fundamental incompatibility of scalable Hamiltonian Monte Carlo and naive data subsampling. In International Conference on Machine Learning, pp. 533–540.
Betancourt, M. (2017). A conceptual introduction to Hamiltonian Monte Carlo. arXiv:1701.02434.
Bierkens, J., Fearnhead, P. and Roberts, G. (2018). The zig-zag process and super-efficient sampling for Bayesian analysis of big data. Annals of Statistics, forthcoming.
Blei, D.M., Kucukelbir, A. and McAuliffe, J.D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association 112, 518, 859–877.
Bouchard-Côté, A., Vollmer, S.J. and Doucet, A. (2018). The bouncy particle sampler: a nonreversible rejection-free Markov chain Monte Carlo method. J. Am. Stat. Assoc. 113, 522, 855–867.
Brooks, S., Gelman, A., Jones, G. and Meng, X.-L. (2011). Handbook of Markov chain Monte Carlo. CRC Press, Boca Raton.
Ceperley, D. and Dewing, M. (1999). The penalty method for random walks with uncertain energies. J. Chem. Phys. 110, 20, 9812–9820.
Chen, C.-F. (1985). On asymptotic normality of limiting density functions with Bayesian implications. J. R. Stat. Soc. Ser. B Stat. Methodol. 47, 3, 540–546.
Chen, T., Fox, E. and Guestrin, C. (2014). Stochastic gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning, pp. 1683–1691.
Dang, K.-D., Quiroz, M., Kohn, R., Tran, M.-N. and Villani, M. (2017). Hamiltonian Monte Carlo with energy conserving subsampling. arXiv:1708.00955.
Dean, J. and Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1, 107–113.
Del Moral, P. (2004). Feynman-Kac formulae: genealogical and interacting particle systems with applications. Springer, Berlin.
Deligiannidis, G., Doucet, A. and Pitt, M.K. (2018). The correlated pseudo-marginal method. Journal of the Royal Statistical Society B, forthcoming.
Deville, J.-C. and Särndal, C.-E. (1992). Calibration estimators in survey sampling. J. Am. Stat. Assoc. 87, 418, 376–382.
Doucet, A., De Freitas, N. and Gordon, N. (2001). An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo methods in practice, pp. 3–14. Springer.
Doucet, A., Pitt, M., Deligiannidis, G. and Kohn, R. (2015). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika 102, 2, 295–313.
Duane, S., Kennedy, A.D., Pendleton, B.J. and Roweth, D. (1987). Hybrid Monte Carlo. Phys. Lett. B 195, 2, 216–222.
Flury, T. and Shephard, N. (2011). Bayesian inference based only on simulated likelihood: particle filter analysis of dynamic economic models. Econometric Theory 27, 5, 933–956.
Gelman, A., Vehtari, A., Jylänki, P., Sivula, T., Tran, D., Sahai, S., Blomstedt, P., Cunningham, J.P., Schiminovich, D. and Robert, C. (2017). Expectation Propagation as a way of life: A framework for Bayesian inference on partitioned data. arXiv:1412.4869.
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 6, 721–741.
Gunawan, D., Kohn, R., Quiroz, M., Dang, K.-D. and Tran, M.-N. (2018). Subsampling sequential Monte Carlo for static Bayesian models. arXiv:1805.03317.
Hammersley, J.M. and Handscomb, D.C. (1964). Monte Carlo methods. Chapman and Hall, London.
Hastings, W.K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 1, 97–109.
Hoffman, M.D. and Gelman, A. (2014). The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1, 1593–1623.
Joe, H. (2014). Dependence modeling with copulas. CRC Press, Boca Raton.
Jordan, M.I., Ghahramani, Z., Jaakkola, T.S. and Saul, L.K. (1999). An introduction to variational methods for graphical models. Mach. Learn. 37, 2, 183–233.
Korattikara, A., Chen, Y. and Welling, M. (2014). Austerity in MCMC land: Cutting the Metropolis-Hastings budget. In International Conference on Machine Learning, pp. 181–189.
Lin, L., Liu, K. and Sloan, J. (2000). A noisy Monte Carlo algorithm. Phys. Rev. D 61, 7, 074505.
Lyne, A.-M., Girolami, M., Atchade, Y., Strathmann, H. and Simpson, D. (2015). On Russian roulette estimates for Bayesian inference with doubly-intractable likelihoods. Stat. Sci. 30, 4, 443–467.
Maclaurin, D. and Adams, R.P. (2014). Firefly Monte Carlo: Exact MCMC with subsets of data. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI 2014).
Maire, F., Friel, N. and Alquier, P. (2018). Informed sub-sampling MCMC: Approximate Bayesian inference for large datasets. Statistics and Computing, forthcoming.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 6, 1087–1092.
Minka, T.P. (2001). Expectation Propagation for approximate Bayesian inference. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, pp. 362–369. Morgan Kaufmann Publishers Inc.
Minsker, S., Srivastava, S., Lin, L. and Dunson, D. (2014). Scalable and robust Bayesian inference via the median posterior. In International Conference on Machine Learning, pp. 1656–1664.
Neal, R.M. (2011). MCMC using Hamiltonian dynamics. In Brooks, S., Gelman, A., Jones, G. and Meng, X.-L. (eds.), Handbook of Markov Chain Monte Carlo, pp. 113–162. CRC Press, Boca Raton.
Neiswanger, W., Wang, C. and Xing, E. (2013). Asymptotically exact, embarrassingly parallel MCMC. arXiv:1311.4780.
Nemeth, C. and Sherlock, C. (2018). Merging MCMC subposteriors through Gaussian-process approximations. Bayesian Anal. 13, 2, 507–530.
Nicholls, G.K., Fox, C. and Watt, A.M. (2012). Coupled MCMC with a randomized acceptance probability. arXiv:1205.6857.
Papaspiliopoulos, O. (2009). A methodological framework for Monte Carlo probabilistic inference for diffusion processes. Manuscript. Available at http://wrap.warwick.ac.uk/35220/1/WRAP_Papaspiliopoulos_09-31w.pdf.
Pitt, M.K., dos Santos Silva, R., Giordani, P. and Kohn, R. (2012). On some properties of Markov Chain Monte Carlo simulation methods based on the particle filter. J. Econom. 171, 2, 134–151.
Plummer, M., Best, N., Cowles, K. and Vines, K. (2006). Coda: Convergence diagnosis and output analysis for MCMC. R News 6, 1, 7–11.
Quiroz, M., Kohn, R., Villani, M. and Tran, M.-N. (2018a). Speeding up MCMC by efficient data subsampling. Journal of the American Statistical Association, forthcoming.
Quiroz, M., Tran, M.-N., Villani, M. and Kohn, R. (2018b). Speeding up MCMC by delayed acceptance and data subsampling. J. Comput. Graph. Stat. 27, 12–22.
Quiroz, M., Tran, M.-N., Villani, M., Kohn, R. and Dang, K.-D. (2018c). The block-Poisson estimator for optimally tuned exact Subsampling MCMC. arXiv:1603.08232.
Quiroz, M., Villani, M. and Kohn, R. (2014). Speeding up MCMC by efficient data subsampling. arXiv:1603.08232v1.
Rhee, C. and Glynn, P.W. (2015). Unbiased estimation with square root convergence for SDE models. Oper. Res. 63, 5, 1026–1043.
Roberts, G.O., Gelman, A. and Gilks, W.R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7, 1, 110–120.
Rue, H., Martino, S. and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B Stat. Methodol. 71, 2, 319–392.
Särndal, C.-E., Swensson, B. and Wretman, J. (2003). Model assisted survey sampling. Springer Science & Business Media, Berlin.
Scott, S.L., Blocker, A.W., Bonassi, F.V., Chipman, H.A., George, E.I. and McCulloch, R.E. (2016). Bayes and big data: The consensus Monte Carlo algorithm. Int. J. Manag. Sci. Eng. Manag. 11, 2, 78–88.
Sherlock, C., Thiery, A.H., Roberts, G.O. and Rosenthal, J.S. (2015). On the efficiency of pseudo-marginal random walk Metropolis algorithms. Ann. Stat. 43, 1, 238–275.
Steel, D. and McLaren, C. (2009). Design and analysis of surveys repeated over time. Handbook of Statist. 29, 289–313.
Van der Vaart, A.W. (1998). Asymptotic Statistics, vol. 3. Cambridge University Press, Cambridge.
Wagner, W. (1988). Unbiased multi-step estimators for the Monte Carlo evaluation of certain functional integrals. J. Comput. Phys. 79, 2, 336–352.
Wang, X. and Dunson, D.B. (2014). Parallel MCMC via Weierstrass sampler. arXiv:1312.4605v2.
Acknowledgements
Matias Quiroz and Robert Kohn were partially supported by Australian Research Council Centre of Excellence grant CE140100049.
Appendices
Appendix A: Algorithms
This appendix contains the main sampling algorithms discussed in the paper.
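For readers who prefer code to pseudo-code, the following is a minimal Python sketch of the pseudo-marginal random walk Metropolis scheme that underlies Subsampling MCMC. The function log_post_hat is a placeholder of our own for the log prior plus an estimated log-likelihood (in Subsampling MCMC, the bias-corrected subsampling estimator); the sketch is illustrative and not a reproduction of the paper's algorithm listings.

import numpy as np

def pseudo_marginal_mh(log_post_hat, theta0, n_iter, step_sd, rng):
    # Pseudo-marginal random walk Metropolis. log_post_hat(theta, rng)
    # returns the log prior plus an *estimated* log-likelihood. The
    # estimate for the current draw is stored and reused, never
    # recomputed: this is what makes the chain target the intended
    # posterior (perturbed only by any bias in the estimator).
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post_hat(theta, rng)
    draws = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        prop = theta + step_sd * rng.standard_normal(theta.size)
        lp_prop = log_post_hat(prop, rng)
        # Standard Metropolis-Hastings acceptance, with estimated log
        # posteriors in place of exact ones.
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws[t] = theta
    return draws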
Appendix B: Details for the Poisson Regression Example
This appendix gives the details for the control variates in our illustrative Poisson regression example. Quiroz et al. (2018a) give general expressions for the gradients and hessians in the GLM class, and provide a general compact expression that reduces the computational complexity of the control variates.
The Poisson Regression Model
The Poisson regression is of the form
$$y_{i} \mid \mathbf{x}_{i} \sim \mathrm{Poisson}\left(\exp\left(\alpha + \mathbf{x}_{i}^{T}\beta\right)\right), \quad i = 1, \ldots, n,$$
where \(\theta = (\alpha, \beta)^{T}\).
Parameter-Expanded Control Variates
Let \(\mathbf {w}_{i} = (1,\mathbf {x}_{i}^{T})^{T}\). The log-likelihood contribution from the i-th observation is
$$\ell_{i}(\theta) = y_{i}\,\mathbf{w}_{i}^{T}\theta - \exp\left(\mathbf{w}_{i}^{T}\theta\right) - \log{\Gamma}(y_{i}+1),$$
with gradient and hessian
$$\nabla_{\theta}\ell_{i}(\theta) = \left(y_{i} - \exp(\mathbf{w}_{i}^{T}\theta)\right)\mathbf{w}_{i}, \qquad \nabla_{\theta}^{2}\ell_{i}(\theta) = -\exp\left(\mathbf{w}_{i}^{T}\theta\right)\mathbf{w}_{i}\mathbf{w}_{i}^{T}.$$
Let \(\mu(\theta, \mathbf{x}) = \alpha + \mathbf{x}^{T}\beta = \mathbf{w}^{T}\theta\). The parameter-expanded control variate in (3.4), i.e. the second-order Taylor expansion of \(\ell_{i}\) around a central value \(\theta^{\star}\), is then
$$q_{i}(\theta) = \ell_{i}(\theta^{\star}) + \left(y_{i} - e^{\mu(\theta^{\star},\mathbf{x}_{i})}\right)\mathbf{w}_{i}^{T}(\theta - \theta^{\star}) - \frac{1}{2}\,e^{\mu(\theta^{\star},\mathbf{x}_{i})}\left(\mathbf{w}_{i}^{T}(\theta - \theta^{\star})\right)^{2}.$$
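As a concrete illustration of these formulas, here is a short NumPy sketch of the parameter-expanded control variates and the resulting difference estimator of the log-likelihood. The function names are ours, and evaluating all q_i in O(n) is for exposition only; Quiroz et al. (2018a) show how to obtain the sum of the control variates at much lower cost.

import numpy as np

def loglik_terms(theta, W, y):
    # Per-observation Poisson log-likelihood l_i(theta); the constant
    # -log Gamma(y_i + 1) is dropped here and in the control variate
    # below, so it cancels in their difference and in MH ratios.
    eta = W @ theta
    return y * eta - np.exp(eta)

def param_expanded_cv(theta, theta_star, W, y):
    # Second-order Taylor expansion q_i(theta) of l_i around theta_star.
    eta_star = W @ theta_star
    lam_star = np.exp(eta_star)
    d = W @ (theta - theta_star)          # w_i^T (theta - theta_star)
    return (y * eta_star - lam_star
            + (y - lam_star) * d - 0.5 * lam_star * d ** 2)

def diff_estimator(theta, theta_star, W, y, m, rng):
    # Unbiased estimate of sum_i l_i(theta): the sum of all control
    # variates plus n times the mean difference l_i - q_i over a
    # subsample of m indices drawn with replacement.
    n = len(y)
    q = param_expanded_cv(theta, theta_star, W, y)
    idx = rng.integers(0, n, size=m)
    d = loglik_terms(theta, W[idx], y[idx]) - q[idx]
    return q.sum() + n * d.mean()

# Example usage on simulated data; in practice theta_star would come
# from a pilot run (e.g. an approximate posterior mode).
rng = np.random.default_rng(1)
n = 100000
W = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
theta_star = np.array([0.5, 0.2, -0.1])
y = rng.poisson(np.exp(W @ theta_star))
print(diff_estimator(theta_star + 0.01, theta_star, W, y, m=500, rng=rng))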
Data-Expanded Control Variates
With \(\mu(\theta, \mathbf{x}) = \alpha + \mathbf{x}^{T}\beta\) as above, the log-likelihood contribution from the i-th observation is
$$\ell(\theta; y_{i}, \mathbf{x}_{i}) = y_{i}\,\mu(\theta,\mathbf{x}_{i}) - \exp\left(\mu(\theta,\mathbf{x}_{i})\right) - \log{\Gamma}(y_{i}+1),$$
with gradient and hessian with respect to the data
$$\nabla_{y}\ell = \mu(\theta,\mathbf{x}) - \psi_{1}(y+1), \qquad \nabla_{\mathbf{x}}\ell = \left(y - \exp(\mu(\theta,\mathbf{x}))\right)\beta,$$
$$\nabla_{y}^{2}\ell = -\psi_{2}(y+1), \qquad \nabla_{\mathbf{x}}\nabla_{y}\ell = \beta, \qquad \nabla_{\mathbf{x}}^{2}\ell = -\exp\left(\mu(\theta,\mathbf{x})\right)\beta\beta^{T},$$
where \(\psi _{k}(z) = {\nabla _{z}^{k}} \log {\Gamma }(z)\) is the polygamma function of order k. We can write the gradients and hessian compactly by defining \(\mathbf {z}_{i}=(y_{i},\mathbf {x}_{i}^{T})^{T}\),
$$\nabla_{\mathbf{z}}\ell(\theta;\mathbf{z}) = \begin{pmatrix} \mu(\theta,\mathbf{x}) - \psi_{1}(y+1) \\ \left(y - \exp(\mu(\theta,\mathbf{x}))\right)\beta \end{pmatrix}, \qquad \nabla_{\mathbf{z}}^{2}\ell(\theta;\mathbf{z}) = \begin{pmatrix} -\psi_{2}(y+1) & \beta^{T} \\ \beta & -\exp(\mu(\theta,\mathbf{x}))\,\beta\beta^{T} \end{pmatrix}.$$
The data-expanded control variate in Eq. 3.5 is then the second-order expansion around the centroid \(\mathbf{z}_{c}\) of the data cluster to which observation i belongs,
$$q_{i}(\theta) = \ell(\theta;\mathbf{z}_{c}) + \nabla_{\mathbf{z}}\ell(\theta;\mathbf{z}_{c})^{T}(\mathbf{z}_{i}-\mathbf{z}_{c}) + \frac{1}{2}(\mathbf{z}_{i}-\mathbf{z}_{c})^{T}\,\nabla_{\mathbf{z}}^{2}\ell(\theta;\mathbf{z}_{c})\,(\mathbf{z}_{i}-\mathbf{z}_{c}).$$
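A corresponding sketch of the data-expanded control variate follows, again with our own function names and under the assumption that each observation is expanded around its cluster's centroid. Note the index offset: with the paper's convention \(\psi_{k} = \nabla_{z}^{k}\log\Gamma(z)\), \(\psi_{1}\) (the digamma) and \(\psi_{2}\) (the trigamma) correspond to SciPy's polygamma(0, .) and polygamma(1, .).

import numpy as np
from scipy.special import gammaln, polygamma

def data_expanded_cv(theta, z_c, Z):
    # Second-order Taylor expansion of l(theta; z) in the data
    # z = (y, x^T)^T around a centroid z_c; the rows of Z are the
    # observations z_i assigned to that centroid.
    alpha, beta = theta[0], theta[1:]
    y_c, x_c = z_c[0], z_c[1:]
    mu_c = alpha + x_c @ beta
    lam_c = np.exp(mu_c)
    # l(theta; z_c): log(y!) = log Gamma(y + 1), differentiable in y.
    l_c = y_c * mu_c - lam_c - gammaln(y_c + 1.0)
    # Gradient in z: the paper's psi_1 is SciPy's polygamma(0, .).
    grad = np.concatenate(([mu_c - polygamma(0, y_c + 1.0)],
                           (y_c - lam_c) * beta))
    # Hessian in z: the paper's psi_2 is SciPy's polygamma(1, .).
    k = beta.size
    hess = np.zeros((k + 1, k + 1))
    hess[0, 0] = -polygamma(1, y_c + 1.0)
    hess[0, 1:] = beta
    hess[1:, 0] = beta
    hess[1:, 1:] = -lam_c * np.outer(beta, beta)
    D = Z - z_c                            # deviations z_i - z_c
    return l_c + D @ grad + 0.5 * np.einsum('ij,jk,ik->i', D, hess, D)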