Abstract
If, in the mid-1980s, one had asked the average statistician about the difficulties of using Bayesian statistics, the most likely answer would have been “Well, there is this problem of selecting a prior distribution and then, even if one agrees on the prior, the whole Bayesian inference is simply impossible to implement in practice!” The same question asked in the twenty-first century does not produce the same reply, but rather a much less aggressive complaint about the lack of generic software (besides WinBUGS), along with the renewed worry of subjectively selecting a prior! The last 20 years have indeed witnessed a tremendous change in the way Bayesian statistics is perceived, both by mathematical statisticians and by applied statisticians, and the impetus behind this change has been a prodigious leap forward in computational abilities. The availability of very powerful approximation methods has correspondingly freed Bayesian modelling, in terms of both model scope and prior modelling. This opening has induced many more scientists from outside the statistics community to opt for a Bayesian perspective, as they can now handle these tools on their own. As discussed below, a most successful illustration of this newfound freedom can be seen in Bayesian model choice, which was only emerging at the beginning of the MCMC era, for lack of appropriate computational tools.
Notes
- 1.
In this chapter, the denomination universal is used in the sense of uniformly over all distributions.
- 2.
To impose the stationarity constraint when the order of the AR(p) model varies, it is necessary to reparameterise this model in terms of either the partial autocorrelations or the roots of the associated lag polynomial. (See, e.g., Robert 2007, Sect. 4.5.)
- 3.
In this presentation of Bayes factors, we completely bypass the methodological difficulty of defining π(θ ∈ Θ 0) when Θ 0 is of measure 0 for the original prior π and refer the reader to Robert (2007, Sect. 5.2.3) and Marin and Robert (2007, Sect. 2.3.2) for proper coverage of this issue.
- 4.
The prior distribution can be used for importance sampling only if it is a proper prior and not a σ-finite measure.
- 5.
The constant order of the Monte Carlo error obviously does not imply that the computational effort remains the same as the dimension increases; it implies rather that the decrease (with m) in variation occurs at the rate \(1/\sqrt{m}\).
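The \(1/\sqrt{m}\) rate is easy to check by simulation. The following Python sketch (an illustration assuming a standard normal integrand, not an example taken from the chapter) replicates the Monte Carlo average for increasing m:

```python
import numpy as np

def mc_estimator_std(m, reps=200, seed=0):
    # Empirical standard deviation of the Monte Carlo average of m
    # standard-normal draws, computed over `reps` independent replications.
    rng = np.random.default_rng(seed)
    means = np.array([rng.standard_normal(m).mean() for _ in range(reps)])
    return means.std()

# Multiplying m by 100 divides the Monte Carlo error by about sqrt(100) = 10,
# whatever the cost of producing each individual draw.
for m in (100, 10_000, 1_000_000):
    print(m, mc_estimator_std(m))
```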
- 6.
The empirical (Monte Carlo) confidence interval is not to be confused with the asymptotic confidence interval derived from the normal approximation. As discussed in Robert and Casella (2004, Chap. 4), these two intervals may differ considerably in width, with the interval derived from the CLT being much more optimistic!
- 7.
An alternative to simulating from a single \(\mathcal{T} (\nu,{x}_{i}, 1)\) distribution, which does not require an extensive search for the most appropriate x i, is to use a mixture of the \(\mathcal{T} (\nu,{x}_{i}, 1)\) distributions. As seen in Sect. 26.5.2, the weights of this mixture can even be optimised automatically.
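As a minimal sketch of this idea in Python (the target, centres, weights and degrees of freedom below are illustrative assumptions, not values from the chapter), importance sampling with a mixture of Student-t proposals can be written as:

```python
import numpy as np
from scipy import stats

def mixture_t_is(log_target, centers, weights, nu=5, n=10_000, seed=0):
    # Self-normalised importance sampling estimate of E[X] under the target,
    # using the mixture sum_i w_i T(nu, x_i, 1) as importance function.
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    weights = np.asarray(weights, dtype=float)
    comp = rng.choice(len(centers), size=n, p=weights)  # pick a component
    x = centers[comp] + rng.standard_t(nu, size=n)      # draw from it
    # Importance weights: target density over the full mixture density.
    q = (weights * stats.t.pdf(x[:, None] - centers, df=nu)).sum(axis=1)
    w = np.exp(log_target(x)) / q
    return np.sum(w * x) / np.sum(w)

# Illustrative target: an (unnormalised) N(2, 1) log-density, whose mean is 2.
est = mixture_t_is(lambda x: -0.5 * (x - 2.0) ** 2,
                   centers=[0.0, 3.0], weights=[0.5, 0.5])
```

Since the estimate is self-normalised, the target density only needs to be known up to a constant, as is typical for posterior distributions.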
- 8.
We stress that this is mostly an academic exercise since, in regular settings, independent components are rarely used for the importance function.
- 9.
Sect. 26.4.3 covers the setting of varying-dimension problems in greater detail, with the same theme that completion distributions and parameters are necessary but influential for the performance of the approximation.
- 10.
Even in the simple case of the probit model, MCMC algorithms do not always converge very quickly, as shown in Robert and Casella (2004, Chap. 14).
- 11.
It is quite interesting to see that the mixture Gibbs sampler suffers from the same pathology as the EM algorithm, although this is not surprising given that it is based on the same completion scheme.
- 12.
This wealth of possible alternatives to the completion Gibbs sampler is a mixed blessing in that their tuning parameters, for instance the scale of the random walk proposals, need to be calibrated properly to avoid inefficiencies.
- 13.
The difficulty with the infinite part of the problem is easily solved in that the setting is identical to simulation problems in (countable or uncountable) infinite spaces. When running simulations in those spaces, some values are never visited by the simulated Markov chain, and the chance that a value is visited is related to its probability under the target distribution.
- 14.
Early proposals to solve the varying dimension problem involved saturation schemes where all the parameters for all models were updated deterministically (Carlin and Chib 1995), but they do not apply to an infinite collection of models and they need to be calibrated precisely to achieve a sufficient amount of moves between models.
- 15.
For a simple proof that the acceptance probability guarantees that the stationary distribution is π(k, θ(k)), see Robert and Casella (2004, Sect. 11.2.2).
- 16.
In the birth acceptance probability, the factorials k! and (k + 1)! appear as the numbers of ways of ordering the k and k + 1 components of the mixtures. The ratio cancels with \(1/(k + 1)\), which is the probability of selecting a particular component for the death step.
- 17.
The “sequential” denomination in the sequential Monte Carlo methods thus refers to the algorithmic part, not to the statistical part.
- 18.
- 19.
Using a Gaussian non-parametric kernel estimator amounts to (a) sampling from the x i (t)’s with equal weights and (b) using a normal random walk move from the selected x i (t), with standard deviation equal to the bandwidth of the kernel.
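In Python, such a kernel regeneration step can be sketched as follows (the particle cloud and bandwidth below are illustrative stand-ins; in practice the bandwidth would come from a standard kernel rule):

```python
import numpy as np

def gaussian_kernel_move(particles, bandwidth, rng):
    # Sampling from the Gaussian kernel density estimate of the cloud:
    # (a) resample the particles with equal weights, then
    # (b) add a normal random-walk move with std equal to the bandwidth.
    n = len(particles)
    idx = rng.integers(0, n, size=n)                            # step (a)
    return particles[idx] + bandwidth * rng.standard_normal(n)  # step (b)

rng = np.random.default_rng(1)
cloud = rng.standard_normal(1_000)   # stand-in for the x_i(t)'s
new_cloud = gaussian_kernel_move(cloud, bandwidth=0.2, rng=rng)
```

The two steps together are exactly one draw per particle from the kernel density estimate, which is why the move both diversifies the cloud and (mildly) inflates its variance by the squared bandwidth.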
- 20.
When the survival rate of a proposal distribution is zero, the corresponding number r k of proposals with that scale is set to a positive value, such as 1% of the sample size, in order to avoid the complete removal of the given scale v k.
- 21.
An R package called mcsm has been developed in association with Robert and Casella (2009) as a training tool for Monte Carlo methods.
References
Abowd, J., Kramarz, F., Margolis, D.: High-wage workers and high-wage firms. Econometrica 67, 251–333 (1999)
Albert, J.: Bayesian Computation Using Minitab. Wadsworth Publishing Company (1996)
Albert, J.H.: Bayesian Computation with R. Springer, New York, (2007)
Andrieu, C., Robert, C.P.: Controlled Markov chain Monte Carlo methods for optimal sampling. Technical Report 0125, Université Paris Dauphine (2001)
Andrieu, C., Doucet, A., Robert, C.P.: Computational advances for and from Bayesian analysis. Stat. Sci. 19(1), 118–127 (2004)
Bauwens, L., Richard, J.F.: A 1-1 Poly-t random variable generator with application to Monte Carlo integration. J. Econometrics 29, 19–46 (1985)
Beaumont, M.A., Zhang, W., Balding, D.J.: Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035 (2002)
Beaumont, M.A., Cornuet, J.-M., Marin, J.-M., Robert, C.P.: Adaptive approximate Bayesian computation. Biometrika 96(4), 983–990 (2009)
Berkhof, J., van Mechelen, I., Gelman, A.: A Bayesian approach to the selection and testing of mixture models. Statistica Sinica 13, 423–442 (2003)
Blum, M.G.B., François, O.: Non-linear regression models for approximate Bayesian computation. Stat. Comput. 20, 63–73 (2010)
Bortot, P., Coles, S.G., Sisson, S.A.: Inference for stereological extremes. J. Am. Stat. Assoc. 102, 84–92 (2007)
Cappé, O., Robert, C.P.: MCMC: Ten years and still running! J. Am. Stat. Assoc. 95(4), 1282–1286 (2000)
Cappé, O., Guillin, A., Marin, J.-M., Robert, C.P.: Population Monte Carlo. J. Comput. Graph. Stat. 13(4), 907–929 (2004)
Cappé, O., Moulines, E., Rydén, T.: Inference in Hidden Markov Models. Springer, New York (2005)
Cappé, O., Douc, R., Guillin, A., Marin, J.-M., Robert, C.P.: Adaptive importance sampling in general mixture classes. Stat. Comput. 18, 447–459 (2008)
Carlin, B.P., Chib, S.: Bayesian model choice through Markov chain Monte Carlo. J. Roy. Stat. Soc. B. 57(3), 473–484 (1995)
Casella, G., Robert, C.P.: Rao-Blackwellisation of sampling schemes. Biometrika 83(1), 81–94 (1996)
Celeux, G., Hurn, M.A., Robert, C.P.: Computational and inferential difficulties with mixture posterior distributions. J. Am. Stat. Assoc. 95(3), 957–979 (2000)
Chen, M.H., Shao, Q.M., Ibrahim, J.G.: Monte Carlo Methods in Bayesian Computation. Springer, New York (2000)
Chib, S.: Marginal likelihood from the Gibbs output. J. Am. Stat. Assoc. 90, 1313–1321 (1995)
Chopin, N.: Inference and model choice for time-ordered hidden Markov models. J. Roy. Stat. Soc. B. 69(2), 269–284 (2007)
Chopin, N., Robert, C.P.: Properties of nested sampling. Biometrika 97, 741–755 (2010); see also arXiv:0801.3887
Cornuet, J.-M., Marin, J.-M., Mira, A., Robert, C.P.: Adaptive multiple importance sampling. Technical Report arXiv.org:0907.1254, CEREMADE, Université Paris, Dauphine (2009)
Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J. Roy. Stat. Soc. B. 68(3), 411–436 (2006)
Dickey, J.M.: The weighted likelihood ratio, linear hypotheses on normal location parameters. Ann. Math. Stat. 42, 204–223 (1971)
Diebolt, J., Robert, C.P.: Estimation of finite mixture distributions by Bayesian sampling. J. Roy. Stat. Soc. B. 56, 363–375 (1994)
Doornik, J.A., Hendry, D.F., Shephard, N.: Computationally-intensive econometrics using a distributed matrix-programming language. Phil. Trans. Roy. Soc. London 360, 1245–1266 (2002)
Douc, R., Guillin, A., Marin, J.-M., Robert, C.P.: Convergence of adaptive mixtures of importance sampling schemes. Ann. Stat. 35(1), 420–448 (2007a)
Douc, R., Guillin, A., Marin, J.-M., Robert, C.P.: Minimum variance importance sampling via population Monte Carlo. ESAIM: Probab. Stat. 11, 427–447 (2007b)
Doucet, A., de Freitas, N., Gordon, N.: Sequential Monte Carlo Methods in Practice. Springer, New York (2001)
Frühwirth-Schnatter, S.: Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J. Am. Stat. Assoc. 96(453), 194–209 (2001)
Frühwirth-Schnatter, S.: Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. Econometrics J. 7(1), 143–167 (2004)
Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, New York (2006)
Gelfand, A.E., Smith, A.F.M.: Sampling based approaches to calculating marginal densities. J. Am. Stat. Assoc. 85, 398–409 (1990)
Gelman, A., Gilks, W.R., Roberts, G.O.: Efficient Metropolis jumping rules. In: Berger, J.O., Bernardo, J.M., Dawid, A.P., Lindley, D.V., Smith, A.F.M. (eds.) Bayesian Statistics 5, pp. 599–608. Oxford University Press, Oxford (1996)
Geweke, J.: Using simulation methods for Bayesian econometric models: Inference, development, and communication (with discussion and rejoinder). Economet. Rev. 18, 1–126 (1999)
Geweke, J.: Interpretation and inference in mixture models: Simple MCMC works. Comput. Stat. Data Anal. 51(7), 3529–3550 (2007)
Gilks, W.R., Berzuini, C.: Following a moving target–Monte Carlo inference for dynamic Bayesian models. J. Roy. Stat. Soc. B. 63(1), 127–146 (2001)
Gilks, W.R., Thomas, A., Spiegelhalter, D.J.: A language and program for complex Bayesian modelling. The Statistician 43, 169–178 (1994)
Gilks, W.R., Roberts, G.O., Sahu, S.K.: Adaptive Markov chain Monte Carlo. J. Am. Stat. Assoc. 93, 1045–1054 (1998)
Gordon, N.J., Salmond, D.J., Smith, A.F.M.: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F: Radar and Signal Processing 140, 107–113 (1993)
Green, P.J.: Reversible jump MCMC computation and Bayesian model determination. Biometrika 82(4), 711–732 (1995)
Haario, H., Saksman, E., Tamminen, J.: Adaptive proposal distribution for random walk Metropolis algorithm. Comput. Stat. 14(3), 375–395 (1999)
Haario, H., Saksman, E., Tamminen, J.: An adaptive Metropolis algorithm. Bernoulli 7(2), 223–242 (2001)
Hesterberg, T.: Weighted average importance sampling and defensive mixture distributions. Technometrics 37, 185–194 (1995)
Iba, Y.: Population-based Monte Carlo algorithms. Trans. Jpn. Soc. Artif. Intell. 16(2), 279–286 (2000)
Jasra, A., Holmes, C.C., Stephens, D.A.: Markov Chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Stat. Sci. 20(1), 50–67 (2005)
Jeffreys, H.: Theory of Probability. Oxford Classic Texts in the Physical Sciences. (3rd edn.), Oxford University Press, Oxford (1961)
Lee, K., Marin, J.-M., Mengersen, K.L., Robert, C.P.: Bayesian inference on mixtures of distributions. In: Narasimha Sastry, N.S., Delampady, M., Rajeev, B. (eds.) Perspectives in Mathematical Sciences I: Probability and Statistics, pp. 165–202. World Scientific, Singapore (2009)
Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, New York (2001)
Liu, J.S., Wong, W.H., Kong, A.: Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and sampling schemes. Biometrika 81, 27–40 (1994)
Marin, J.-M., Robert, C.P.: Bayesian Core. Springer, New York (2007a).
Marin, J.-M., Robert, C.P.: Importance sampling methods for Bayesian discrimination between embedded models. In: Chen, M.-H., Dey, D.K., Müller, P., Sun, D., Ye, K. (eds.) Frontiers of Statistical Decision Making and Bayesian Analysis. Springer, New York (2007b); To appear, see arXiv:0910.2325.
Marjoram, P., Molitor, J., Plagnol, V., Tavaré, S.: Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 100(26), 15324–15328 (2003)
McCullagh, P., Nelder, J.: Generalized Linear Models. Chapman and Hall, New York (1989)
Meng, X.L., Wong, W.H.: Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Stat. Sinica. 6, 831–860 (1996)
Metropolis, N., Ulam, S.: The Monte Carlo method. J. Am. Stat. Assoc. 44, 335–341 (1949)
Neal, R.M.: Slice sampling (with discussion). Ann. Stat. 31, 705–767 (2003)
Nobile, A.: A hybrid Markov chain for the Bayesian analysis of the multinomial probit model. Stat. Comput. 8, 229–242 (1998)
Pole, A., West, M., Harrison, P.J.: Applied Bayesian Forecasting and Time Series Analysis. Chapman-Hall, New York (1994)
Pritchard, J.K., Seielstad, M.T., Perez-Lezaun, A., Feldman, M.W.: Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol. Biol. Evol. 16, 1791–1798 (1999)
Richardson, S., Green, P.J.: On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. Roy. Stat. Soc. B. 59, 731–792 (1997)
Robert, C.P.: The Bayesian Choice. paperback edn, Springer, New York (2007)
Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. (2nd edn.), Springer, New York (2004)
Robert, C.P., Casella, G.: Introducing Monte Carlo Methods with R. Springer, New York (2009)
Robert, C.P., Casella, G.: A history of Markov chain Monte Carlo: subjective recollections from incomplete data. In: Brooks, S., Gelman, A., Meng, X.L., Jones, G. (eds.) Handbook of Markov Chain Monte Carlo: Methods and Applications. Chapman and Hall, New York (2010); arXiv:0808.2902
Robert, C.P., Marin, J.-M.: On resolving the Savage–Dickey paradox. Technical Report arxiv.org:0910.1452, CEREMADE, Université Paris Dauphine (2009)
Robert, C.P., Wraith, D.: Computational methods for Bayesian model choice. In: Paul, M.G., Chun-Yong, C. (eds.) MaxEnt 2009 proceedings, vol. 1193, AIP (2009)
Roberts, G.O., Rosenthal, J.S.: Examples of adaptive MCMC. J. Comp. Graph. Stat. 18, 349–367 (2009)
Roeder, K.: Density estimation with confidence sets exemplified by superclusters and voids in galaxies. J. Am. Stat. Assoc. 85, 617–624 (1992)
Rosenthal, J.S.: AMCMC: An R interface for adaptive MCMC. Comput. Stat. Data Anal. 51, 5467–5470 (2007)
Shephard, N., Pitt, M.K.: Likelihood analysis of non-Gaussian measurement time series. Biometrika 84, 653–668 (1997)
Skilling, J.: Nested sampling for general Bayesian computation. Bayesian Anal. 1(4), 833–860 (2006)
Spiegelhalter, D.J., Thomas, A., Best, N.G.: WinBUGS Version 1.2 User Manual. Cambridge (1999)
Stavropoulos, P., Titterington, D.M.: Improved particle filters and smoothing. In: Doucet, A., deFreitas, N., Gordon, N. (eds.) Sequential MCMC in Practice. Springer, New York (2001)
Tanner, M., Wong, W.: The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82, 528–550 (1987)
Verdinelli, I., Wasserman, L.: Computing Bayes factors using a generalization of the Savage–Dickey density ratio. J. Am. Stat. Assoc. 90, 614–618 (1995)
Wraith, D., Kilbinger, M., Benabed, K., Cappé, O., Cardoso, J.-F., Fort, G., Prunet, S., Robert, C.P.: Estimation of cosmological parameters using adaptive importance sampling. Phys. Rev. D. 80, 023507 (2009)
Acknowledgements
This work has been partly supported by the Agence Nationale de la Recherche (ANR, 212 rue de Bercy, 75012 Paris) through the 2009–2012 projects Big’MC and EMILE.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this chapter
Robert, C.P. (2012). Bayesian Computational Methods. In: Gentle, J., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21551-3_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21550-6
Online ISBN: 978-3-642-21551-3