Abstract
Bayesian models have become a mainstay in the tool set for marketing research in academia and industry practice. In this chapter, I discuss the advantages the Bayesian approach offers to researchers in marketing, the essential building blocks of a Bayesian model, Bayesian model comparison, and useful algorithmic approaches to fully Bayesian estimation. I show how to achieve feasible Bayesian inference to support marketing decisions under uncertainty using the Gibbs sampler and the Metropolis-Hastings algorithm, and point to more recent developments, specifically the no-U-turn implementation of Hamiltonian Monte Carlo sampling available in Stan. The emphasis is on the development of an appreciation of Bayesian inference techniques supported by references to implementations in the open source software R, and not on the discussion of individual models. The goal is to encourage researchers to formulate new, more complete, and useful prior structures that can be updated with data for better marketing decision support.
Acknowledgments
I would like to thank Anocha Aribarg, Albert Bemmaor, Joachim Büschken, Arash Laghaie, anonymous reviewers, the editors, and participants in my class on “Bayesian Modeling for Marketing” for helpful comments and feedback. All remaining errors are obviously mine.
Appendix
MCMC for Binomial Probit Without Data Augmentation
Simulate data, call the MCMC routine, and plot MCMC traces. This R-script sources a RW-MH-sampler for the binomial probit model (see the following script), simulates probit data, and runs the code with different step sizes (standard deviations of ϵ).
# may need to install these packages first
library(bayesm)
library(latex2exp)
# needs to be in R's working directory
source('rbprobitRWMetropolis.r')

# function to simulate from binary probit
simbprobit = function(X, beta) {
  y = ifelse((X %*% beta + rnorm(nrow(X))) < 0, 0, 1)
  list(X = X, y = y, beta = beta)
}

nobs = 500                                 # number of simulated observations
X = cbind(rep(1, nobs), runif(nobs), runif(nobs))
beta = c(-3, 2, 4)                         # data generating parameters
nvar = ncol(X)
simout = simbprobit(X, beta)               # probit responses
y = simout$y
R = 200000                                 # length of MCMC sample

# data list to be passed to MCMC routine
Data = list(X = simout$X, y = simout$y)
Mcmc = list(R = R, keep = 1)
# prior mean set to zero, prior variances set to 100
Prior = list(betabar = double(nvar), A = diag(rep(.01, nvar)))

out_1 = rbprobitRWMetropolis(Data = Data, Mcmc = Mcmc, Prior = Prior, stepsize = .001)
out_2 = rbprobitRWMetropolis(Data = Data, Mcmc = Mcmc, Prior = Prior, stepsize = .005)
out_3 = rbprobitRWMetropolis(Data = Data, Mcmc = Mcmc, Prior = Prior, stepsize = .8)
out_4 = rbprobitRWMetropolis(Data = Data, Mcmc = Mcmc, Prior = Prior, stepsize = 3)

windows()
par(mfrow = c(2, 2))
matplot(out_1$betadraw, type = 'l', xlab = '', ylab = '',
        main = TeX('$\\epsilon$-standard deviation = .001')); grid()
matplot(out_2$betadraw, type = 'l', xlab = '', ylab = '',
        main = TeX('$\\epsilon$-standard deviation = .005')); grid()
matplot(out_3$betadraw, type = 'l', xlab = '', ylab = '',
        main = TeX('$\\epsilon$-standard deviation = .8')); grid()
matplot(out_4$betadraw, type = 'l', xlab = '', ylab = '',
        main = TeX('$\\epsilon$-standard deviation = 3')); grid()
MCMC function. The following function implements a simple RW-MH-sampler for the binomial probit model coupled with a multivariate normal prior. All regression parameters are updated simultaneously in one MH-step.
rbprobitRWMetropolis <- function(Data, Prior, Mcmc, stepsize) {
  require(bayesm)  # for lndMvn, which evaluates the log-density of a
                   # multivariate normal distribution
  y = Data$y
  X = Data$X
  nvar = ncol(X)
  nobs = length(y)
  betabar = Prior$betabar
  A = Prior$A
  R = Mcmc$R
  keep = Mcmc$keep
  betadraw = matrix(double(floor(R / keep) * nvar), ncol = nvar)
  loglike = double(floor(R / keep))
  beta = c(rep(0, nvar))
  priorcov = chol2inv(chol(A))
  rootp = chol(priorcov)
  rootpi = backsolve(rootp, diag(nvar))
  # initialize log-likelihood at starting value;
  # P(y=1) = Phi(x'beta) = pnorm(0, -x'beta), P(y=0) = Phi(-x'beta) = pnorm(0, x'beta)
  oldloglike = sum(pnorm(0, -(X %*% beta)[as.logical(y)], 1, log.p = TRUE)) +
               sum(pnorm(0, (X %*% beta)[!as.logical(y)], 1, log.p = TRUE))
  # compute non-normalized log-posterior at starting value
  oldlpost = oldloglike + lndMvn(beta, betabar, rootpi)
  naccept = 0
  for (rep in 1:R) {
    betac = beta + rnorm(nvar) * stepsize  # random walk proposal
    # compute probit log-likelihood at proposed value
    cloglike = sum(pnorm(0, -(X %*% betac)[as.logical(y)], 1, log.p = TRUE)) +
               sum(pnorm(0, (X %*% betac)[!as.logical(y)], 1, log.p = TRUE))
    # compute non-normalized log-posterior at proposed value
    clpost = cloglike + lndMvn(betac, betabar, rootpi)
    # compute log-ratio of non-normalized posterior at proposed and old value
    ldiff = clpost - oldlpost
    alpha = min(1, exp(ldiff))  # acceptance probability
    if (alpha < 1) {
      unif = runif(1)
    } else {
      unif = 0
    }
    if (unif <= alpha) {
      beta = betac
      oldloglike = cloglike
      oldlpost = clpost
      naccept = naccept + 1
    }
    if (rep %% keep == 0) {
      mkeep = rep / keep
      betadraw[mkeep, ] = beta
      loglike[mkeep] = oldloglike
    }
  }
  # betadraw is the matrix containing draws from the posterior
  # rateaccept is the relative frequency of accepting proposed moves
  # ... from beta to betac
  # loglike is the log-likelihood evaluated at the current MCMC state (beta)
  return(list(betadraw = betadraw, rateaccept = naccept / R, loglike = loglike))
}
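The probit log-likelihood in the sampler relies on the identity P(y = 1 | x, β) = Φ(x′β) = pnorm(0, −x′β, 1), i.e., the lower tail of a normal with mean −x′β evaluated at zero equals the standard normal CDF at x′β. The following self-contained sanity check (not part of the original appendix; the values of m are hypothetical) verifies both the y = 1 and y = 0 terms:

```r
# Sanity check for the probit likelihood identity used in rbprobitRWMetropolis:
# P(y = 1 | x, beta) = Phi(x'beta) = pnorm(0, mean = -x'beta, sd = 1)
m <- c(-2, -0.5, 0, 0.5, 2)                 # hypothetical linear predictor values x'beta
lhs <- pnorm(0, mean = -m, sd = 1, log.p = TRUE)  # form used inside the sampler
rhs <- pnorm(m, log.p = TRUE)                     # direct log Phi(x'beta)
stopifnot(isTRUE(all.equal(lhs, rhs)))
# complementary term for y = 0: P(y = 0) = Phi(-x'beta) = pnorm(0, mean = x'beta)
stopifnot(isTRUE(all.equal(pnorm(0, mean = m), pnorm(-m))))
```

Working on the log scale via `log.p = TRUE` avoids underflow when |x′β| is large, which is why the sampler sums log-probabilities rather than multiplying probabilities.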
Stan probit definition file. This file, saved as StanProbit.stan and called by the R-script immediately below, defines a binomial probit model with a multivariate normal prior for Stan. According to the model, the data are independently Bernoulli distributed with probabilities implied by the probit-link, parameters, and covariates.
data {
  int N;                        // number of observations
  int K;                        // number of covariates
  int<lower=0, upper=1> y[N];   // binary outcomes
  matrix[N, K] X;               // design matrix
}
parameters {
  vector[K] beta;               // beta coefficients
}
model {
  vector[N] mu;
  beta ~ normal(0, 100);
  mu = X * beta;
  for (n in 1:N)
    mu[n] = Phi(mu[n]);
  y ~ bernoulli(mu);
}
Calling Stan from R to estimate a binomial probit model. This R-script calls Stan to sample from the posterior of the binomial probit model coupled with a multivariate normal prior defined in the file above.
# may need to install the rstan package first
require(rstan)  # load the rstan package
# see scripts above for the nobs, nvar, and simout objects
prob_data = list(N = nobs, K = nvar, X = simout$X, y = as.vector(simout$y))
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
stanfit_probit = stan(file = "StanProbit.stan", data = prob_data,
                      pars = c("beta"), chains = 1,
                      iter = 600000, warmup = 1000)
# make draws available for posterior analysis in R
out_StanProbit = extract(stanfit_probit)
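`extract()` returns a named list; the `beta` element is an iterations-by-K matrix of posterior draws that can be summarized with base R. The snippet below sketches typical posterior summaries on simulated stand-in draws, so it runs without Stan; the draws, their means, and their spread are hypothetical, chosen to mimic the data generating values of the probit example above:

```r
# Posterior summaries from a draws matrix (stand-in for out_StanProbit$beta).
# Hypothetical draws are simulated here so the snippet is self-contained.
set.seed(1)
draws <- matrix(rnorm(5000 * 3, mean = c(-3, 2, 4), sd = .2),
                ncol = 3, byrow = TRUE)   # 5000 draws of K = 3 coefficients
post_mean <- colMeans(draws)                          # posterior means
post_ci <- apply(draws, 2, quantile, c(.025, .975))   # 95% credible intervals
round(post_mean, 2)
round(post_ci, 2)
```

The same two lines applied to `out_StanProbit$beta` summarize the actual Stan output; packages such as coda offer additional convergence diagnostics for the saved draws.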
HB-Logit Example
This code generates MNL-data from a hierarchical model, estimates an HB-logit model, and compares selected individual-level posteriors to the corresponding maximum likelihood estimates.
genXy <- function(betai, p, T) {
  ## generate multinomial logit choices
  # alternative specific constants
  # ... this assumes p=3 (two inside brands, one outside choice)
  X = kronecker(rep(1, T), matrix(c(1, 0, 0, 0, 1, 0), ncol = (length(betai) - 1)))
  # add the continuous covariate
  X = cbind(X, runif(T * p))
  index = seq(p, p * T, p)
  X[index, ] = 0  # outside good
  Xbeta = t(matrix(X %*% betai, nrow = p))
  index = cbind(1:T, max.col(Xbeta))
  maxl = Xbeta[index]
  logsumel = log(rowSums(exp(Xbeta - maxl))) + maxl
  logprob = matrix(Xbeta - logsumel, nrow = T)
  y = double(T)
  for (t in 1:T) {
    y[t] = sum(cumsum(exp(logprob[t, ])) < runif(1)) + 1  ## draw from the CDF of probs
  }
  return(list(y = y, X = X))
}

# MNL log-likelihood for the individual-level ML estimates; this helper is
# referenced but not printed in the original appendix -- the version below is
# a minimal reconstruction consistent with the data generated by genXy
llMNL <- function(beta, y, X, p) {
  Xbeta = t(matrix(X %*% beta, nrow = p))  # T x p matrix of utilities
  maxl = apply(Xbeta, 1, max)
  logsumel = log(rowSums(exp(Xbeta - maxl))) + maxl
  sum(Xbeta[cbind(seq_along(y), y)] - logsumel)
}

p = 3     # number of alternatives in each choice set
T = 5     # number of repeated measurements, i.e., choice sets per individual

# generate panel data for MCMC analysis
N = 2000  # number of individuals in the panel
# population mean preference
betap = c(.3, -2, -1)
# variance-covariance of preferences in the population
Vbeta = matrix(c(3, -2.99, 0, -2.99, 3, 0, 0, 0, .1), ncol = 3)
# just for demonstration, to make sure we all get the same data and results
set.seed(66)
# draw individual specific preferences from a multivariate normal distribution
betai = betap + t(chol(Vbeta)) %*% matrix(rnorm(N * length(betap)), ncol = N)
lgtdata <- vector("list", N)
betaMLE = betai
betaMLE[, ] = 0
for (i in 1:N) {
  outgen = genXy(betai[, i], p, T)
  # For Bayesian analysis using rhierMnlRwMixture you need to organize
  # your data in list format as in the command line below:
  # y :: vector of choice outcomes of length T (or T_i in case different
  #      panel units provide different numbers of choices)
  # X :: a (p*T) rows x length(beta[,i]) columns model matrix;
  #      the first (second) p rows correspond to the first (second) choice
  #      set, and so on. Each alternative is represented by one row in X.
  #      The numbers in y point to which 'row' was chosen from a particular
  #      choice set.
  lgtdata[[i]] = list(y = outgen[[1]], X = outgen[[2]])
  out = optim(par = betai[, i], fn = llMNL, gr = NULL,
              y = outgen[[1]], X = outgen[[2]], p = p,
              hessian = FALSE, control = list(fnscale = -1))
  betaMLE[, i] = out$par  # collect MLE estimates
}

# load the bayesm package into the workspace
# (if this gives you an error, you need to install the package first)
library(bayesm)
# run the Bayesian hierarchical model
outMCMC = rhierMnlRwMixture(Data = list(p = p, lgtdata = lgtdata),
                            Prior = list(ncomp = 1),
                            Mcmc = list(R = 100000, keep = 10))
# posterior of individual specific coefficients
betaimc = outMCMC$betadraw
index = 1001:10000
# may need to install this first
library(latex2exp)
M = c(3, 99, 2000)
# plot betai posterior for consumers in M
jpeg(filename = "ILposteriors880.jpg", quality = 100, width = 880, height = 480)
par(mfcol = c(length(betap), length(M) * 2))
for (i in M) {
  plot(density(betaimc[i, 1, index]), xlab = TeX('$\\beta_{A}$'), ylab = "",
       main = paste("panel-unit", i))
  abline(v = betai[1, i], col = 'green', lwd = 5, lty = 1)
  abline(v = betaMLE[1, i], col = 'red', lwd = 5, lty = 2)
  plot(density(betaimc[i, 2, index]), xlab = TeX('$\\beta_{B}$'), ylab = "", main = "")
  abline(v = betai[2, i], col = 'green', lwd = 5, lty = 1)
  abline(v = betaMLE[2, i], col = 'red', lwd = 5, lty = 2)
  plot(density(betaimc[i, 3, index]), xlab = TeX('$\\beta$'), ylab = "", main = "")
  abline(v = betai[3, i], col = 'green', lwd = 5, lty = 1)
  abline(v = betaMLE[3, i], col = 'red', lwd = 5, lty = 2)
  plot(betaimc[i, 1, index], type = 'l', xlab = "", ylab = TeX('$\\beta_{A}$'),
       main = paste("MLE:", round(betaMLE[1, i], 2)))
  abline(h = betai[1, i], col = 'green', lwd = 5, lty = 1)
  abline(h = betaMLE[1, i], col = 'red', lwd = 5, lty = 2)
  plot(betaimc[i, 2, index], type = 'l', xlab = "", ylab = TeX('$\\beta_{B}$'),
       main = paste("MLE:", round(betaMLE[2, i], 2)))
  abline(h = betai[2, i], col = 'green', lwd = 5, lty = 1)
  abline(h = betaMLE[2, i], col = 'red', lwd = 5, lty = 2)
  plot(betaimc[i, 3, index], type = 'l', xlab = "", ylab = TeX('$\\beta$'),
       main = paste("MLE:", round(betaMLE[3, i], 2)))
  abline(h = betai[3, i], col = 'green', lwd = 5, lty = 1)
  abline(h = betaMLE[3, i], col = 'red', lwd = 5, lty = 2)
}
dev.off()  # close the jpeg device so the file is written
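The inverse-CDF draw inside genXy assumes that the choice probabilities implied by each row of logprob sum to one. The following self-contained check mirrors the log-sum-exp normalization used there on a hypothetical utility matrix (the dimensions and utilities are illustrative, not taken from the script above):

```r
# Verify the log-sum-exp normalization underlying genXy:
# choice probabilities recovered from logprob must sum to 1 per choice set.
set.seed(7)
T <- 5; p <- 3
Xbeta <- matrix(rnorm(T * p), nrow = T)        # hypothetical T x p utilities
maxl <- apply(Xbeta, 1, max)                   # row maxima for numerical stability
logsumel <- log(rowSums(exp(Xbeta - maxl))) + maxl
logprob <- Xbeta - logsumel                    # log choice probabilities
stopifnot(isTRUE(all.equal(rowSums(exp(logprob)), rep(1, T))))
```

Subtracting the row maximum before exponentiating is what keeps the computation stable when utilities are large in magnitude, which routinely happens for individual-level draws from a heterogeneity distribution with large variances.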
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this entry
Otter, T. (2019). Bayesian Models. In: Homburg, C., Klarmann, M., Vomberg, A. (eds) Handbook of Market Research. Springer, Cham. https://doi.org/10.1007/978-3-319-05542-8_24-1
DOI: https://doi.org/10.1007/978-3-319-05542-8_24-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05542-8
Online ISBN: 978-3-319-05542-8
eBook Packages: Springer Reference Business and Management; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences