Efficient Bayesian inference for COM-Poisson regression models
Abstract
COM-Poisson regression is an increasingly popular model for count data. Its main advantage is that it allows the mean and the variance of the counts to be modelled separately, so that the same covariate can affect the average level and the variability of the response variable in different ways. A key limiting factor in the use of the COM-Poisson distribution is the calculation of the normalisation constant: its accurate evaluation can be time-consuming and is not always feasible. We circumvent this problem, in the context of estimating a Bayesian COM-Poisson regression, by resorting to the exchange algorithm, an MCMC method applicable to situations where the sampling model (likelihood) can only be computed up to a normalisation constant. The algorithm requires drawing from the sampling model, which in the case of the COM-Poisson distribution can be done efficiently using rejection sampling. We illustrate the method and the benefits of using a Bayesian COM-Poisson regression model through a simulation and two real-world data sets with different levels of dispersion.
Keywords
Bayesian statistics, Conway–Maxwell–Poisson regression, Count data, Exchange algorithm, Markov chain Monte Carlo, Rejection sampling
1 Introduction
Observational and epidemiological studies often give rise to count data, representing the number of occurrences of an event within some region in space or period of time, e.g., number of goals in a football match, number of emergency hospital admissions during a night shift, etc. A standard approach to modelling count data is Poisson regression: the counts are assumed to be independent Poisson random variables, with means determined, through a link function (usually the \(\log \)), by a linear regression on available covariates. The Poisson model entails that the mean and variance are equal (equidispersion). However, count data frequently exhibit underdispersion or, especially, overdispersion (these are often just symptoms of model misspecification, e.g. omission of important covariates, presence of outliers, lack of independence, inadequate link function). In the presence of substantial overdispersion, a commonly used alternative to the Poisson regression model is the negative binomial regression model, which allows the variance to be larger than the mean.
This paper is concerned with an even more flexible model for count data, the COM-Poisson regression model (Sellers and Shmueli 2010; Guikema and Coffelt 2008), which allows the mean and the variance of count data to be modelled separately. The model is flexible enough to handle underdispersion, something that neither of the previous models can do. The COM-Poisson model has become more popular in recent years, with SAS/ETS (SAS Institute Inc 2014) software containing a frequentist implementation. The main factor which inhibits the more widespread use of COM-Poisson regression is that the normalisation constant of the COM-Poisson distribution does not have a closed form. We take advantage of an MCMC algorithm, known as the exchange algorithm (Møller et al. 2006; Murray et al. 2006), to estimate a Bayesian COM-Poisson regression model without computing any normalisation constant. The resulting improvements in computational speed and efficiency make the COM-Poisson regression model a viable and attractive alternative to the usual count data models.
The paper is organised as follows. In Sect. 2 we review the COM-Poisson distribution and regression model, show the drawbacks of its current implementations in R (R Core Team 2015) and SAS/ETS (SAS Institute Inc 2014), and then show how one can efficiently sample from the COM-Poisson distribution using rejection sampling. In Sect. 3 we show how to overcome the problem of an intractable likelihood in a Bayesian setting, using a data augmentation technique which requires sampling from the COM-Poisson distribution, and present an exact MCMC algorithm for the COM-Poisson regression model. We have focused on the Bayesian implementation of the COM-Poisson regression model, which allows us to use prior information on the distribution of the regression coefficients. Alternatively, one can use different methods to estimate the normalisation constant (Geyer 1991) and apply the frequentist version of the regression model.
In Sect. 4 we apply the Poisson, negative binomial, and COM-Poisson regression models to one artificial and two real-world data sets. The results indicate the inability of the first two models to correctly estimate the effect of the covariate on the response variable and show that the COM-Poisson regression model provides a better fit to the data. Finally, in Sect. 5 we compare the proposed MCMC algorithm with the one in Chanialidis et al. (2014) and show that the newly proposed MCMC algorithm samples from the correct posterior distribution and is computationally more efficient.
2 COM-Poisson distribution
2.1 COM-Poisson regression
3 Bayesian methods for COM-Poisson regression models
The normalisation constant \(Z(\mu , \nu )\) in the COM-Poisson distribution is not available in closed form, hence evaluating the likelihood can be computationally expensive. This makes it difficult to sample from the posterior distribution of the parameters in a COM-Poisson regression model. One possible solution is to use an asymptotic approximation of \(Z(\mu , \nu )\) (Minka et al. 2003), which is known to be reasonably accurate when \(\mu > 10\). Alternatively, one could compute \(Z(\mu , \nu )\) by truncating its series at the kth term, but in order to achieve reasonable accuracy k may need to be very large. Evaluation of \(Z(\mu , \nu )\) could be avoided altogether using approximate Bayesian computation (ABC) methods. However, the resulting algorithms may not sample from the distribution of interest and are usually much less efficient than standard MCMC algorithms which assume that the normalisation constants are known or cheap to compute. We overcome the problem of having an intractable likelihood by means of an MCMC algorithm that takes advantage of the exchange algorithm and the sampling technique of Sect. 3.1. This algorithm is almost as efficient as one assuming the normalisation constants are readily available.
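To make the cost of the truncation approach concrete, the following minimal Python sketch evaluates the truncated series on the log scale. It assumes the parameterisation \(q_{\varvec{\theta }}(y)=(\mu ^{y}/y!)^{\nu }\) used in Sect. 3.1; the function names log_q and log_z_truncated are illustrative and do not come from the paper's code.

```python
import math

def log_q(y, mu, nu):
    """Log of the unnormalised COM-Poisson mass, q(y) = (mu**y / y!)**nu."""
    return nu * (y * math.log(mu) - math.lgamma(y + 1))

def log_z_truncated(mu, nu, k):
    """Log of the series for Z(mu, nu) truncated after the terms y = 0, ..., k."""
    terms = [log_q(y, mu, nu) for y in range(k + 1)]
    top = max(terms)  # log-sum-exp shift to avoid overflow
    return top + math.log(sum(math.exp(t - top) for t in terms))

# With nu = 1 the series is that of exp(mu), which gives a check on the truncation.
print(log_z_truncated(3.0, 1.0, 50))

# With strong overdispersion (small nu) the terms decay slowly,
# so the truncation point k has to be much larger.
for k in (50, 200, 1000):
    print(k, log_z_truncated(20.0, 0.05, k))
```

In the second loop the value still changes noticeably between k = 50 and k = 200, which illustrates why a fixed, small truncation point is not a safe default.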
3.1 Rejection sampling from the COM-Poisson distribution
This section sets out a simple, yet efficient method for sampling from the COM-Poisson distribution without having to evaluate its normalisation constant. This method will be a key part of the exchange algorithm proposed in Sect. 3.4.
Suppose we want to generate a random variable Y from the COM-Poisson distribution with probability mass function \(p(y \mid \varvec{\theta })=\frac{q_{\varvec{\theta }}(y)}{Z(\varvec{\theta })}\), where \(\varvec{\theta }=(\mu ,\nu )\), \(q_{\varvec{\theta }}(y)=\left( \frac{\mu ^{y}}{y!}\right) ^{\nu }\) and \(Z(\varvec{\theta })=\sum _y q_{\varvec{\theta }}(y)\). Denote by m the mode of the COM-Poisson distribution, i.e., \(m={\lfloor \mu \rfloor }\), and denote by \(s=\lceil \sqrt{\mu }/\sqrt{\nu }\rceil \) its approximate standard deviation.
We construct an upper bound for the COM-Poisson distribution based on a piecewise geometric distribution. We start by defining three cut-off points, \(m-s\), m, \(m+s\). For the sake of simplicity, we assume that \(m-s\ge 0\); otherwise, we can simply omit the part of the upper bound falling to the left of 0.
We can sample from \(g_{\varvec{\theta }}(y)\) using a simple two-stage sampling procedure. First decide which part of the piecewise geometric distribution to sample from, according to probabilities proportional to the terms in (6). Then sample from the selected truncated geometric distribution using the inverse c.d.f. method, which is very efficient as the inverse c.d.f. is known in closed form.
The instrumental distribution in (4) is based on the same principle as the upper bounds used in the retrospective sampling algorithm proposed by Chanialidis et al. (2014). In contrast to the arbitrarily precise upper bound required for the retrospective algorithm, the bounds set out above strike a trade-off between achieving a high acceptance rate and keeping the instrumental distribution simple, so that sampling from it is computationally efficient. Figure 2 shows the rejection rate of the above sampling algorithm as a function of the two parameters \(\mu \) and \(\nu \). For most values of \(\mu \) and \(\nu \), the rejection rate is less than 30% and one can draw \(10^6\) realisations from the COM-Poisson distribution in one second on a modern desktop computer (Intel Core i5).
Using a tighter rejection envelope (say by using more than four geometric pieces) yields a small reduction in the rejection rate, but overall the computational cost increases.
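The two-stage procedure can be sketched in Python as follows. This is an illustration of the idea rather than the envelope used in the paper. The geometric ratios below are an assumption, derived only from the fact that \(q_{\varvec{\theta }}(y+1)/q_{\varvec{\theta }}(y)=(\mu /(y+1))^{\nu }\) is decreasing in y. The exact placement of the pieces is likewise chosen here for simplicity, and all function names are hypothetical.

```python
import math
import random

def log_q(y, mu, nu):
    """Log of the unnormalised COM-Poisson mass, q(y) = (mu**y / y!)**nu."""
    return nu * (y * math.log(mu) - math.lgamma(y + 1))

def make_pieces(mu, nu):
    """Geometric pieces (start, direction, n_points, log_ratio) bounding q from above."""
    m = math.floor(mu)                             # mode
    s = math.ceil(math.sqrt(mu) / math.sqrt(nu))   # approximate standard deviation
    lo = max(m - s, 0)
    pieces = [
        (m, 1, s, nu * (math.log(mu) - math.log(m + 1))),             # m, ..., m+s-1
        (m + s, 1, None, nu * (math.log(mu) - math.log(m + s + 1))),  # m+s, m+s+1, ...
    ]
    # Left of the mode the pieces run downwards and stop at 0.
    for start, n in ((m - 1, m - lo), (lo - 1, lo)):
        if n > 0:
            log_ratio = nu * (math.log(start) - math.log(mu)) if start > 0 else 0.0
            pieces.append((start, -1, n, log_ratio))
    return pieces

def log_piece_mass(log_ratio, n):
    """Log of 1 + r + ... + r**(n-1); n=None means the infinite geometric series."""
    if n == 1:
        return 0.0
    log_total = -math.log1p(-math.exp(log_ratio))
    if n is not None:
        log_total += math.log1p(-math.exp(n * log_ratio))
    return log_total

def sample_offset(log_ratio, n, rng):
    """Inverse-c.d.f. draw of the offset k = 0, 1, ... within a geometric piece."""
    if n == 1:
        return 0
    u = rng.random()
    trunc = 1.0 if n is None else -math.expm1(n * log_ratio)   # 1 - r**n
    k = math.floor(math.log1p(-u * trunc) / log_ratio)
    return k if n is None else min(k, n - 1)

def rcompois(mu, nu, rng=random):
    """One COM-Poisson draw by rejection; Z(mu, nu) is never evaluated."""
    pieces = make_pieces(mu, nu)
    log_w = [log_q(st, mu, nu) + log_piece_mass(lr, n) for st, _, n, lr in pieces]
    top = max(log_w)
    weights = [math.exp(w - top) for w in log_w]
    while True:
        # Stage 1: choose a piece; stage 2: draw within it, then accept or reject.
        start, step, n, log_ratio = rng.choices(pieces, weights=weights)[0]
        k = sample_offset(log_ratio, n, rng)
        y = start + step * k
        log_accept = log_q(y, mu, nu) - (log_q(start, mu, nu) + k * log_ratio)
        if math.log(1.0 - rng.random()) <= log_accept:
            return y

rng = random.Random(1)
print([rcompois(3.5, 1.0, rng) for _ in range(10)])
```

Only ratios of unnormalised masses enter the acceptance step, which is what makes the sampler usable inside the exchange algorithm.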
3.2 Exchange algorithm
Møller et al. (2006) presented a Metropolis–Hastings algorithm for cases where the likelihood function involves an intractable normalisation constant that is a function of the parameters. The idea behind this algorithm is to enlarge the state of the Markov chain to include, besides the parameter \(\varvec{\theta }\), an auxiliary variable \(\varvec{y}^*\) defined on the same sample space as the data \(\varvec{y} = (y_1, \ldots , y_n)\). A suitable choice of the proposal distribution ensures that the Metropolis–Hastings acceptance ratio is free of normalisation constants. Murray et al. (2006) proposed a modification, known as the exchange algorithm, which still generates \(\varvec{y}^*\) at each step, but only updates the parameter \(\varvec{\theta }\) if the move is accepted. To describe this algorithm, let us suppose that the sampling model \(p(y \mid \varvec{\theta })\) can be written as \(p(y \mid \varvec{\theta }) = \frac{q_{\varvec{\theta }}(y)}{Z(\varvec{\theta })} \), where \(q_{\varvec{\theta }}(y)\) is the unnormalised probability density and the normalisation constant \(Z(\varvec{\theta }) = \sum _y q_{\varvec{\theta }}(y)\) or \(Z(\varvec{\theta }) = \int q_{\varvec{\theta }}(y) \mathrm {d}y\) is unknown. This can easily be extended to the case where the \(y_i\) are not i.i.d. (i.e., instead of \(p(y \mid \varvec{\theta })\) and \(q_{\varvec{\theta }}(y)\) we will have \(p_i(y \mid \varvec{\theta })\) and \(q_{i,\varvec{\theta }}(y)\), since the sampling model and its unnormalised probability density will be different for each observation).
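As a rough illustration of why the normalisation constants cancel, the sketch below computes the log acceptance ratio of one exchange move in the standard form of Murray et al. (2006). It assumes a symmetric proposal, so that the proposal densities cancel, and it takes the log prior ratio as an input. The function and argument names are hypothetical.

```python
import math

def log_q(y, mu, nu):
    """Log of the unnormalised COM-Poisson mass, q(y) = (mu**y / y!)**nu."""
    return nu * (y * math.log(mu) - math.lgamma(y + 1))

def log_exchange_ratio(y, y_aux, theta, theta_star, log_prior_ratio):
    """Log acceptance ratio of an exchange move with a symmetric proposal.

    theta and theta_star are lists of (mu_i, nu_i) pairs, one per observation.
    y_aux[i] must have been drawn from the model with parameters theta_star[i].
    log_prior_ratio is log prior(proposed) - log prior(current).
    """
    log_a = log_prior_ratio
    for yi, yi_aux, (mu, nu), (mu_s, nu_s) in zip(y, y_aux, theta, theta_star):
        # Observed data: proposed parameters in the numerator, current in the denominator.
        log_a += log_q(yi, mu_s, nu_s) - log_q(yi, mu, nu)
        # Auxiliary draw: roles swapped, which cancels Z(theta) and Z(theta_star).
        log_a += log_q(yi_aux, mu, nu) - log_q(yi_aux, mu_s, nu_s)
    return log_a

y = [2, 0, 5]
y_aux = [1, 3, 4]               # stand-in values; in practice exact draws under theta_star
theta = [(2.0, 1.0)] * 3
theta_star = [(2.5, 0.8)] * 3
print(log_exchange_ratio(y, y_aux, theta, theta_star, 0.0))
```

The move is accepted when the logarithm of a Unif(0,1) draw falls below the returned value; only unnormalised densities are ever evaluated.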
Recently, Lyne et al. (2015) provided the first practical and asymptotically correct MCMC method for doubly intractable distributions that does not require exact sampling. This was done by constructing unbiased estimates of the reciprocal normalisation constant \(1/Z(\theta )\) using unbiased estimates of \(Z(\theta )\) obtained by importance sampling. The pseudo-marginal approach by Andrieu and Roberts (2009) is then adapted to use these estimates to form an MCMC algorithm. Finally, Wei and Murray (2016) construct unbiased estimates of reciprocal normalisation constants by applying Russian roulette truncations to a Markov chain rather than an importance sampler. However, given that we can draw exact samples from the COM-Poisson distribution at very little computational cost, there is no need to resort to these methods.
We discuss next the prior distributions for the regression coefficients \(\varvec{\beta }\) and \(\varvec{\delta }\) in the COM-Poisson regression model in (3).
3.3 Choice of prior for the regression coefficients
where the first column represents a model with a lasso prior, while the second column represents a model with a spike and slab prior. The first model uses a conditional (on the variance) Laplace prior for the regression coefficients \(\varvec{\delta }\) and takes advantage of the representation of the Laplace as a scale mixture of normals with an exponential mixing density (Park and Casella 2008). The maximum a posteriori (MAP) solution, under the aforementioned Laplace prior, is identical to the estimate for the standard (non-Bayesian) lasso procedure. The idea behind the second model is that the prior of every regression coefficient is a mixture of a point mass at zero and a diffuse uniform distribution elsewhere. This form of prior is known as a spike and slab prior (Mitchell and Beauchamp 1988). The parameter \(\omega \) controls how likely each of the binary variables \({\phi }_j\) is to equal 1. Since it controls the size of the models, it can be seen as a complexity parameter.
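The two priors can be illustrated by simulating from them. The sketch below is a simplification made for illustration: it drops the conditioning on the variance and draws from a Laplace density \((\lambda /2)\exp (-\lambda |x|)\) through its representation as a normal scale mixture with exponential mixing density. The spike and slab draw uses a uniform slab whose half-width is a hypothetical tuning value, and the function names are not taken from the paper's code.

```python
import math
import random

def draw_laplace_via_mixture(lam, rng):
    """Draw from the Laplace density (lam/2) * exp(-lam * |x|) as a normal scale mixture."""
    tau = rng.expovariate(lam ** 2 / 2.0)    # mixing variance, exponential with rate lam^2 / 2
    return rng.gauss(0.0, math.sqrt(tau))    # normal with mean 0 and variance tau

def draw_spike_and_slab(omega, slab_half_width, rng):
    """Draw (phi, coefficient): phi ~ Bernoulli(omega); the coefficient is 0 when phi = 0."""
    phi = 1 if rng.random() < omega else 0
    coef = rng.uniform(-slab_half_width, slab_half_width) if phi else 0.0
    return phi, coef

rng = random.Random(0)
lasso_draws = [draw_laplace_via_mixture(2.0, rng) for _ in range(50000)]
# For a Laplace density with rate lam, E|x| = 1 / lam.
print(sum(abs(x) for x in lasso_draws) / len(lasso_draws))

slab_draws = [draw_spike_and_slab(0.3, 10.0, rng) for _ in range(50000)]
# The share of non-zero coefficients is governed by omega.
print(sum(phi for phi, _ in slab_draws) / len(slab_draws))
```

Smaller values of omega put more prior mass on small models, which is the sense in which it acts as a complexity parameter.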
3.4 MCMC sampling
 A. First kind:
   1. We draw \(\varvec{\beta }^*\sim h(\cdot \mid \varvec{\beta })\), where the proposal \(h(\cdot )\) is a multivariate Gaussian centred at \(\varvec{\beta }\). Specifically, where for the unnormalised COM-Poisson densities in (9) we have,
      $$\begin{aligned} q_{\varvec{\theta _i}}(y_i)&=\left( \frac{\mu _i^{y_i}}{y_i!}\right) ^{\nu _i},&q_{\varvec{\theta _i^*}}(y_i)&=\left( \frac{(\mu _i^*)^{y_i}}{y_i!}\right) ^{\nu _i^*},\nonumber \\ q_{\varvec{\theta _i}}(y_i^*)&=\left( \frac{\mu _i^{y_i^*}}{y_i^*!}\right) ^{\nu _i},&q_{\varvec{\theta _i^*}}(y_i^*)&=\left( \frac{(\mu _i^*)^{y_i^*}}{y_i^*!}\right) ^{\nu _i^*}. \end{aligned}$$ (11)
   2. We now draw \(\varvec{\delta }^*\sim h(\cdot \mid \varvec{\delta })\), where the proposal \(h(\cdot )\) is a multivariate Gaussian centred at \(\varvec{\delta }\). Specifically, where the unnormalised COM-Poisson densities can be evaluated as in (11).
 B. Second kind: For \(j=1, \ldots , p\):
   We draw \(\beta _j^*\sim h(\cdot \mid \beta _j)\) and \(\delta _j^*\sim h(\cdot \mid \delta _j)\), where the proposal distribution \(h(\cdot )\) is a univariate Gaussian centred at \(\beta _j\) and \(\delta _j\), respectively, and for \(l\ne j\) copy \(\beta ^*_{l}=\beta _{l}\) and \(\delta ^*_{l}=\delta _{l}\). Specifically, where the unnormalised COM-Poisson densities can be evaluated as in (11).
In order to assess the computational efficiency of the proposed MCMC sampler, we have compared its effective sample size (ESS) per second to that of a vanilla MCMC sampler for Poisson regression. We have simulated Poisson-distributed data (i.e., \(\nu =1\)), for which the latter sampler has used the closed-form expression of the normalisation constant. The ESS per second in the latter case is only 10 times higher than the one for our proposed MCMC sampler, i.e., in order to get the same effective sample size the proposed method takes about 10 times as long. This factor of about 10 can be broken down into a factor of about 2 caused by the slower mixing of the exchange algorithm and a factor of about 5 caused by the higher computational cost of evaluating the acceptance ratio.
4 Simulation and case studies
4.1 Simulation
An overdispersed regression model is then obtained by omitting \(x_{i4}\) from the model specification, which corresponds to \(x_{i4}\) not being directly observable. Because \(x_{i3}\) is related to the dispersion of \(x_{i4}\), the degree of overdispersion of \(Y_i\) depends on \(x_{i3}\) as well. In this case, the third covariate has a positive effect on the mean of the response variable (i.e., the value of the regression coefficient is positive) and a negative effect on its variance since higher values of \(x_{i3}\) will result in smaller dispersion for the covariate \(x_{i4}\). Thus, the dispersion of the response variable will also be smaller. Figure 3 shows the relationship between the response variables and the two covariates \(x_{i3}\) and \(x_{i4}\).
We simulate \(n=1000\) observations, which have empirical mean and variance of 1.36 and 2.37, respectively. The 95% and 68% credible intervals for the coefficients for the Poisson, negative binomial, and COM-Poisson regression models can be seen in Figs. 4 and 5. Figure 4 shows the credible intervals for the regression coefficients of \(\mu \) for all the models.
The results for the Poisson and negative binomial models both lead to the conclusion that the third covariate has a negative effect on the mean of the response variable. This happens due to the covariate having a negative effect on the variance of the response variable. On the other hand, the COM-Poisson regression model correctly identifies all regression coefficients for the mean of the response variable. The credible intervals for the regression coefficients of \(\nu \) for the COM-Poisson model can be seen in Fig. 5. The only posterior credible interval that does not include zero is the one for the third covariate (the one for the intercept is also wholly positive, although the lower end is very close to 0).
Number of times, out of 100 different replications of the model in (14), that the 95% credible interval for the coefficient of the third covariate is wholly negative, includes 0, or is wholly positive
Model  Negative  Includes 0  Positive
Poisson  6  88  6
Negative binomial  1  94  5
COM-Poisson  0  20  80
4.2 Publications by Ph.D. students
Description of variables
Variable  Description
Gender of student  Equals 1 if the student is female; else 0
Married at Ph.D.  Equals 1 if the student was married by the year of the Ph.D.; else 0
Children under 6 years old  Number of children less than 6 years old at the year of the student's Ph.D.
Ph.D. prestige  Prestige of the Ph.D. program in biochemistry based on studies. Unranked institutions were assigned a score of 0.75, while ranked institutions had scores ranging from 1 to 5
Mentor  Number of articles produced by Ph.D. mentor during the last 3 years
The study found, amongst other things, that females and Ph.D. students having children publish fewer papers on average during their Ph.D. studies. In addition, having a mentor with a large number of publications in the last three years has a positive effect on the number of publications of the Ph.D. student. We will focus on the students with at least one publication (640 individuals), with empirical mean and variance of 1.42 and 3.54, respectively, a sign of overdispersion. Note that after focusing on the students with at least one publication, we subtract 1 from each student’s number of publications (e.g. the 246 students that had 1 publication in the original dataset are represented with a 0 in the final dataset). Removing the students with no publications (275 students out of the 915 students in the original dataset) allows us to fit a simple parametric model on the subset instead of a more complex alternative on the original dataset (e.g. zero-inflated model, hurdle model, nonparametric model). Thus, we only compare the Poisson, negative binomial, and COM-Poisson regression models.
Specifically, these models conclude that female Ph.D. students publish less on average than male Ph.D. students and that a mentor who has published a lot of articles has a positive effect on the number of articles of the Ph.D. student. For the COM-Poisson models, on the other hand, these two covariates do not seem to have an effect on the mean number of articles published by a Ph.D. student. It must be noted that there are four male Ph.D. students with a large number of articles published (11, 11, 15, 18) that could be considered as outliers. If these four students are taken out of the data set, the gender covariate does not have a significant effect for the Poisson and negative binomial models. In addition, the empirical means of the male and female Ph.D. students are 1.5 and 1.2, respectively, while the empirical median is 1 for both genders. Thus the COM-Poisson regression model seems to be doing a better job by not concluding that there is an effect of the gender covariate.
Figure 7 shows the 95% and 68% credible intervals for the regression coefficients of \(\nu \) for the COM-Poisson regression models. This figure shows that there seems to be a positive effect of the “mentor” covariate on the variance of the articles of the Ph.D. student. The more articles a mentor publishes (during the last 3 years), the larger the variance for the number of articles published by a Ph.D. student. This seems to be reinforced further when we look at the empirical variance of students having mentors with an above-average number of articles published versus students having mentors with a below-average number of articles published. The empirical variance is 5.8 for the former group and 2.1 for the latter (ratio of around 2.8). The corresponding empirical means are 1.9 and 1.2 (ratio of around 1.6). In Poisson-distributed data, one would expect the ratios to be roughly equal.
4.3 Fertility data
Description of variables
Variable  Description
Nationality  Equals 1 if the woman is German; else 0
General education  Measured as years of schooling
Postsecondary education (vocational training)  Equals 1 if the woman had vocational training; else 0
Postsecondary education (university)  Equals 1 if the woman had a university degree; else 0
Religion  The woman’s religious denomination (Catholic, Protestant, Muslim) with other or none as the baseline group
Area of residence  Equals 1 if it is a rural area; else 0
Age  Age of the woman at the time of the survey
Age at marriage  Age of the woman at the time of marriage
Deviance information criterion (DIC) for all models and all data sets, with the minimum DIC for each data set marked by an asterisk
Model  Ph.D. data  Fertility data
Poisson  2251.09  4214.55
Negative binomial  2108.05  –
COM-Poisson  2056.77*  4121.92
COM-Poisson (lasso)  2058.05  4121.43*
COM-Poisson (spike and slab)  2062.23  4121.74
5 Comparing MCMC algorithms for COM-Poisson regression models
 1. Draw \(U \sim \text {Unif}(0,1)\) and set the number of refinements \(n=0\).
 2. Compute \(\check{\pi }_n\) and \(\hat{\pi }_n\) and compare them to U.
    (a) If \(U\le \check{\pi }_n,\) accept the candidate value.
    (b) If \(U>\hat{\pi }_n,\) reject the candidate value.
    (c) If \(\check{\pi }_n< U < \hat{\pi }_n,\) refine the bounds, i.e. increase n and return to step 2.
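The accept/reject logic of these steps can be sketched in Python as below. The bounds used in the example are a toy stand-in, not the bounds of Chanialidis et al. (2014): they bracket the known value exp(-1) using partial sums of its alternating series, so that the outcome can be checked. All names are hypothetical.

```python
import math

def decide_with_bounds(u, bounds):
    """Accept or reject using only lower and upper bounds on the acceptance probability.

    bounds(n) returns (lower_n, upper_n), which tighten as n grows.
    Returns (accepted, number_of_refinements).
    """
    n = 0
    while True:
        lower, upper = bounds(n)
        if u <= lower:
            return True, n
        if u > upper:
            return False, n
        n += 1  # undecided: refine the bounds and compare again

def partial_sum(j):
    """Partial sum of the series exp(-1) = sum_k (-1)**k / k!, up to k = j."""
    return sum((-1) ** k / math.factorial(k) for k in range(j + 1))

def toy_bounds(n):
    """Odd partial sums lie below exp(-1), even partial sums lie above it."""
    return partial_sum(2 * n + 1), partial_sum(2 * n)

for u in (0.2, 0.36, 0.37, 0.9):
    print(u, decide_with_bounds(u, toy_bounds))
```

Values of U far from the true acceptance probability are decided after few refinements, while values close to it need tighter bounds, which is where the computational cost of this approach arises.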

Effective sample size per minute for the regression coefficients of \(\mu \)
Algorithm  \(\beta _1\)  \(\beta _2\)  \(\beta _3\)  \(\beta _4\)  \(\beta _5\)  \(\beta _6\)
“Exchange” MCMC  65.51  91.01  99.21  154.03  157.42  186.54
“Bounds” MCMC  6.32  8.40  10.04  15.08  14.22  15.94
Effective sample size per minute for the regression coefficients of \(\nu \)
Algorithm  \(\delta _1\)  \(\delta _2\)  \(\delta _3\)  \(\delta _4\)  \(\delta _5\)  \(\delta _6\)
“Exchange” MCMC  58.87  91.86  74.67  168.63  161.44  166.85
“Bounds” MCMC  5.80  8.74  8.07  16.85  15.55  16.68
We will now compare the algorithm presented in Chanialidis et al. (2014) with the MCMC algorithm presented in this paper, using the publications data discussed in Sect. 4.2. Both MCMC algorithms include the two kinds of moves presented in Sect. 3.4, have a burn-in period of 20,000 iterations and a posterior sample size of 60,000.
Figures 10 and 11 show the 95% and 68% credible intervals for the regression coefficients of \(\mu \) and \(\nu \) for both MCMC algorithms. It can be seen that the “exchange” MCMC gives results similar to those of the “bounds” MCMC.
Traceplots for the regression coefficients of \(\mu \) can be seen in Figs. 12, 13, while the traceplots for the regression coefficients of \(\nu \) can be seen in Figs. 14, 15. Both MCMC algorithms seem to mix well.
The main difference between the two algorithms is the computation time. For the “exchange” MCMC algorithm the computation time was 14 min, while the “bounds” MCMC algorithm needed 238 min for the same number of iterations, seventeen times longer. A similar difference in computation time is seen for the fertility data set.
Figure 16 shows the scatterplot of the parameters \(\mu _i\), \(\nu _i\) for \(i=1, \ldots , 640\). The parameters \(\mu _i\), \(\nu _i\) were obtained using the posterior sample of the “exchange” algorithm and substituting the posterior mean of each regression coefficient \(\beta _j, \delta _j\) for \(j=1, \ldots , 6\) in Eq. (3).
R (R Core Team 2015) was used for all the computations in this paper. Traceplots, density plots, autocorrelation plots (for every regression coefficient) and the Gelman and Rubin diagnostic (Gelman and Rubin 1992) were employed to assess convergence of the MCMC samplers to the posterior distribution, using the coda package (Plummer et al. 2006). The plots for the credible intervals and the traceplots of the regression coefficients were made using the mcmcplots package (Curtis 2015).
The code for both MCMC algorithms (“exchange” and “bounds”) is now available on GitHub.
6 Conclusions
In this paper, we presented a computationally more efficient MCMC algorithm for COM-Poisson regression compared to the alternative in Chanialidis et al. (2014). We showed how rejection sampling, combined with the exchange algorithm, can be used to overcome the problem of an intractable likelihood in the COM-Poisson distribution. Finally, this allowed us to use a Bayesian COM-Poisson regression model and show its benefits, compared to the most common regression models for count data, through a simulation and two real-world data sets.
References
 Andrieu, C., Roberts, G.O.: The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Stat. 37, 697–725 (2009)
 Chanialidis, C., Evers, L., Neocleous, T., Nobile, A.: Retrospective sampling in MCMC with an application to COM-Poisson regression. Stat 3(1), 273–290 (2014). doi: 10.1002/sta4.61
 Conway, R.W., Maxwell, W.L.: A queuing model with state dependent service rate. J. Ind. Eng. 12, 132–136 (1962)
 Curtis, S.M.: mcmcplots: Create plots from MCMC output. http://CRAN.R-project.org/package=mcmcplots, R package version 0.4.2 (2015)
 Dunn, J.: compoisson: Conway–Maxwell–Poisson Distribution. https://CRAN.R-project.org/package=compoisson, R package version 0.3 (2012)
 Gelman, A., Rubin, D.B.: Inference from iterative simulation using multiple sequences. Stat. Sci. 7(4), 457–472 (1992)
 Geyer, C.J.: Markov chain Monte Carlo maximum likelihood. In: Proceedings of Computing Science and Statistics: The 23rd Symposium on the Interface, Interface Foundation of North America, pp. 156–161 (1991)
 Gilks, W.R., Wild, P.: Adaptive rejection sampling for Gibbs sampling. Appl. Stat. 42, 337–348 (1992)
 Guikema, S.D., Coffelt, J.P.: A flexible count data regression model for risk analysis. Risk Anal. 28, 213–223 (2008). doi: 10.1111/j.1539-6924.2008.01014.x
 Long, J.S.: The origins of sex differences in science. Soc. Forces 68(4), 1297–1315 (1990)
 Lyne, A.M., Girolami, M., Atchadé, Y., Strathmann, H., Simpson, D.: On Russian roulette estimates for Bayesian inference with doubly-intractable likelihoods. Stat. Sci. 30(4), 443–467 (2015). doi: 10.1214/15-STS523
 Minka, T.P., Shmueli, G., Kadane, J.B., Borle, S., Boatwright, P.: Computing with the COM-Poisson distribution. Tech. rep., CMU Statistics Department (2003)
 Mitchell, T.J., Beauchamp, J.J.: Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 83(404), 1023–1032 (1988)
 Møller, J., Pettitt, A.N., Reeves, R., Berthelsen, K.K.: An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93(2), 451–458 (2006). doi: 10.1093/biomet/93.2.451
 Murray, I., Ghahramani, Z., MacKay, D.J.C.: MCMC for doubly-intractable distributions. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), AUAI Press, pp. 359–366 (2006)
 Park, T., Casella, G.: The Bayesian lasso. J. Am. Stat. Assoc. 103(482), 681–686 (2008). doi: 10.1198/016214508000000337
 Plummer, M., Best, N., Cowles, K., Vines, K.: Coda: convergence diagnostics and output analysis for MCMC. R News 6(1), 7–11 (2006). http://CRAN.R-project.org/doc/Rnews/
 R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2015). http://www.R-project.org/
 Sellers, K., Lotze, T.: COMPoissonReg: Conway–Maxwell–Poisson (COM-Poisson) regression. https://CRAN.R-project.org/package=COMPoissonReg, R package version 0.3.5 (2015)
 Sellers, K.F., Shmueli, G.: A flexible regression model for count data. Ann. Appl. Stat. 4(2), 943–961 (2010). doi: 10.1214/09-AOAS306
 Shmueli, G., Minka, T.P., Kadane, J.B., Borle, S., Boatwright, P.: A useful distribution for fitting discrete data: revival of the Conway–Maxwell–Poisson distribution. J. R. Stat. Soc.: Ser. C 54(1), 127–142 (2005). doi: 10.1111/j.1467-9876.2005.00474.x
 Spiegelhalter, D.J., Best, N.G., Carlin, B.P., Van Der Linde, A.: Bayesian measures of model complexity and fit. J. R. Stat. Soc.: Ser. B 64(4), 583–639 (2002). doi: 10.1111/1467-9868.00353
 Wei, C., Murray, I.: Markov chain truncation for doubly-intractable inference. ArXiv e-prints arXiv:1610.05672 (2016)
 Winkelmann, R.: Duration dependence and dispersion in count-data models. J. Bus. Econ. Stat. 13(4), 467–474 (1995)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.