Joint modelling of two count variables when one of them can be degenerate
Abstract
We formulate a joint statistical model for two variables: one of them is either a count variable or just zero, and the other is a regular count variable. We consider a modelling framework based on switching between a bivariate Poisson regression model and a univariate one, where the switching depends on the observable outcome of a third, dichotomous variable. The ZIP–CP bivariate model (proposed quite recently) and the standard univariate Poisson regression model are used as basic elements of the switching (or mixture) model. Bayesian analysis is advocated; in two special cases of our Bayesian statistical model, consequences for inference are discussed. The empirical part is devoted to joint modelling of the numbers of cash payments and bank card payments in Poland, with the use of data for both cardholders and non-cardholders. Our Bayesian statistical test makes it possible to examine whether it is appropriate to analyse each of the two subsamples separately in order to infer on basic parameters. For our data this is the case, so inference on individual parameters is not affected by the sample selection error. However, inference on the correlation coefficient between the two count variables is possible only within the proposed trivariate model.
Keywords
Bivariate Poisson regression model, Zero inflated Poisson model, Bayesian inference, Count data models, Bank card payments, Cash payments
1 Introduction
Modelling univariate count data by means of Poisson type regression models is nowadays a routine approach, and some competing model specifications have been proposed for bivariate count data as well (see, e.g. Lambert 1992; Kocherlakota and Kocherlakota 1992; Cameron and Trivedi 1998, 2005; Berkhout and Plug 2004; Famoye and Singh 2006; Lee et al. 2009; Winkelman 2008; Famoye 2010; Tsou 2016). It is worth mentioning that many empirical studies apply negative binomial regression because of the nature of count data. Multivariate count data occur in a wide range of applications, including accident analysis, sports statistics, economics, and many others (see e.g. Brijs et al. 2004, 2006; McHale and Scarf 2007; Ma et al. 2008; Baioa and Blangiardo 2010; Bermúdez and Karlis 2011; Shahtahmassebi and Moyeed 2016). Bayesian inference has sometimes been used in these applications. There are several approaches to the construction of the model for bivariate count data. The bivariate Poisson distribution based on the trivariate reduction method is restricted due to only positive correlation between two count variables (see Kocherlakota and Kocherlakota 1992; Brijs et al. 2006; Famoye 2010). Another approach is to model the joint distribution using copulas (see Van Ophem 1999; McHale and Scarf 2007); these models allow for more flexible specification of the dependence structure. Also, relatively flexible dependence structures appear in the models built upon the idea of the mixture of independent Poisson distributions (Aitchison and Ho 1989) or the conditional probabilities (Berkhout and Plug 2004). The models for multivariate count data are discussed in greater detail in, e.g. Cameron and Trivedi (1998) and Winkelman (2008).
In this paper we look at bivariate Poisson type regressions in the case when some observations follow a degenerate (i.e. univariate) distribution. Since a bivariate model for two non-degenerate count variables is the central part of our specification, we focus on some particular structure, following a simple and flexible modelling path, started by Berkhout and Plug (2004), which seems easy to generalize to dimensions higher than 2. However, our goal is to consider the possibility of degenerate bivariate observations, and not to propose one more new specification for related count variables.
In joint modelling of count variables we may face a situation in which one of them is necessarily zero for many observed units. For example, if we analyze the determinants of (and the relation between) the number of times per month people used public transport and the number of times they used their own cars, then for any person without a car the number of car trips is necessarily zero. It becomes crucial to examine the opportunities and consequences of inference (on determinants of using public transport and on the relationship between using public transport and private cars) on the basis of the full data set, in comparison to inference based on the data for car owners only. Using the latter means sample selection, which makes any generalization unjustified. In order to use all observations on the two variables of interest, we propose a statistical model with switching between two specifications for count variables: a bivariate model and a univariate model. Switching is based on a third, zero–one variable (car ownership in the example above). Such an approach makes it possible to formulate testable hypotheses, e.g. that the mechanism generating values of the count variable which is always observed (never degenerate) is exactly the same in the two groups of observed units.
The main part of the switching model, introduced in this paper, is a bivariate model of count variables representing the case where both variables are non-degenerate. We use the so-called ZIP–CP (zero inflated Poisson–conditional Poisson) specification, proposed by Marzec and Osiewalski (2012); it is a bivariate Poisson type regression, more general than the P–CP (Poisson–conditional Poisson) model introduced by Berkhout and Plug (2004). In the P–CP model, one of the variables is marginally Poisson and the other is conditionally Poisson. This model can be easily estimated and it allows correlation of any sign, but the sign of correlation between the two count variables depends on the sign of one parameter only and is independent of the explanatory variables of the model. In the ZIP–CP model of bivariate Poisson type regression, the marginal ZIP type distribution is used for the first variable (instead of the marginal Poisson distribution), which leads to the covariance sign dependent on the explanatory variables. The characteristics of the ZIP–CP model follow from the properties of the bivariate discrete ZIP–CP distribution, introduced and examined by Osiewalski (2012). The second part of the switching model proposed in this paper amounts to a univariate Poisson regression for the second variable, used when the first variable is degenerate (with its full probability mass concentrated at zero). As already mentioned, the third part is a dichotomous specification that describes switching between the bivariate (non-degenerate) and univariate (degenerate) case.
In the next section we present the probabilistic foundations of our switching model, i.e. the discrete distributions used to build the three parts of this model—in particular, the ZIP–CP distribution. Section 3 is devoted to our statistical model, the form of the likelihood function and the Bayesian analysis. Section 4 contains an empirical illustration, showing new results of joint modelling of the numbers of bank card and cash payments. In contrast to previous studies—see Polasik et al. (2012a) and Marzec and Osiewalski (2012)—we use all available data, for cardholders and non-cardholders. Our empirical example that refers to the research on non-cash transactions in Poland—see Polasik and Maciejewski (2009), Polasik (2015), Polasik et al. (2012b) and Goczek and Witkowski (2015, 2016)—serves as an illustration of modelling and inference problems with count variables, where one of them (the number of card payments) is degenerate for many observations (individuals without cards). In Sect. 5 concluding remarks are stated.
In the literature dealing with multivariate regression models we can find a variety of approaches to the simultaneous analysis of dependent variables of which at least one is only partially observed. The practice of omitting data, also referred to as data selection, has often been used. For example, exploiting a dataset of over 11,000 payments, Bounie and Francois (2006) estimated the determinants of the probability of a transaction being paid by cash, check or bank card at the point of sale; a multinomial logit model was applied in their study. They excluded persons who did not hold bank cards or check accounts. Another simplification amounts to using variables which ignore the nature of the data. Kalckreuth et al. (2014) employed probit models for the dependent variable. Although these practices seem useful, because they simplify the statistical modelling problem, they do not reflect the complexity of consumer behavior. From the viewpoint of statistical inference, such simplified approaches may be inappropriate, as we show in this paper. Stavins (2016) stated that “estimating consumers’ decisions to adopt and use payment instruments as independent events can lead to sample-selection problems”.
The aim of this paper is to present a new approach to payment behavior analysis by a tailor-made model, which is designed to cover all available observation units, e.g. consumers with or without cards. The advantage of our specification is that the proposed trivariate model can capture salient features of the original data set. This modeling approach is free from the sample selection error and allows testing for possible misspecification due to sample selection.
2 Probabilistic foundations of the new statistical model
We consider the joint distribution of three random variables (Y_{1}, Y_{2}, Y_{3}), where the third one is a zero–one variable, the second variable can take any non-negative integer value, and the first one is concentrated at zero when Y_{3} = 0 (Pr{Y_{1} = 0|Y_{3} = 0} = 1), and can take any non-negative integer value if Y_{3} = 1. Thus, when Y_{3} = 0, the conditional distribution of (Y_{1}, Y_{2}) is the same as the distribution of (0, Y_{2}) and corresponds to the univariate distribution of Y_{2}. Only when Y_{3} = 1 the distribution of the pair (Y_{1}, Y_{2}) is a bivariate distribution over the set of all pairs of non-negative integers; now we focus on its two special cases: P–CP (Berkhout and Plug 2004) and ZIP–CP (Osiewalski 2012). These distributions lead to particularly simple and useful bivariate Poisson type regression models. Other specifications impose restrictions on the correlation between two count variables or are more complicated from the statistical or numerical perspective.
- 1. Y_{1} and Y_{2} are negatively correlated, if \( [(1 - e^{{ - \lambda_{1} }} )e^{\alpha } - (1 - \gamma )]\exp (\lambda_{1} (e^{\alpha } - 1)) < \gamma - e^{{ - \lambda_{1} }} \),
- 2. Y_{1} and Y_{2} are positively correlated, if \( [(1 - e^{{ - \lambda_{1} }} )e^{\alpha } - (1 - \gamma )]\exp (\lambda_{1} (e^{\alpha } - 1)) > \gamma - e^{{ - \lambda_{1} }} \),
- 3. Y_{1} and Y_{2} are uncorrelated, if \( [(1 - e^{{ - \lambda_{1} }} )e^{\alpha } - (1 - \gamma )]\exp (\lambda_{1} (e^{\alpha } - 1)) = \gamma - e^{{ - \lambda_{1} }} \).
When γ = g(0) = exp (− λ_{1}), i.e. if Y_{1} is Poisson (under Y_{3} = 1), the complicated formulas (18) and (20) reduce to the much simpler form (7), where the sign of covariance depends only on the sign of α. In other cases, i.e. when Y_{1} is of ZIP type, the sign of covariance in (20) depends on the values of λ_{1} and α (not only on the sign of the latter constant). Obviously, the value of covariance in the ZIP–CP distribution (and not only its sign) as well as the value of the correlation coefficient (19) depend on all the constants appearing in the ZIP–CP probability function, i.e. on γ, λ_{1}, λ_{2} and α.
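The three sign conditions can be checked numerically. The sketch below is only an illustration under stated assumptions, because the ZIP–CP probability function itself is not reproduced in this section: it assumes a hurdle-type marginal for Y_{1} (probability γ at zero, rescaled Poisson(λ_{1}) probabilities for positive counts) and a conditional Poisson distribution of Y_{2} with mean λ_{2} exp(α y_{1}). The function names `zipcp_moments` and `covariance_sign` are hypothetical.

```python
import math

def zipcp_moments(gamma, lam1, lam2, alpha, kmax=150):
    """E(Y1), E(Y2) and Cov(Y1, Y2) under the ASSUMED ZIP-CP form.

    Assumption: Pr(Y1 = 0) = gamma; for k >= 1 the Poisson(lam1)
    probabilities are rescaled to carry total mass 1 - gamma;
    E(Y2 | Y1 = k) = lam2 * exp(alpha * k).
    The infinite sums over k are truncated at kmax.
    """
    c = (1.0 - gamma) / (1.0 - math.exp(-lam1))
    e1 = e2 = e12 = 0.0
    for k in range(kmax + 1):
        if k == 0:
            p = gamma
        else:
            p = c * math.exp(-lam1 + k * math.log(lam1) - math.lgamma(k + 1))
        cond_mean = lam2 * math.exp(alpha * k)  # assumed E(Y2 | Y1 = k)
        e1 += p * k
        e2 += p * cond_mean
        e12 += p * k * cond_mean
    return e1, e2, e12 - e1 * e2

def covariance_sign(gamma, lam1, alpha):
    """Sign (-1, 0, +1) implied by the inequality quoted in the list above."""
    m = math.exp(lam1 * (math.exp(alpha) - 1.0))
    lhs = ((1.0 - math.exp(-lam1)) * math.exp(alpha) - (1.0 - gamma)) * m
    rhs = gamma - math.exp(-lam1)
    return (lhs > rhs) - (lhs < rhs)
```

For any chosen (γ, λ_{1}, λ_{2}, α), comparing the sign of the third value returned by `zipcp_moments` with `covariance_sign` gives a quick consistency check of the inequality against a direct summation; λ_{2} only scales the covariance and does not affect its sign.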
Recall that increasing the probability of the zero value of Y_{1} (in comparison to the Poisson distribution with mean and variance λ_{1}), that is assuming the ZIP type distribution with γ > g(0), leads to variance (16) greater than expectation (11). The ZIP–CP distribution class enables inflating variances of both count variables, although they are not symmetrically treated.
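The variance inflation can be verified in a few lines. Formulas (11) and (16) are not reproduced in this section, so the derivation below rests on an assumed hurdle-type form of the marginal distribution of Y_{1} (given Y_{3} = 1): probability γ at zero and, for k ≥ 1, Poisson(λ_{1}) probabilities multiplied by the constant c defined in the first line. It is a sketch under that assumption, not a restatement of the original formulas.

```latex
\[
\begin{aligned}
c &= \frac{1-\gamma}{1-e^{-\lambda_{1}}}, \qquad
E(Y_{1}) = c\,\lambda_{1}, \qquad
E(Y_{1}^{2}) = c\,\lambda_{1}(1+\lambda_{1}), \\
Var(Y_{1}) - E(Y_{1})
  &= c\,\lambda_{1}(1+\lambda_{1}) - c^{2}\lambda_{1}^{2} - c\,\lambda_{1}
   = c\,\lambda_{1}^{2}\,(1-c).
\end{aligned}
\]
```

Under this assumed form the difference is positive exactly when c < 1, i.e. when γ > exp(− λ_{1}) = g(0), and it vanishes in the Poisson case γ = g(0).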
3 The Bayesian statistical model
The specification based on (26) is known in the literature as the hurdle model (Cameron and Trivedi 2005, p. 680); Winkelman (2008) compares it to the original ZIP model. The hurdle model form of our ZIP type specification is very simple, thus making estimation and testing quite easy.
Under the separability of the likelihood function, obvious from (31), prior independence among (δ, α, β_{1}, β_{2}), β_{2,0} and β_{3} leads to their posterior independence, which means complete separability of inference on each group of parameters. In this case, using only observations with y_{3t} = 1 for estimating (δ, α, β_{1}, β_{2}) is fully justified as well as using only observations with y_{3t} = 0 for estimating β_{2,0} alone. Obviously, inference on such functions of θ that involve parameters from different groups, e.g. on Corr(Y_{1t}, Y_{2t}|θ)—the unconditional correlation coefficient between the first two elements of the triple (Y_{1t}, Y_{2t}, Y_{3t}), must be based on the joint posterior density of θ, p(θ|y), which uses the full likelihood function and complete data. The joint posterior is needed if one wants to compare the unconditional correlation coefficient Corr(Y_{1t}, Y_{2t}|θ) and the conditional one, Corr(Y_{1t}, Y_{2t}|Y_{3t} = 1, θ) = Corr*(Y_{1t}, Y_{2t}|Y_{3t} = 1, δ, α, β_{1}, β_{2}), derived in the ZIP–CP model using formula (19).
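The separability argument can be made concrete with a small sketch of the log-likelihood written as three additive parts. Formulas (30) and (31) are not reproduced in this section, so several elements below are assumptions: log links for the Poisson means, a logit form for Pr{Y_{3t} = 1}, the hurdle-type ZIP–CP probability function, and a user-supplied function `gamma_fn` (hypothetical) that maps λ_{1t} to the zero probability γ_{t}, since the way γ_{t} depends on δ is not shown here. All function names are illustrative.

```python
import math

def log_poisson(y, mu):
    """Log of the Poisson probability function with mean mu."""
    return -mu + y * math.log(mu) - math.lgamma(y + 1)

def dot(x, b):
    return sum(xi * bi for xi, bi in zip(x, b))

def log_zip_cp(y1, y2, gamma, lam1, lam2, alpha):
    """Log pmf of the ASSUMED ZIP-CP pair: hurdle-type Y1, Poisson Y2 | Y1."""
    if y1 == 0:
        log_p1 = math.log(gamma)
    else:
        log_p1 = (math.log(1.0 - gamma) - math.log(1.0 - math.exp(-lam1))
                  + log_poisson(y1, lam1))
    return log_p1 + log_poisson(y2, lam2 * math.exp(alpha * y1))

def switching_loglik(data, beta1, beta2, beta20, beta3, alpha, gamma_fn):
    """Return (logL1, logL2, logL3); their sum is the full log-likelihood.

    data: iterable of (y1, y2, y3, x) with x a list of covariates.
    logL1 uses only observations with y3 = 1 (bivariate part),
    logL2 only those with y3 = 0 (univariate Poisson part),
    logL3 all observations (assumed logit switching part).
    """
    l1 = l2 = l3 = 0.0
    for y1, y2, y3, x in data:
        eta3 = dot(x, beta3)
        l3 += y3 * eta3 - math.log1p(math.exp(eta3))
        if y3 == 1:
            lam1 = math.exp(dot(x, beta1))
            lam2 = math.exp(dot(x, beta2))
            l1 += log_zip_cp(y1, y2, gamma_fn(lam1), lam1, lam2, alpha)
        else:
            l2 += log_poisson(y2, math.exp(dot(x, beta20)))
    return l1, l2, l3
```

Because each part depends on its own group of parameters, changing β_{2,0} moves only the second term, which is the separability used in the text. Passing `gamma_fn = lambda lam: math.exp(-lam)` switches off the zero inflation and reduces the first part to a P–CP log-likelihood.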
In the case of β_{2} = β_{2,0}, when (given Y_{1t} = 0) Y_{2t} is explained in exactly the same way no matter what Y_{3t} is, L_{1} and L_{2} in the likelihood function cannot be considered separately as both depend on β_{2}. In this case inference has to be based on all data, the full likelihood and the joint posterior. Making inferences with the use of the data with y_{3t} = 1 only would mean sample selection error. Of course, testing β_{2} = β_{2,0} requires the general model, without this restriction.
Complete specification of our Bayesian statistical model [with the sampling distribution (30) that leads to the likelihood function (31)] requires the prior distribution of θ. Obviously, our prior choice is related to the model structure, not to the data that are analysed in the empirical part. We assume prior independence and the standard normal prior N(0, 1) for each parameter. Zero prior expectations mean that the simplest model (with no ZIP effect, no dependence and no explanatory variables) gets the highest prior chance, but unit standard deviations ensure significant prior chances for specifications far from the simplest one. It seems that such a simple joint prior distribution introduces little initial information and guarantees easy Monte Carlo simulations of the posterior distribution. Obviously, sensitivity of inferences with respect to the form of the prior distribution is an empirical question, to be answered for the data at hand, but it matters mainly in small data-sets. According to basic Bayesian asymptotic results, under any regular prior, the posterior based on a sufficiently large number of observations can be approximated by an appropriate multivariate normal distribution centred at the maximum likelihood estimate. Thus, in empirical studies based on large data-sets, sensitivity with respect to the prior distribution becomes much less important.
In this study we implement the random-walk Metropolis–Hastings MCMC algorithm to simulate samples from the posterior distribution of θ (Gamerman 1998). This algorithm was started either at zero values of the parameters or at maximum likelihood estimates obtained by estimating each sub-model separately (due to separability of the likelihood function). It turned out that the selection of starting values was not important for convergence. We generated a candidate random variable from a multivariate Student distribution; preliminary runs were used to calibrate its precision matrix. The algorithm involved 1,000,000 cycles, and the acceptance rate was about 10%. Convergence of single chains from the MCMC sampler was confirmed by the graphical procedure proposed by Yu and Mykland (1998).
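A random-walk Metropolis–Hastings sampler of this kind can be sketched in a few lines. The code below is a minimal illustration and not the implementation used in the study: it targets a plain Poisson regression posterior with independent N(0, 1) priors (one sub-model, with an assumed log link), and it uses a Student-t candidate with a single common scale instead of a precision matrix calibrated in preliminary runs. The names `log_posterior` and `rw_metropolis` and all default values are hypothetical.

```python
import math
import random

def log_posterior(beta, data):
    """Poisson regression log-likelihood plus independent N(0, 1) log-priors.

    data: list of (y, x) pairs; the term log(y!) is dropped because it
    does not depend on beta.
    """
    lp = -0.5 * sum(b * b for b in beta)
    for y, x in data:
        eta = sum(xi * bi for xi, bi in zip(x, beta))
        if eta > 700.0:          # avoid overflow in exp; such points get zero mass
            return -math.inf
        lp += y * eta - math.exp(eta)
    return lp

def rw_metropolis(log_post, start, scale, n_iter, df=5, seed=0):
    """Random-walk Metropolis-Hastings with symmetric Student-t increments."""
    rng = random.Random(seed)
    current = list(start)
    lp_cur = log_post(current)
    draws, accepted = [], 0
    for _ in range(n_iter):
        # multivariate Student-t step: normal vector / sqrt(chi2(df) / df)
        w = math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
        cand = [c + scale * rng.gauss(0.0, 1.0) / w for c in current]
        lp_cand = log_post(cand)
        # symmetric proposal, so the acceptance ratio is the posterior ratio
        if math.log(1.0 - rng.random()) < lp_cand - lp_cur:
            current, lp_cur = cand, lp_cand
            accepted += 1
        draws.append(list(current))
    return draws, accepted / n_iter
```

A call such as `rw_metropolis(lambda b: log_posterior(b, data), [0.0, 0.0], 0.1, 6000)` returns the chain and its acceptance rate; discarding an initial segment and averaging the rest gives posterior means. In practice the proposal scale would be tuned, as the text describes, on the basis of preliminary runs.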
4 Joint modelling of the numbers of card and cash payments
In order to illustrate the empirical usefulness of the proposed statistical model, we use the data collected for the research that was financed by the National Bank of Poland and described by Polasik et al. (2012a) and Marzec et al. (2013). The data consist of the information whether person t is a cardholder (y_{3t}) as well as the number of his/her cash payments (y_{2t}) and card payments (y_{1t}) within a month. T = 2518 persons were questioned in October or November 2010, or in January 2011. The fraction of cardholders was 47.3%.
Frequency distributions of the numbers of cash payments y_{2t} for cardholders (y_{3t} = 1) and non-cardholders (y_{3t} = 0).
Source: Own calculations
Number of cash payments y_{2t} (interval) | Number of cardholders (t such that y_{3t} = 1) | Frequency | Number of non-cardholders (t such that y_{3t} = 0) | Frequency |
---|---|---|---|---|
0 | 24 | 2% | 0 | 0% |
(0; 5] | 126 | 11% | 60 | 5% |
(5; 10] | 248 | 21% | 275 | 21% |
(10; 15] | 196 | 16% | 224 | 17% |
(15; 20] | 148 | 12% | 208 | 16% |
(20; 25] | 108 | 9% | 151 | 11% |
(25; 30] | 85 | 7% | 123 | 9% |
(30; 35] | 66 | 6% | 73 | 5% |
(35; 40] | 55 | 5% | 57 | 4% |
(40; 45] | 32 | 3% | 55 | 4% |
(45; 50] | 32 | 3% | 26 | 2% |
> 50 | 70 | 6% | 76 | 6% |
Sum | 1190 | 100% | 1328 | 100% |
Mean | 20.5 | – | 22.5 | – |
Median | 16 | – | 18 | – |
The results obtained by Polasik et al. (2012a), within the P–CP model and on the basis of the data for 1190 cardholders, showed very small positive correlation between the numbers of cash and card payments. Marzec and Osiewalski (2012) confirmed this using the ZIP–CP model, indicating at the same time that the P–CP model is not a valid reduction of the more general ZIP–CP case, as both parameters α and δ are significantly different from zero. Note that univariate empirical distributions for cardholders only (i.e. with y_{3t} = 1) require a bivariate model with inflated zeros for both count variables; the ZIP–CP specification meets this requirement, while the P–CP model does not. Moreover, for cardholders, formal Bayesian model comparison led to the conclusion that, in the ZIP–CP model, Y_{1t} must represent the number of card payments and Y_{2t} the number of cash payments (not vice versa); see Marzec and Osiewalski (2012). The necessity of establishing which count variable is the first one comes from the asymmetric structure of the bivariate model under consideration.
Now we present the results obtained for the full dataset, which includes non-cardholders. As in Marzec and Osiewalski (2012), we have modelled raw data, without weights indicating the degree of representativeness of individual observations; such weights were used by Polasik et al. (2012a) and Marzec et al. (2013). The motivation for using weighted (adjusted) data is to represent adequately the population from which the sample has been drawn. Information about demographic characteristics, such as gender, age, marital status, and place of residence, is used to develop the weights. In this paper we model raw data, as we do not focus on representativeness issues.
Typical (average or most frequent) values of explanatory variables.
Source: authors’ elaboration
Explanatory variable | All observations (T = 2518) | Cardholders (T_{1} = 1190) | Non-cardholders (T_{2} = 1328) |
---|---|---|---|
Gender (1—man, 0—woman) | 0 | 0 | 0 |
Age (years) | 41.2 | 40.1 | 42.2 |
Marital status (1—married, 0—not married) | 1 | 1 | 0 |
Residence (1—city, 0—countryside) | 1 | 1 | 1 |
Monthly income (thousand PLN) | 2.9 | 3.3 | 2.5 |
Education (years of schooling) | 12.3 | 13.2 | 11.5 |
Access to Internet at home (1—yes, 0—no) | 1 | 1 | 0 |
Fraction (%) of ones in the case of dichotomous explanatory variables.
Source: authors’ elaboration
Explanatory variable | All observations (T = 2518) | Cardholders (T_{1} = 1190) | Non-cardholders (T_{2} = 1328) |
---|---|---|---|
Gender (1—man, 0—woman) | 44 | 45 | 42 |
Marital status (1—married, 0—not married) | 56 | 65 | 48 |
Residence (1—city, 0—countryside) | 63 | 71 | 56 |
Access to Internet at home (1—yes, 0—no) | 61 | 76 | 49 |
In our empirical research we have used the statistical model presented in (30), together with the joint prior distribution proposed in the previous section, which assumes independence among all parameters of the trivariate model. Taking advantage of posterior independence, which results from the separability of the likelihood in (31) and prior independence, we have used three independent Metropolis–Hastings chains in order to simulate from the posterior distribution in each part of our model. That is, we have separately estimated (β_{1}, β_{2}, α, δ) in the ZIP–CP model (M_{1}), β_{2,0} in the Poisson model for the number of cash payments for non-cardholders (M_{2}) and β_{3} in the logit model (M_{3}). The total number of parameters is 34.
Posterior means and standard deviations of the parameters of each part of the trivariate model.
Source: authors’ calculations
Referring to the assumed prior distribution of parameters, we see that our N(0, 1) priors appear relatively vague in this application, because the posterior standard deviations are much lower than the prior standard deviations and almost all posterior means are in the interval (− 2, 2), and most of them in [− 1, 1]. We checked that our results are robust to changes in the prior distribution. This is to be expected, since in studies like ours, where the number of observations is large, prior sensitivity becomes much less important.
For cardholders we see (in M_{1}) that all seven explanatory variables that we have used are obviously important to explain the number of cash payments. But only the access to Internet, education and income significantly (and positively) affect the number of card payments. In the pure Poisson model for non-cardholders (M_{2}), not all seven variables are important to explain the number of cash payments—gender, income and age are not. Our results show that a cardholder’s education, being in a marriage and the access to Internet have a negative effect on cash payments. Note, however, that the impact of these three variables on the number of cash payments is positive for non-cardholders. Living in a city will lead to more frequent use of cash as the payment method for both consumer types. Additionally, there is significant positive influence of age only on cash payments in the case of cardholders. In the logit model (M_{3}), five variables (except for gender and age) are the determinants of possessing a bank card. We confirmed that being in a marriage, living in a city, having higher income, staying in education for a longer period and having the access to Internet increase the probability of having a bank card.
Let us stress the differences between posterior distributions of the parameters describing the number of cash payments in M_{1} and M_{2}. In the case of four explanatory variables (marital status, income, education and the access to Internet), the signs of the posterior means are different. As the standard deviations of most of the parameters are small, we suspect that the equality β_{2} = β_{2,0} does not hold.
Posterior means of correlation coefficients between (Y_{1t}, Y_{2t}), averaged over observations.
Source: authors’ calculations
Correlation coefficient (average posterior mean) | All observations (T = 2518) | Cardholders (Y_{3t} = 1, T_{1} = 1190) | Non-cardholders (Y_{3t} = 0, T_{2} = 1328) |
---|---|---|---|
Corr(Y_{1t}, Y_{2t}|θ) | 0.072 | 0.065 | 0.079 |
Corr(Y_{1t}, Y_{2t}|Y_{3t} = 1, θ) | – | 0.073 | – |
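The reason why the unconditional correlation coefficient involves all three parts of the model can be seen from the mixture structure alone, using only Pr{Y_{1t} = 0|Y_{3t} = 0} = 1. The following derivation is a sketch in notation introduced here for illustration (it is not taken from the numbered formulas): p_{t} = Pr{Y_{3t} = 1|θ}, starred moments are conditional on Y_{3t} = 1, and μ_{0t}, v_{0t} denote the mean and variance of Y_{2t} given Y_{3t} = 0.

```latex
\[
\begin{aligned}
Cov(Y_{1t}, Y_{2t}) &= p_{t}\,Cov^{*}(Y_{1t}, Y_{2t})
   + p_{t}(1-p_{t})\,E^{*}(Y_{1t})\,\bigl[E^{*}(Y_{2t}) - \mu_{0t}\bigr], \\
Var(Y_{1t}) &= p_{t}\,Var^{*}(Y_{1t}) + p_{t}(1-p_{t})\,\bigl[E^{*}(Y_{1t})\bigr]^{2}, \\
Var(Y_{2t}) &= p_{t}\,Var^{*}(Y_{2t}) + (1-p_{t})\,v_{0t}
   + p_{t}(1-p_{t})\,\bigl[E^{*}(Y_{2t}) - \mu_{0t}\bigr]^{2}.
\end{aligned}
\]
```

The unconditional correlation is the ratio of the first expression to the square root of the product of the other two, so it depends on β_{3} (through p_{t}) and on β_{2,0} (through μ_{0t} and v_{0t}) as well as on the parameters of the bivariate part, whereas the conditional coefficient depends on the latter only.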
5 Concluding remarks
The trivariate discrete distribution and Bayesian statistical model have been proposed in order to jointly model two count variables in the case where one of them can be degenerate. Our statistical model amounts to using a zero–one variable to switch between two separate models for count variables. The first model is bivariate and the second one is only univariate—but from the same class as the conditional part of the bivariate model. While the proposed modelling scheme is quite general, the choice of the sub-models (the building blocks of the trivariate structure) is rather specific and can be changed. Simplicity is the main criterion in choosing the ZIP–CP model (as the bivariate specification for count variables) and the logistic model (for the zero–one switching variable); both lead to a tractable trivariate model. Replacing the logistic part by a different dichotomous specification—e.g. based on a skewed Student t distribution and allowing for interactions of explanatory variables (see Osiewalski and Marzec 2004b)—is not difficult and may improve the data fit. However, replacing the ZIP–CP specification, which is the main part of our trivariate model, would be much more difficult. Using alternative structures for two related count variables is left for future research.
As far as the prior specification is concerned, our particular form of the prior distribution can easily be changed, but two crucial properties should be kept in mind. The separability of the likelihood function can be fully exploited only under prior independence of parameters describing sub-models, so their independence is a natural element of each prior specification. Also recall that the particular standard normal prior distribution (that we have assumed for each individual parameter) is not important if the number of observations is large, as in our empirical example. Obviously, small samples would require sensitivity analysis within a larger class of prior distributions (e.g. Student t with unknown degrees of freedom).
In the proposed Bayesian model one can easily use our Lindley-type test (the Bayesian counterpart of the F or Chi squared tests) in order to verify the fundamental restriction, which makes the parameters describing the non-degenerate count variable identical for both values of the switching zero–one variable. It would be interesting to use formal Bayesian model comparison (through Bayes factors and posterior model probabilities) for testing different specifications that could appear in future research. This would require an efficient estimator of the marginal data density value in each model. It seems that, in the case of the Markov Chain Monte Carlo simulations of the posterior distribution, the corrected arithmetic mean estimator proposed by Pajor (2017) is an appropriate tool.
Our trivariate model is constructed in such a way that separability of the likelihood function is preserved. Thus it is a useful tool to examine the consequences of sample selection caused by deleting all observations with only one non-degenerate count variable (i.e. deleting the whole subsample of non-cardholders). In our empirical example we have shown that inference on individual parameters is not affected by the sample selection error, since the restriction linking parameters of two sub-models is not supported by the data. We have also shown that any deeper inference on correlation between two count variables—that is, inference on the unconditional correlation coefficient as opposed to the conditional one—is possible only within our full trivariate specification.
Let us stress that the proposed trivariate model always enables making inference for all available data, without applying any preliminary tests. Instead, our model itself constitutes a useful testing framework, in particular for testing particular conditions that lead to sample selection errors. This is the main contribution of the paper.
Acknowledgements
The authors acknowledge support from research funds granted to the Faculty of Management at Cracow University of Economics, within the framework of the subsidy for the maintenance of research potential.
References
- Aitchison J, Ho CH (1989) The multivariate Poisson-log normal distribution. Biometrika 76:643–653
- Anderson TW, Darling DA (1954) A test of goodness of fit. J Am Stat Assoc 49:765–769
- Baioa G, Blangiardo M (2010) Bayesian hierarchical model for the prediction of football results. J Appl Stat 37(2):253–264
- Berkhout P, Plug E (2004) A bivariate Poisson count data model using conditional probabilities. Stat Neerl 58:349–364
- Bermúdez L, Karlis D (2011) Bayesian multivariate Poisson models for insurance ratemaking. Insur Math Econ 48:226–236
- Bounie D, Francois A (2006) Cash, check or bank card? The effects of transaction characteristics on the use of payment instruments. http://ssrn.com/paper=891791—SSRN eLibrary. Accessed 26 July 2017
- Brijs T, Karlis D, Swinnen G, Vanhoof K, Wets G, Marchanda P (2004) A multivariate Poisson mixture model for marketing applications. Stat Neerl 58:322–348
- Brijs T, Van de Bossche F, Wets G, Karlis D (2006) A model for identifying and ranking dangerous accident locations: a case study in Flanders. Stat Neerl 60:457–476
- Cameron AC, Trivedi PK (1998) Regression analysis of count data. Cambridge University Press, New York
- Cameron AC, Trivedi PK (2005) Microeconometrics: methods and application. Cambridge University Press, New York
- Famoye F (2010) On the bivariate negative binomial regression model. J Appl Stat 37:969–981
- Famoye F, Singh KP (2006) Zero-inflated generalized Poisson regression model with an application to domestic violence data. J Data Sci 4:117–130
- Gamerman D (1998) Markov chain Monte Carlo. Stochastic simulation for Bayesian inference. Chapman and Hall, London
- Goczek Ł, Witkowski B (2015) The determinants of cash-free transactions. The National Bank of Poland Working Paper Series no. 146
- Goczek Ł, Witkowski B (2016) Determinants of card payments. Appl Econ 48:1530–1543
- Kalckreuth U, Schmidt T, Stix H (2014) Choosing and using payment instruments: evidence from German microdata. Empir Econ 46:1019–1055
- Kocherlakota S, Kocherlakota K (1992) Bivariate discrete distributions. Marcel Dekker, New York
- Lambert D (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34:1–14
- Lee J, Jung BC, Jin SH (2009) Tests for zero inflation in a bivariate zero-inflated Poisson model. Stat Neerl 63:400–417
- Lindley DV (1965) Introduction to probability and statistics from a Bayesian view point. Part 2: inference. Cambridge University Press, Cambridge
- Ma J, Kockelman K, Damien P (2008) A multivariate Poisson-lognormal regression model for prediction of crash counts by severity, using Bayesian methods. Accid Anal Prev 40:964–975
- Marzec J, Osiewalski J (2008) Bayesian inference on technology and cost efficiency of bank branches. Bank i Kredyt 39:29–43
- Marzec J, Osiewalski J (2012) Dwuwymiarowy model typu ZIP-CP w łącznej analizie zmiennych licznikowych. Folia Oeconomica Cracoviensia 53:5–20
- Marzec J, Polasik M, Fiszeder P (2013) Wykorzystanie gotówki i karty płatniczej w punktach handlowo-usługowych w Polsce: zastosowanie dwuwymiarowego modelu Poissona. Bank i Kredyt 44:375–402
- McHale I, Scarf P (2007) Modelling soccer matches using bivariate discrete distributions with general dependence structure. Stat Neerl 61:432–445
- Ophem Van H (1999) A general method to estimate correlated discrete random variables. Econ Theory 15:228–237
- Osiewalski J (2012) Dwuwymiarowy rozkład ZIP-CP i jego momenty w analizie zależności między zmiennymi licznikowymi, [in:] Spotkania z królową nauk (Księga jubileuszowa dedykowana Profesorowi Edwardowi Smadze). Wydawnictwo Uniwersytetu Ekonomicznego w Krakowie, Kraków, pp 147–154
- Osiewalski J, Marzec J (2004a) Uogólnienie dychotomicznego modelu probitowego z wykorzystaniem skośnego rozkładu Studenta. Przegląd Statystyczny 51:13–24
- Osiewalski J, Marzec J (2004b) Model dwumianowy II rzędu i skośny rozkład Studenta w analizie ryzyka kredytowego. Folia Oeconomica Cracoviensia 45:63–83
- Osiewalski J, Steel MF (1993) Una perspectiva bayesiana en selección de modelos, Cuadernos Economicos 55/3, pp 327–351 (A Bayesian perspective on model selection, original English version available at: http://www.cyfronet.krakow.pl/~eeosiewa/pubo.htm)
- Pajor A (2017) Estimating the marginal likelihood using the arithmetic mean identity. Bayesian Anal 12:261–287
- Polasik M (2015) Stan i potencjał rozwoju sieci akceptacji kart płatniczych w Polsce. Acta Universitatis Nicolai Copernici, Ekonomia 46:23–58
- Polasik M, Maciejewski K (2009) Innowacyjne usługi płatnicze w Polsce i na świecie. Materiały i Studia NBP no. 241, NBP, Warszawa
- Polasik M, Marzec J, Fiszeder P, Górka J (2012a) Modelowanie wykorzystania metod płatności detalicznych na rynku polskim. Materiały i Studia NBP no. 265, NBP, Warszawa
- Polasik M, Wiśniewski TP, Lightfoot G (2012b) Modelling customers’ intentions to use contactless cards. Int J Bank Acc Finance 4:203–231
- Shahtahmassebi G, Moyeed R (2016) An application of the generalized Poisson difference distribution to the Bayesian modelling of football scores. Stat Neerl 70(3):260–273
- Stavins J (2016) The effect of demographics on payment behavior: panel data with sample selection. Federal Reserve Bank of Boston Working Paper No. 16-5
- Tsou TS (2016) Robust likelihood inference for multivariate correlated count data. Comput Stat 31:845–857
- Winkelman R (2008) Econometric analysis of count data. Springer, Berlin
- Yu B, Mykland P (1998) Looking at Markov samplers through cusum path plots: a simple diagnostic idea. Stat Comput 8:275–286
- Zellner A (1971) An introduction to Bayesian inference in econometrics. Wiley, New York
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.