Abstract
Many environmental applications, such as species abundance studies, rainfall monitoring or tornado count reports, yield data with a preponderance of zero counts. Although standard statistical distributions may not fit these data, a large body of literature has been dedicated to methods for modeling zero-inflated data. One type of regression model for zero-inflated data is categorized as a mixture model. Mixture models postulate two types of zeros, represented using a latent variable, and model their probabilities separately. The latent classification of zeros may be of particular interest as it can provide important clues to physical characteristics associated with, for example, habitat suitability or resistance to disease or pest infestations. Different zero-inflated models can be developed depending on the biological and physical characteristics of the application at hand. Here, several zero-inflated spatial models are applied to a case study of spruce weevil (Pissodes strobi) infestations in a Sitka spruce tree plantation. The data illustrate the unique features distinguished by the various models and show the importance of using expert knowledge to inform model structures, which in turn provide insight into the underlying biological processes driving the probability of belonging to the zero (resistant) component. For instance, one model focuses on individually resistant trees located among infested trees. Another focuses on clusters of resistant trees, which are likely located in unsuitable habitats. We apply six models: a standard generalized linear model (GLM); an overdispersion model; a random effects zero-inflated model; a conditional autoregressive (CAR) random effects model; a multivariate CAR (MCAR) model; and a model developed using discrete random effects to accommodate spatial outliers. We discuss the distinct features identified by the zero-inflated spatial models and make recommendations regarding their application in general.
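The two-types-of-zeros idea can be illustrated with a minimal sketch. The example below assumes a zero-inflated Poisson mixture, chosen only for illustration; the abstract does not state which count distribution the chapter uses. The function names and the parameters `pi` (probability of the always-zero, or "resistant", class) and `lam` (Poisson mean of the susceptible class) are hypothetical.

```python
import math


def zip_pmf(y: int, pi: float, lam: float) -> float:
    """P(Y = y) under a zero-inflated Poisson mixture (illustrative assumption).

    pi  : probability of the latent always-zero (structural) class
    lam : Poisson mean for the remaining (susceptible) class
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero arises either from the structural class or by chance
        # from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def prob_structural_zero(pi: float, lam: float) -> float:
    """P(latent class is structural | Y = 0), by Bayes' rule."""
    return pi / zip_pmf(0, pi, lam)
```

Under these assumptions, `prob_structural_zero(pi, lam)` is the kind of latent classification the abstract refers to: the probability that an observed zero reflects resistance rather than a susceptible unit that happened to record no events. In a regression setting both `pi` and `lam` would depend on covariates and, in the spatial models, on random effects.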
Acknowledgements
We would like to thank the Natural Sciences and Engineering Research Council of Canada for research funding. We thank Erin Lundy and Alisha Albert-Green for their assistance with the literature review for this paper. We would also like to thank all those who have provided valuable feedback on this work: Giovani da Silva for his review and helpful comments, the ISS-2015 Symposium audience for their thoughtful questions, and the ISS reviewer and Proceedings editor for their useful comments and suggestions.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ainsworth, L.M., Dean, C.B., Joy, R. (2016). Zero-Inflated Spatial Models: Application and Interpretation. In: Sutradhar, B. (ed.) Advances and Challenges in Parametric and Semi-parametric Analysis for Correlated Data. Lecture Notes in Statistics, vol 218. Springer, Cham. https://doi.org/10.1007/978-3-319-31260-6_3
DOI: https://doi.org/10.1007/978-3-319-31260-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31258-3
Online ISBN: 978-3-319-31260-6
eBook Packages: Mathematics and Statistics (R0)