Modeling Zero-Inflated Microbiome Data

Book: Statistical Analysis of Microbiome Data with R

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

In this chapter, we introduce and illustrate how to model zero-inflated microbiome data. Section 12.1 briefly introduces the modeling of zero-inflated data. The remainder of the chapter is organized as follows: Sect. 12.2 introduces the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models and applies them to real microbiome data. Section 12.3 introduces the zero-hurdle Poisson (ZHP) and zero-hurdle negative binomial (ZHNB) models and applies them to the same data set. The zero-inflated beta regression model with random effects (ZIBR) is covered and illustrated in Sect. 12.4. We conclude the chapter with a summary and discussion in Sect. 12.5.

Author information

Corresponding author

Correspondence to Yinglin Xia.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Xia, Y., Sun, J., Chen, DG. (2018). Modeling Zero-Inflated Microbiome Data. In: Statistical Analysis of Microbiome Data with R. ICSA Book Series in Statistics. Springer, Singapore. https://doi.org/10.1007/978-981-13-1534-3_12
