Compositional analysis of overdispersed counts using generalized estimating equations

Environmental and Ecological Statistics

Abstract

Multivariate abundance data are commonly collected in ecology and used to explore questions of “community composition”: how the relative abundance of different taxa changes with environmental conditions. In this paper, we propose a log-linear marginal modeling approach for analyzing such compositional count data via generalized estimating equations. The method exploits the multiplicative nature of log-linear models for counts by reparameterizing models that describe marginal effects on mean abundance. This allows effects to be partitioned into “main effects” and compositional effects, which is appealing for interpretation. We apply the proposed approach to reanalyze compositional counts of benthic invertebrates from Delaware Bay, and data on invertebrate communities inhabiting Acacia plants in eastern Australia. In both cases we resort to a resampling approach to make inferences about regression parameters, because the number of clusters was not large compared to cluster size.
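
The partition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: a single site-level covariate x, the marginal model log mu_ij = alpha_j + x_i (beta_0 + gamma_j) with a sum-to-zero constraint on gamma_j, an independence working correlation (under which the estimating equations decouple into one Poisson quasi-likelihood fit per taxon), and a plain case-resampling bootstrap of whole sites standing in for whatever resampling scheme the paper actually uses. The data are simulated, and all function names are hypothetical.

```python
import numpy as np

def fit_loglinear(x, y, n_iter=50, tol=1e-10):
    """Solve the Poisson quasi-likelihood equations for log(mu) = a + b*x
    by Fisher scoring. Returns the array [a, b]."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    beta = np.array([np.log(y.mean() + 0.5), 0.0])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        # Score X'(y - mu) and expected information X' diag(mu) X.
        step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def decompose(x, Y):
    """Split taxon-specific slopes into a main effect (their mean) and
    compositional effects (deviations from the mean, summing to zero)."""
    slopes = np.array([fit_loglinear(x, Y[:, j])[1] for j in range(Y.shape[1])])
    main = slopes.mean()
    return main, slopes - main

def cluster_bootstrap(x, Y, n_boot=200, seed=0):
    """Percentile intervals for the compositional effects, resampling
    whole sites (clusters) so within-site correlation is preserved."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty((n_boot, Y.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sites drawn with replacement
        draws[b] = decompose(x[idx], Y[idx])[1]
    return np.percentile(draws, [2.5, 97.5], axis=0)

# Simulated example (assumed values): 40 sites, 4 taxa, binary covariate.
rng = np.random.default_rng(1)
n_sites, n_taxa = 40, 4
x = np.repeat([0.0, 1.0], n_sites // 2)
alpha = np.log([20.0, 10.0, 15.0, 8.0])
beta = np.array([0.5, 0.5, 0.5, -0.7])
mu = np.exp(alpha + np.outer(x, beta))
# A shared gamma site effect makes counts overdispersed and correlated within site.
site_effect = rng.gamma(shape=2.0, scale=0.5, size=(n_sites, 1))
Y = rng.poisson(mu * site_effect)

main, comp = decompose(x, Y)
ci = cluster_bootstrap(x, Y)
print("main effect:", main)
print("compositional effects:", comp)
print("95% percentile intervals:", ci)
```

Because the model is multiplicative on the count scale, the main effect here describes a proportional change in geometric-mean abundance across taxa, while each compositional effect describes how one taxon departs from that common change. With few sites, the percentile intervals from this simple bootstrap should be read as rough guides only.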

Author information

Correspondence to David I. Warton.

Cite this article

Warton, D.I., Guttorp, P. Compositional analysis of overdispersed counts using generalized estimating equations. Environ Ecol Stat 18, 427–446 (2011). https://doi.org/10.1007/s10651-010-0145-9

  • DOI: https://doi.org/10.1007/s10651-010-0145-9
