Modeling Over-Dispersed Microbiome Data

Part of the book series: ICSA Book Series in Statistics ((ICSABSS))

Abstract

However, count data are not purely relative: the count pair (1, 2) carries different information than the pair (1000, 2000), even though the relative amounts of the two components are the same.
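One way to see why the two pairs differ is to ask how precisely each one pins down the shared proportion of 1/3. The sketch below is only an illustration and is not taken from the chapter: it assumes a plain binomial sampling model (no over-dispersion), and the function name `proportion_standard_error` is hypothetical.

```python
import math


def proportion_standard_error(count_a: int, count_b: int) -> float:
    """Binomial standard error of the share of component A: sqrt(p * (1 - p) / n).

    Assumption for illustration: the total n = count_a + count_b is treated as a
    fixed number of independent draws, which ignores over-dispersion entirely.
    """
    total = count_a + count_b
    p = count_a / total
    return math.sqrt(p * (1 - p) / total)


small = (1, 2)        # three observations in total
large = (1000, 2000)  # three thousand observations in total

se_small = proportion_standard_error(*small)
se_large = proportion_standard_error(*large)

for (a, b), se in ((small, se_small), (large, se_large)):
    share = a / (a + b)
    print(f"counts=({a}, {b})  share={share:.3f}  standard error={se:.4f}")
```

Both pairs give the same estimated share, but under this simplified model the standard error for (1, 2) is larger by a factor of sqrt(1000), roughly 32, so the small pair says far less about the underlying composition. Converting counts to proportions discards exactly this information.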

Author information

Corresponding author

Correspondence to Yinglin Xia.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Xia, Y., Sun, J., Chen, DG. (2018). Modeling Over-Dispersed Microbiome Data. In: Statistical Analysis of Microbiome Data with R. ICSA Book Series in Statistics. Springer, Singapore. https://doi.org/10.1007/978-981-13-1534-3_11
