From RNA-seq to Biological Inference: Using Compositional Data Analysis in Meta-Transcriptomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1849)

Abstract

The proper analysis of high-throughput sequencing datasets of mixed microbial communities (meta-transcriptomics) is substantially more complex than the analysis of datasets composed of single organisms. Adapting commonly used RNA-seq methods to meta-transcriptome datasets can give misleading results and may fail to use all the available information in a consistent manner. However, meta-transcriptomic experiments can be investigated in a principled manner by combining Bayesian probabilistic modeling of the data at a functional level with analysis under a compositional data analysis paradigm. We present a worked example for the differential functional evaluation of mixed-species microbial communities obtained from human clinical samples that were sequenced on an Illumina platform. We demonstrate methods to functionally map reads directly, conduct a compositionally appropriate exploratory data analysis, evaluate differential relative abundance, and finally identify compositionally associated (constant ratio) functions. Using these approaches, we have found that meta-transcriptomic functional analyses are highly reproducible and convey significant information regarding the ecosystem.
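To make the terms in the abstract concrete, the following is a minimal sketch of three ideas commonly used in compositional data analysis: drawing a probabilistic instance of a sample from a Dirichlet distribution, applying a centered log-ratio transform, and checking whether two functions keep a constant ratio across samples. It is an illustration under stated assumptions, not the protocol's own implementation: the count table, the function names, and the prior of 0.5 are all hypothetical choices made for this example.

```python
import math
import random


def clr(values):
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def dirichlet_instance(counts, prior=0.5, rng=random):
    """Draw one set of proportions from Dirichlet(counts + prior).

    The prior of 0.5 is an assumption for this sketch; it keeps zero
    counts from producing a zero proportion (and an undefined logarithm).
    """
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def log_ratio_variance(samples, i, j):
    """Sample variance of log(part_i / part_j) across samples.

    A value near zero means the two parts keep a constant ratio.
    """
    ratios = [math.log(s[i] / s[j]) for s in samples]
    mean = sum(ratios) / len(ratios)
    return sum((r - mean) ** 2 for r in ratios) / (len(ratios) - 1)


# Hypothetical read counts: rows are samples, columns are functions.
# Columns 0 and 1 are built to keep a 1:2 ratio in every sample.
counts = [
    [120, 240, 30, 900],
    [60, 120, 45, 300],
    [200, 400, 10, 50],
]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
clr_table = [clr(dirichlet_instance(row, rng=rng)) for row in counts]

constant_pair = log_ratio_variance(counts, 0, 1)  # essentially zero
varying_pair = log_ratio_variance(counts, 0, 2)   # clearly above zero
```

The transform depends only on ratios between parts, so multiplying a sample by any sequencing depth leaves its centered log-ratio values unchanged. In practice many Dirichlet instances would be drawn per sample and the downstream statistics summarized across them; a single draw is shown here only to keep the sketch short.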

Author information

Corresponding author

Correspondence to Gregory B. Gloor.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Macklaim, J.M., Gloor, G.B. (2018). From RNA-seq to Biological Inference: Using Compositional Data Analysis in Meta-Transcriptomics. In: Beiko, R., Hsiao, W., Parkinson, J. (eds) Microbiome Analysis. Methods in Molecular Biology, vol 1849. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8728-3_13

  • DOI: https://doi.org/10.1007/978-1-4939-8728-3_13

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8726-9

  • Online ISBN: 978-1-4939-8728-3

  • eBook Packages: Springer Protocols
