Processing and Analyzing Human Microbiome Data

Zhu, Xuan; Wang, Jian; Reyes-Gibby, Cielito; Shete, Sanjay

doi:10.1007/978-1-4939-7274-6_31

Protocol
First Online: 05 October 2017

Part of the book series: Methods in Molecular Biology (MIMB, volume 1666)

An erratum to this publication is available online at https://doi.org/10.1007/978-1-4939-7274-6_32

Abstract

The human microbiome is associated with complex disorders such as diabetes, cancer, obesity, and cardiovascular disorders. Recent technological developments have allowed researchers to quantify the composition of the microbiome using culture-independent approaches. The resulting large volume of microbiome data provides invaluable opportunities to assess the contributions of the microbiome to human health and disease. In this chapter, we discuss and evaluate multiple statistical approaches for processing, summarizing, and analyzing microbiome data. Specifically, we provide programming scripts for processing microbiome data using QIIME, calculating alpha and beta diversities, and assessing the association between diversities and outcomes of interest using R programs, and we explain how to interpret the results. We illustrate the methods by analyzing the foregut microbiome in esophageal adenocarcinoma.
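
To make the two diversity concepts named in the abstract concrete, the following is a minimal sketch in Python, not the chapter's own QIIME or R scripts. It assumes the Shannon index as one example of an alpha diversity (within-sample) measure and the Bray-Curtis dissimilarity as one example of a beta diversity (between-sample) measure. The function names and the toy OTU count vectors are hypothetical and chosen only for illustration.

```python
import math


def shannon_index(counts):
    """Shannon alpha diversity (natural log) of one sample's OTU counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Sum -p * ln(p) over the taxa that are actually observed (p > 0).
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples' OTU count vectors."""
    if len(u) != len(v):
        raise ValueError("count vectors must cover the same taxa")
    denom = sum(u) + sum(v)
    if denom == 0:
        return 0.0
    # Sum of absolute count differences, scaled by the total counts.
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


# Hypothetical toy data: counts for four taxa in two samples.
sample_a = [10, 10, 10, 10]  # perfectly even community
sample_b = [37, 1, 1, 1]     # community dominated by one taxon

print(shannon_index(sample_a))
print(shannon_index(sample_b))
print(bray_curtis(sample_a, sample_b))
```

In this sketch the even community yields the higher Shannon index, and the Bray-Curtis value lies between 0 (identical counts) and 1 (no shared taxa). Associating such measures with an outcome of interest, as the chapter does with R programs, is a separate modeling step not shown here.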

The original version of this chapter was revised. A correction to this chapter can be found at https://doi.org/10.1007/978-1-4939-7274-6_32

Change history

05 September 2018
The original version of this chapter was inadvertently published without including the dbGaP acknowledgment. The updated chapter now contains that information.

Acknowledgments

Funding support for the Study of Foregut Microbiome in Development of Esophageal Adenocarcinoma was provided by the National Cancer Institute (UH3CA140233) through the Human Microbiome Project of the NIH Roadmap Initiative. Data for the Foregut Microbiome study were provided by Zhiheng Pei, MD, PhD, on behalf of his collaborators at New York University School of Medicine, the J. Craig Venter Institute, and Lawrence Berkeley National Laboratory.

Author information

Authors and Affiliations

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
Xuan Zhu, Jian Wang & Sanjay Shete
Department of Emergency Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
Cielito Reyes-Gibby
Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
Sanjay Shete

Corresponding author

Correspondence to Sanjay Shete.

Editor information

Editors and Affiliations

Case Western Reserve University, Cleveland, OH, USA
Robert C. Elston

About this protocol

Cite this protocol

Zhu, X., Wang, J., Reyes-Gibby, C., Shete, S. (2017). Processing and Analyzing Human Microbiome Data. In: Elston, R. (ed) Statistical Human Genetics. Methods in Molecular Biology, vol 1666. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7274-6_31

DOI: https://doi.org/10.1007/978-1-4939-7274-6_31
Published: 05 October 2017
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7273-9
Online ISBN: 978-1-4939-7274-6
eBook Packages: Springer Protocols
