Metagenomics: Focusing on the Haystack

  • Indu Khatri
  • Meenakshi Anurag


Metagenomics enables the genomic study of uncultured microorganisms using inexpensive sequencing methods. This chapter gives a concise overview of current computational methods in metagenomics and of recent progress in the field. It covers the strategies, methods, software, and protocols generally used for metagenomic analysis of environmental communities. It also discusses the challenges facing the field, along with applications in which metagenomic analysis has opened up ways of investigating symbiosis, metabolic pathway construction in metagenomes, gene family enrichment, and disease association.


Metagenomics, Environment genomics, Human microbiome, Sequencing, Computational tools, Data analysis



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Leiden University Medical Center, Leiden University, Leiden, The Netherlands
  2. Lester & Sue Smith Breast Center & Department of Medicine, Baylor College of Medicine, Houston, USA
