Bioinformatics for Genomes and Metagenomes in Ecology Studies

  • Douglas B. Rusch
  • Jason Miller
  • Konstantinos Krampis
  • Andrey Tovchigrechko
  • Granger Sutton
  • Shibu Yooseph
  • Karen E. Nelson
Part of the Advanced Topics in Science and Technology in China book series (ATSTC)


Major technological developments in microbial ecology are redefining the science, moving the focus of research away from individual isolates and species studied under carefully controlled laboratory conditions and towards entire communities of organisms in their natural environments. Ever more efficient sequencing technologies let us generate huge volumes of sequence data, shifting the cost burden from sequence generation to sequence analysis. The bioinformatic techniques for managing and analyzing both the new types of data and the vastly increased volumes of data are transforming our understanding of life and its interdependencies. These data sets, in conjunction with bioinformatics, are enhancing our understanding of microbial diversity and microbial ecology in many different environments. In this chapter, we provide an overview of some of the genomic, metagenomic and informatics approaches currently in use or under development for the study of microbial diversity and ecology.


  • Environmental Microbiology
  • Cloud Service Provider
  • Cluster Regularly Interspaced Short Palindromic Repeat
  • Human Microbiome Project
  • Assembly Software
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Zhejiang University Press, Hangzhou and Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Douglas B. Rusch (1)
  • Jason Miller (1)
  • Konstantinos Krampis (1)
  • Andrey Tovchigrechko (1)
  • Granger Sutton (1)
  • Shibu Yooseph (1)
  • Karen E. Nelson (1)
  1. J. Craig Venter Institute, Rockville, USA
