MG-RAST, a Metagenomics Service for Analysis of Microbial Community Structure and Function

Microbial Environmental Genomics (MEG)

Part of the book series: Methods in Molecular Biology (MIMB, volume 1399)

Abstract

Approaches in molecular biology, particularly those that deal with high-throughput sequencing of entire microbial communities (the field of metagenomics), are rapidly advancing our understanding of the composition and functional content of microbial communities involved in climate change, environmental pollution, human health, biotechnology, etc. Metagenomics provides researchers with the most complete picture of the taxonomic (i.e., what organisms are there) and functional (i.e., what are those organisms doing) composition of natively sampled microbial communities. This makes it possible to perform investigations that include organisms that were previously intractable to laboratory-controlled culturing; currently, these constitute the vast majority of all microbes on the planet. All organisms contained in environmental samples are sequenced in a culture-independent manner, most often with 16S ribosomal amplicon methods to investigate the taxonomic content, or with whole-genome shotgun-based methods to investigate the functional content, of sampled communities. Metagenomics allows researchers to characterize the community composition and functional content of microbial communities, but it cannot show which functional processes are active; however, near-parallel developments in transcriptomics promise a dramatic increase in our knowledge in this area as well.

Since 2008, MG-RAST (Meyer et al., BMC Bioinformatics 9:386, 2008) has served as a public resource for annotation and analysis of metagenomic sequence data, providing a repository that currently houses more than 150,000 data sets (containing 60+ tera-base-pairs), of which more than 23,000 are publicly available. MG-RAST, the metagenomics RAST (rapid annotation using subsystems technology) server, makes it possible for users to upload raw metagenomic sequence data in (preferably) fastq or fasta format. Assessment of sequence quality and annotation with respect to multiple reference databases are performed automatically with minimal input from the user (see Subheading 4 at the end of this chapter for more details). Post-annotation analysis and visualization are also possible, either directly through the web interface or with tools like matR (metagenomic analysis tools for R, covered later in this chapter) that use the MG-RAST API (http://api.metagenomics.anl.gov/api.html) to download data from any stage in the MG-RAST processing pipeline. Over the years, MG-RAST has undergone substantial revisions to keep pace with the dramatic growth in the number, size, and types of sequence data that accompany constantly evolving developments in metagenomics and related -omic sciences (e.g., metatranscriptomics).
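
As a rough illustration of the programmatic access mentioned above, the following minimal Python sketch builds a request URL for a single metagenome record. Only the host name comes from the API address cited in the abstract; the "/metagenome/" resource path, the "verbosity" query parameter, the function name metagenome_url, and the identifier mgm0000000.0 are assumptions made for illustration and should be checked against the API documentation before use.

```python
from urllib.parse import quote, urlencode

# Host taken from the API address cited in the abstract.
# ASSUMPTION: the "/metagenome/<id>" path and the "verbosity" parameter
# are illustrative and must be verified against the API documentation.
API_BASE = "http://api.metagenomics.anl.gov"


def metagenome_url(metagenome_id, verbosity="minimal"):
    """Build a request URL for one metagenome record (assumed URL layout)."""
    # Percent-encode the identifier so it is safe as a single path segment.
    path = "/metagenome/" + quote(metagenome_id, safe="")
    # Encode the query string with the standard library helper.
    return API_BASE + path + "?" + urlencode({"verbosity": verbosity})
```

Calling metagenome_url("mgm0000000.0", verbosity="metadata") with this placeholder identifier yields a URL that could then be fetched with any HTTP client; the sketch deliberately stops at URL construction so that it makes no claim about the shape of the response.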

References

  1. Wilkening J, Wilke A, Desai N et al (2009) Using clouds for metagenomics. A case study. In: IEEE Cluster, 2009

  2. Angiuoli S, Matalka M, Gussman A et al (2011) Clovr, a virtual machine for automated and portable sequence analysis from the desktop using cloud computing. BMC Bioinformatics 12:356

  3. Meyer F, Paarmann D, D’Souza M et al (2008) The metagenomics RAST server—a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386

  4. Field D, Amaral-Zettler L, Cochrane G et al (2011) The genomic standards consortium. PLoS Biol 9:e1001088

  5. Wilke A, Harrison T, Wilkening J et al (2012) The m5nr, a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools. BMC Bioinformatics 13:141

  6. Altschul SF, Gish W, Miller W et al (1990) Basic local alignment search tool. J Mol Biol 215:403–410

  7. Kent WJ (2002) Blat—the blast-like alignment tool. Genome Res 12:656–664

  8. Brooksbank C, Bergman MT, Apweiler R et al (2014) The European Bioinformatics Institute’s data resources 2014. Nucleic Acids Res 42(Database issue):D18–D25

  9. Reference Genome Group of the Gene Ontology Consortium (2009) The Gene Ontology’s Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput Biol 5:e1000431

  10. Markowitz VM, Ivanova NN, Szeto E et al (2008) IMG/M, a data management and analysis system for metagenomes. Nucleic Acids Res 36(Database issue):D534–D538

  11. Kanehisa M (2002) The KEGG database. Novartis Found Symp 247:91–101

  12. Benson DA, Cavanaugh M, Clark K (2013) Genbank. Nucleic Acids Res 41(Database issue):D36–D42

  13. Dwivedi B, Schmieder R, Goldsmith DB et al (2012) PhiSiGns: an online tool to identify signature genes in phages and design PCR primers for examining phage diversity. BMC Bioinformatics 13:37

  14. Overbeek R, Begley T, Butler RM et al (2005) The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 33:5691–5702

  15. Magrane M, Uniprot Consortium (2011) UniProt knowledgebase: a hub of integrated protein data. Database (Oxford). doi:10.1093/database/bar009

  16. Snyder EE, Kampanya N, Lu J et al (2007) PATRIC: the VBI PathoSystems resource integration center. Nucleic Acids Res 35(Database issue):D401–D406

  17. Jensen LJ, Julien P, Kuhn M et al (2008) Eggnog: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res 36(Database issue):D250–D254

  18. Tang W, Wilkening J, Desai N, Gerlach W, Wilke A, Meyer F (2013) A scalable data analysis platform for metagenomics. Proceedings of the 2013 International Conference on Big Data

  19. Bischof J, Wilke A, Gerlach W, Harrison T, Paczian T, Tang W, Trimble W, Wilkening J, Desai N, Meyer F (2015) Shock: active storage for multicloud streaming data analysis. In: 2nd IEEE/ACM International Symposium on Big Data Computing, Limassol, Cyprus, 2015

  20. Cox MP, Peterson DA, Biggs PJ (2010) Solexaqa: at-a-glance quality assessment of illumina second-generation sequencing data. BMC Bioinformatics 11:485

  21. Huse SM, Huber JA, Morrison HG et al (2007) Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol 8:R143

  22. Gomez-Alvarez V, Teal TK, Schmidt TM (2009) Systematic artifacts in metagenomes from complex microbial communities. ISME J 3:1314–1317

  23. Keegan KP, Trimble WL, Wilkening J et al (2012) A platform-independent method for detecting errors in metagenomic sequencing data, Drisee. PLoS Comput Biol 8:e1002541

  24. Langmead B, Trapnell C, Pop M et al (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25

  25. Trimble WL, Keegan KP, D’Souza M et al (2012) Short-read reading-frame predictors are not created equal, sequence error causes loss of signal. BMC Bioinformatics 13:183

  26. Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 38:e191

  27. Edgar RC (2010) Search and clustering orders of magnitude faster than blast. Bioinformatics 26:2460–2461

  28. Caporaso JG, Kuczynski J, Stombaugh J et al (2010) QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7:335–336

  29. Huson DH, Auch AF, Qi J et al (2007) Megan analysis of metagenomic data. Genome Res 17:377–386

  30. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O (2008) The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9:75. doi:10.1186/1471-2164-9-75

  31. Pruesse E, Quast C, Knittel K et al (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35:7188–7196

  32. DeSantis TZ, Hugenholtz P, Larsen N et al (2006) Greengenes: a Chimera-Checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72:5069–5072

  33. Cole JR, Chai B, Marsh TL et al (2003) The ribosomal database project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy. Nucleic Acids Res 31:442–443

  34. Yilmaz P, Kottmann R, Field D et al (2011) Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol 29:415–420

  35. Bolotin A, Quinquis B, Sorokin A et al (2005) Clustered regularly interspaced short palindrome repeats (CRISPRS) have spacers of extrachromosomal origin. Microbiology 151:2551–2561

  36. Reeder J, Knight R (2009) The ‘rare biosphere’, a reality check. Nat Methods 6:636–637

  37. Ondov BD, Bergman NH, Phillippy AM (2011) Interactive metagenomic visualization in a web browser. BMC Bioinformatics 12:385

  38. Gerlach W, Tang W, Keegan K, Harrison T, Wilke A, Bischof J, D’Souza M, Devoid S, Murphy-Olson D, Desai N (2014) Skyport: container-based execution environment management for multi-cloud scientific workflows. In: Proceedings of the 5th International Workshop on Data-Intensive Computing in the Clouds. IEEE Press, pp 25–32


Corresponding author

Correspondence to Folker Meyer.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Keegan, K.P., Glass, E.M., Meyer, F. (2016). MG-RAST, a Metagenomics Service for Analysis of Microbial Community Structure and Function. In: Martin, F., Uroz, S. (eds) Microbial Environmental Genomics (MEG). Methods in Molecular Biology, vol 1399. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3369-3_13

  • DOI: https://doi.org/10.1007/978-1-4939-3369-3_13

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3367-9

  • Online ISBN: 978-1-4939-3369-3

  • eBook Packages: Springer Protocols
