Skip to main content

Metagenome Assembly and Contig Assignment

  • Protocol
  • First Online:
Microbiome Analysis

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1849))

Abstract

The recent development of metagenomic assembly has revolutionized metagenomic data analysis, thanks to the improvement of sequencing techniques, more powerful computational infrastructure and the development of novel algorithms and methods. Using longer assembled contigs rather than raw reads improves the process of metagenomic binning and annotation significantly, ultimately resulting in a deeper understanding of the microbial dynamics of the metagenomic samples being analyzed. In this chapter, we demonstrate a typical metagenomic analysis pipeline including raw read quality evaluation and trimming, assembly and contig binning. Alternative tools that can be used for each step are also discussed.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Protocol
USD 49.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 109.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 139.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 249.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Pell J, Hintze A, Canino-Koning R et al (2012) Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Proc Natl Acad Sci U S A 109:13272–13277. https://doi.org/10.1073/pnas.1121464109

    Article  PubMed  PubMed Central  Google Scholar 

  2. Sangwan N, Xia F, Gilbert JA (2016) Recovering complete and draft population genomes from metagenome datasets. Microbiome 4:8. https://doi.org/10.1186/s40168-016-0154-5

    Article  PubMed  PubMed Central  Google Scholar 

  3. Kang DD, Froula J, Egan R, Wang Z (2014) MetaBAT: Metagenome binning based on abundance and tetranucleotide frequency. No. LBNL-7106E. Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA

    Google Scholar 

  4. Alneberg J, Bjarnason BS, de Bruijn I, et al (2014) Binning metagenomic contigs by coverage and composition. Nat Methods 11:1144–1146. doi: https://doi.org/10.1038/nmeth.3103

    Article  CAS  PubMed  Google Scholar 

  5. Sieber CMK, Probst AJ, Sharrar A et al (2017) Recovery of genomes from metagenomes via a dereplication, aggregation, and scoring strategy. bioRxiv:107789

    Google Scholar 

  6. Vollmers J, Wiegand S, Kaster AK (2017) Comparing and evaluating metagenome assembly tools from a microbiologist’s perspective—not only size matters! PLoS One 12:e0169662

    Article  PubMed  PubMed Central  Google Scholar 

  7. Sczyrba A, Hofmann P, Belmann P et al (2017) Critical Assessment of Metagenome Interpretation—a benchmark of computational metagenomics software. bioRxiv:99127. https://doi.org/10.1101/099127

  8. Awad S, Irber L, Brown CT (2017) Evaluating metagenome assembly on a simple defined community with many strain variants. bioRxiv:155358

    Google Scholar 

  9. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. https://doi.org/10.1093/bioinformatics/btu170

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Li D, Liu C-M, Luo R et al (2015) MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31:1674–1676. https://doi.org/10.1093/bioinformatics/btv033

    Article  CAS  PubMed  Google Scholar 

  11. Li R, Zhu H, Ruan J et al (2010) De novo assembly of human genomes with massively parallel short read sequencing. Genome Res 20:265–272. https://doi.org/10.1101/gr.097261.109

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Luo R, Liu B, Xie Y et al (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1:18. https://doi.org/10.1186/2047-217X-1-18

    Article  PubMed  PubMed Central  Google Scholar 

  13. Mikheenko A, Saveliev V, Gurevich A (2016) MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 32:1088–1090. https://doi.org/10.1093/bioinformatics/btv697

    Article  CAS  PubMed  Google Scholar 

  14. Kang DD, Froula J, Egan R, Wang Z (2015) MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3:e1165. https://doi.org/10.7717/peerj.1165

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. https://doi.org/10.1038/nmeth.1923

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. https://doi.org/10.1093/bioinformatics/btp352

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Parks DH, Imelfort M, Skennerton CT et al (2015) CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055. https://doi.org/10.1101/GR.186072.114 gr.186072.114

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnetJ 17:10. https://doi.org/10.14806/ej.17.1.200

    Article  Google Scholar 

  19. Zhang Q, Awad S, Brown CT (2015) Crossing the streams: a framework for streaming analysis of short DNA sequencing reads. PeerJ Preprints. https://doi.org/10.7287/peerj.preprints.890v1

  20. Crusoe MR, Alameldin HF, Awad S et al (2015) The khmer software package: enabling efficient nucleotide sequence analysis. F1000Res 4:900. https://doi.org/10.12688/f1000research.6924.1

    Article  PubMed  PubMed Central  Google Scholar 

  21. Zhang Q, Pell J, Canino-Koning R et al (2014) These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure. PLoS One 9:e101271. https://doi.org/10.1371/journal.pone.0101271

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Ewels P, Magnusson M, Lundin S, Käller M (2016) MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32:3047–3048. https://doi.org/10.1093/bioinformatics/btw354

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829. https://doi.org/10.1101/gr.074492.107

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Bankevich A, Nurk S, Antipov D et al (2012) SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. https://doi.org/10.1089/cmb.2012.0021

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Namiki T, Hachiya T, Tanaka H, Sakakibara Y (2012) MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res 40:e155–e155. https://doi.org/10.1093/nar/gks678

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA (2017) metaSPAdes: a new versatile metagenomic assembler. Genome Res 27:824–834. https://doi.org/10.1101/gr.213959.116

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Peng Y, Leung HCM, Yiu SM, Chin FYL (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28:1420–1428. https://doi.org/10.1093/bioinformatics/bts174

    Article  CAS  PubMed  Google Scholar 

  28. Boisvert S, Raymond F, Godzaridis É et al (2012) Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biol 13:R122. https://doi.org/10.1186/gb-2012-13-12-r122

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Brown CT, Howe A, Zhang Q et al (2012) A reference-free algorithm for computational normalization of shotgun sequencing data. arXiv preprint arXiv 1203:4802

    Google Scholar 

  30. Howe AC, Jansson JK, Malfatti SA et al (2014) Tackling soil diversity with the assembly of large, complex metagenomes. Proc Natl Acad Sci U S A 111:4904–4909. https://doi.org/10.1073/pnas.1402564111

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Wood DE, Salzberg SL (2014) Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15:R46. https://doi.org/10.1186/gb-2014-15-3-r46

    Article  PubMed  PubMed Central  Google Scholar 

  32. Gregor I, Dröge J, Schirmer M et al (2016) PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes. PeerJ 4:e1603. https://doi.org/10.7717/peerj.1603

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Dröge J, Gregor I, McHardy AC (2015) Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods. Bioinformatics 31:817–824. https://doi.org/10.1093/bioinformatics/btu745

    Article  CAS  PubMed  Google Scholar 

  34. Huson DH, Auch AF, Qi J, Schuster SC (2007) MEGAN analysis of metagenomic data. Genome Res 17:377–386. https://doi.org/10.1101/gr.5969107

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Markowitz VM, Chen IMA, Chu K et al (2013) IMG/M 4 version of the integrated metagenome comparative analysis system. Nucleic Acids Res 42(D1):D568–D573

    Article  PubMed  PubMed Central  Google Scholar 

  36. Wilke A, Bischof J, Gerlach W et al (2015) The MG-RAST metagenomics database and portal in 2015. Nucleic Acids Res. https://doi.org/10.1093/nar/gkv1322

    Article  PubMed  PubMed Central  Google Scholar 

  37. Wu Y, Simmons BA, Singer SW (2015) MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics:1–2. https://doi.org/10.1093/bioinformatics/btv638

    Article  PubMed  Google Scholar 

  38. Imelfort M, Parks D, Woodcroft BJ et al (2014) GroopM: an automated tool for the recovery of population genomes from related metagenomes. PeerJ 2:e603. https://doi.org/10.7717/peerj.603

    Article  PubMed  PubMed Central  Google Scholar 

  39. Lin H-H, Liao Y-C (2016) Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes. Sci Rep. https://doi.org/10.1038/srep24175

  40. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760. https://doi.org/10.1093/bioinformatics/btp324

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Li H (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM arXiv Preprint arXiv:1303.3997

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Check for updates. Verify currency and authenticity via CrossMark

Cite this protocol

Zhang, Q. (2018). Metagenome Assembly and Contig Assignment. In: Beiko, R., Hsiao, W., Parkinson, J. (eds) Microbiome Analysis. Methods in Molecular Biology, vol 1849. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8728-3_12

Download citation

  • DOI: https://doi.org/10.1007/978-1-4939-8728-3_12

  • Published:

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8726-9

  • Online ISBN: 978-1-4939-8728-3

  • eBook Packages: Springer Protocols

Publish with us

Policies and ethics