Applied Microbiology and Biotechnology, Volume 103, Issue 1, pp 69–82

Bioinformatics tools to assess metagenomic data for applied microbiology

  • Otávio G. G. Almeida
  • Elaine C. P. De Martinis
Mini-Review

Abstract

The reduction of the price of DNA sequencing has resulted in the emergence of large data sets to handle and analyze, especially in microbial ecosystems, which are characterized by high taxonomic and functional diversities. To assess the properties of these complex ecosystems, a conceptual background of the application of NGS technology and bioinformatics analysis to metagenomics is required. Accordingly, this article presents an overview of the evolution of knowledge of microbial ecology from traditional culture-dependent methods to culture-independent methods and the last frontier in knowledge, metagenomics. Topics that will be covered include sample preparation for NGS, starting with total DNA extraction and library preparation, followed by a brief discussion of the chemistry of NGS to help provide an understanding of which bioinformatics pipeline approach may be helpful for achieving a researcher’s goals. The importance of selecting appropriate sequencing coverage and depth parameters to obtain a suitable measure of microbial diversity is discussed. As all DNA sequencing processes produce base-calling errors that compromise data analysis, including genome assembly and microbial functional analysis, dedicated software is presented and conceptually discussed with regard to potential applications in the general microbial ecology field.

Keywords

Metagenomics, NGS, Applied bioinformatics, Microbial diversity

Notes

Acknowledgments

ECP De Martinis is a fellow of the National Council for Scientific and Technological Development, Brazil (grant #6762/2006-4), and is grateful for a Research Grant from the São Paulo Research Foundation (FAPESP), Brazil (grant #2017/18928-0). OGG Almeida is grateful to the São Paulo Research Foundation (FAPESP), Brazil, for a Ph.D. fellowship (grant #2017/13759-6).

Funding information

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Research involving human participants and/or animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
