Abstract
As microbes continue to be discovered and rediscovered, there is a growing need to understand their lineage, origin, and occurrence. Given the extent of microbial diversity and its global importance, microbial resources merit exploration to support phenotyping, epidemiological investigations, screening, and metagenomics. Numerous bioinformatics tools are available for this purpose, covering phylogenetic taxa, sequence and structural relationships, evolutionary mechanisms, horizontal gene transfer, and, importantly, functional genomics. Here we present an overview of genomic tools that have aided in identifying isolates, species, and subspecies of uncultured microorganisms and in inferring their functional roles. Alongside this summary, we discuss the features and limitations of these tools in light of the emergence of next-generation sequencing (NGS) technologies.
Acknowledgments
We gratefully acknowledge the resources provided by the Biotechnology Information System (BTIS) - Sub Distributed Information Centre (Sub-DIC), BISR, funded by the Department of Biotechnology, Government of India. The authors wish to acknowledge the contributions of Mr. Narendra Meena, Ms. Pragya Chaturvedi, and Ms. Sweta Shrotriya in collecting the information and checking the tools. The encouragement provided by Prof. P. Ghosh, Executive Director, BISR, is gratefully acknowledged.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Upadhyayula, R.S., Solanki, P.S., Suravajhala, P., Medicherla, K.M. (2019). Bioinformatics Tools for Microbial Diversity Analysis. In: Satyanarayana, T., Johri, B., Das, S. (eds) Microbial Diversity in Ecosystem Sustainability and Biotechnological Applications. Springer, Singapore. https://doi.org/10.1007/978-981-13-8315-1_2
DOI: https://doi.org/10.1007/978-981-13-8315-1_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8314-4
Online ISBN: 978-981-13-8315-1
eBook Packages: Biomedical and Life Sciences (R0)