Comparative Genomics for Prokaryotes

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1704)

Abstract

Bacteria and archaea, collectively known as prokaryotes, generally have genomes that are much smaller than those of eukaryotes. As a result, thousands of these genomes have been sequenced. With occasional exceptions, prokaryotic genes also lack the intron-exon structure of eukaryotic genes. Together, these two facts mean that data for prokaryotic genomes are abundant and that these genomes are easier to study than the more complex eukaryotic genomes. In this chapter, we provide an overview of genome comparison tools that have been developed primarily (sometimes exclusively) for prokaryotic genomes. We cover methods that use only DNA sequences, methods that use only gene content, and methods that use both data types.

Acknowledgments

This work was supported in part by a CNPq researcher fellowship (J.C.S. and N.F.A.); by CAPES grant 3385/2013 (BIGA project) (J.C.S. and N.F.A.); by Fundect-MS grants TO141/2016 and TO007/2015 (N.F.A.); and by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under contract no. HHSN272201400027C (A.R.W.).

Author information

Corresponding author

Correspondence to João C. Setubal.

Copyright information

© 2018 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Setubal, J.C., Almeida, N.F., Wattam, A.R. (2018). Comparative Genomics for Prokaryotes. In: Setubal, J., Stoye, J., Stadler, P. (eds) Comparative Genomics. Methods in Molecular Biology, vol 1704. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7463-4_3

  • DOI: https://doi.org/10.1007/978-1-4939-7463-4_3

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7461-0

  • Online ISBN: 978-1-4939-7463-4

  • eBook Packages: Springer Protocols
