Abstract
Because bacteria tend to recombine at a high rate, their genomes vary widely between strains. This diversity can extend to the presence or absence of entire genes, so each strain may carry its own combination of genes. The pan-genome represents the complete gene pool of a species. It is made up of the core genome (genes shared by all strains) and the accessory genome (genes present in some strains but not all). The pan-genome can be regarded as a comprehensive reference genome for computational biology, and several tools have been developed for pan-genomics applications. These tools let scientists explore bacterial genomes more flexibly, taking all types of genetic variation into account. Pan-genomics has many applications in medicine, such as the development of vaccines and drugs against pathogenic bacteria. In this chapter, we discuss the fundamental principles and algorithms for pan-genome analysis, and we introduce and compare the most recent computational tools.
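The definitions of pan-genome, core genome and accessory genome given above can be illustrated with plain set operations. The following is a minimal sketch, not the method of any tool discussed in the chapter: the function name `split_pan_genome`, the strain labels and the gene labels are all hypothetical, and it assumes that genes have already been clustered so that the same label in two strains means the same gene.

```python
from typing import Dict, Set, Tuple


def split_pan_genome(
    strain_genes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, accessory) gene sets for the given strains.

    strain_genes maps a strain name to the set of gene labels it carries
    (hypothetical input; real pipelines derive this from gene clustering).
    """
    if not strain_genes:
        return set(), set(), set()
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)          # every gene seen in any strain
    core = set.intersection(*gene_sets)    # genes present in all strains
    accessory = pan - core                 # genes missing from at least one strain
    return pan, core, accessory


# Toy data, invented for illustration only.
strains = {
    "strain_A": {"gene1", "gene2", "gene3"},
    "strain_B": {"gene1", "gene2", "gene4"},
    "strain_C": {"gene1", "gene3", "gene5"},
}
pan, core, accessory = split_pan_genome(strains)
print("pan:", sorted(pan))
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

In this toy input only `gene1` is carried by all three strains, so it alone forms the core genome; the remaining four genes fall into the accessory genome. Note that this sketch counts strain-specific genes as accessory, which follows the definition used above (present in some strains but not all).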
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Iranzadeh, A., Mulder, N.J. (2019). Bacterial Pan-Genomics. In: Tripathi, V., Kumar, P., Tripathi, P., Kishore, A. (eds) Microbial Genomics in Sustainable Agroecosystems. Springer, Singapore. https://doi.org/10.1007/978-981-13-8739-5_2
DOI: https://doi.org/10.1007/978-981-13-8739-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8738-8
Online ISBN: 978-981-13-8739-5
eBook Packages: Biomedical and Life Sciences (R0)