Bacterial Pan-Genomics

Iranzadeh, Arash; Mulder, Nicola Jane

doi:10.1007/978-981-13-8739-5_2

Abstract

Because bacteria tend to have high rates of recombination, bacterial genomes are highly diverse across strains of the same species. This diversity can extend to the presence or absence of entire genes, so each strain may carry its own combination of genes. The pan-genome represents the complete gene pool of a species. It consists of the core genome (genes shared by all strains) and the accessory genome (genes present in some, but not all, strains). The pan-genome can serve as a comprehensive reference genome for computational biology, and several tools have been developed for pan-genomic analysis. These tools let researchers explore bacterial genomes more flexibly by taking all types of genetic variation into account. Pan-genomics has many medical applications, such as developing vaccines and drugs against pathogenic bacteria. In this chapter, we discuss the fundamental principles and algorithms of pan-genome analysis and introduce and compare the most recent computational tools.
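To make the core/accessory split concrete, here is a minimal Python sketch. It assumes genes have already been grouped into homologous gene families (the clustering step that dedicated pan-genome tools perform) and that each strain is represented as a set of family IDs. The function name `partition_pan_genome`, the `core_fraction` parameter, and all strain and gene names are hypothetical toy data, not taken from the chapter or from any particular tool.

```python
from collections import Counter


def partition_pan_genome(strain_genes, core_fraction=1.0):
    """Split a pan-genome into core and accessory gene families.

    strain_genes: mapping of strain name -> set of gene-family IDs it carries
                  (hypothetical input; assumes homology clustering is done).
    core_fraction: fraction of strains a family must occur in to count as core.
                   1.0 means "present in every strain", the strict definition.
    """
    n_strains = len(strain_genes)
    # Count, for each gene family, how many strains carry it.
    counts = Counter(g for genes in strain_genes.values() for g in set(genes))
    pan = set(counts)  # pan-genome: every family seen in any strain
    core = {g for g, c in counts.items() if c >= core_fraction * n_strains}
    accessory = pan - core  # present in some, but not all, strains
    return pan, core, accessory


# Hypothetical toy data: three strains with overlapping gene content.
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "toxX"},
    "strain_B": {"dnaA", "gyrB", "recA", "plasY"},
    "strain_C": {"dnaA", "gyrB", "recA", "toxX", "phageZ"},
}

pan, core, accessory = partition_pan_genome(strains)
print(sorted(pan))        # ['dnaA', 'gyrB', 'phageZ', 'plasY', 'recA', 'toxX']
print(sorted(core))       # ['dnaA', 'gyrB', 'recA']
print(sorted(accessory))  # ['phageZ', 'plasY', 'toxX']
```

Representing strains as sets makes the definitions literal: the pan-genome is the union of all sets, the core genome is what appears in every set, and the accessory genome is the remainder. The `core_fraction` option is included only because practical analyses often tolerate a gene being missing from a few genomes (for example, due to incomplete assemblies); setting it below 1.0 departs from the strict definition given above.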



Author information

Authors and Affiliations

Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Arash Iranzadeh & Nicola Jane Mulder


Corresponding author

Correspondence to Arash Iranzadeh.

Editor information

Editors and Affiliations

Department of Molecular and Cellular Engineering, Jacob Institute of Biotechnology and Bioengineering, Sam Higginbottom University of Agriculture, Technology and Sciences, Prayagraj, Uttar Pradesh, India
Vijay Tripathi
Department of Forestry, North Eastern Regional Institute of Science and Technology (Deemed To Be University-MHRD), Itanagar, Arunachal Pradesh, India
Pradeep Kumar
Department of Computational Biology and Bioinformatics, Jacob Institute of Biotechnology and Bioengineering, Sam Higginbottom University of Agriculture, Technology and Sciences, Prayagraj, Uttar Pradesh, India
Pooja Tripathi
Department of Botany, Kamla Nehru P.G. College, Raebareli, Uttar Pradesh, India
Amit Kishore

About this chapter

Cite this chapter

Iranzadeh, A., Mulder, N.J. (2019). Bacterial Pan-Genomics. In: Tripathi, V., Kumar, P., Tripathi, P., Kishore, A. (eds) Microbial Genomics in Sustainable Agroecosystems. Springer, Singapore. https://doi.org/10.1007/978-981-13-8739-5_2

Published: 06 November 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8738-8
Online ISBN: 978-981-13-8739-5
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
