Whole Genome Sequence Analysis and Population Genomics of Group A Streptococci

  • Jake A. Lacey
  • Taylah B. James
  • Steven Y. C. Tong
  • Mark R. Davies
Part of the Methods in Molecular Biology book series (MIMB, volume 2136)


Whole-genome sequencing (WGS) is used to determine the genetic composition of an organism. The field is evolving rapidly through technical advances and the development of new bioinformatic tools for analyzing genomic data; however, the basic principles and processes for defining and processing high-quality genome sequence information remain unchanged. Here, we introduce key considerations and describe commonly used bioinformatic steps, from processing raw genome sequence data and generating genome assemblies through to basic population genomic analysis.

Key words

Population genomics; Comparative genomics; Genome sequencing; Next-generation sequencing; Reference genome; Group A Streptococcus; Streptococcus pyogenes



This work was supported by Australian National Health and Medical Research Council (NHMRC) project grants (#1130455, #1165876, and #1098319). S.Y.C.T. is an NHMRC Career Development Fellow (#1145033). M.R.D. is a University of Melbourne C.R. Roper Fellow.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  • Jake A. Lacey (1)
  • Taylah B. James (2)
  • Steven Y. C. Tong (1, 3, 4)
  • Mark R. Davies (2)

  1. Doherty Department, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
  2. Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
  3. Division of Global and Tropical Health, Menzies School of Health Research, Darwin, Australia
  4. Victorian Infectious Disease Service, The Royal Melbourne Hospital, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
