Whole Genome Sequence Analysis and Population Genomics of Group A Streptococci

  • Jake A. Lacey
  • Taylah B. James
  • Steven Y. C. Tong
  • Mark R. Davies (corresponding author)
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 2136)

Abstract

Whole-genome sequencing (WGS) is used to determine the genetic composition of an organism. This fast-moving field is continually evolving through technical advancements and the development of new bioinformatic tools for analyzing genomic data; however, the basic principles and processes for defining and processing high-quality genome sequence information remain unchanged. Here, we introduce key considerations and describe commonly used bioinformatic steps, from processing raw genome sequence data and generating genome assemblies through to basic population genomic analysis.

Key words

Population genomics; Comparative genomics; Genome sequencing; Next-generation sequencing; Reference genome; Group A Streptococcus; Streptococcus pyogenes

Notes

Acknowledgments

This work was supported by Australian National Health and Medical Research Council (NHMRC) project grants (#1130455, #1165876, and #1098319). S.Y.C.T. is an NHMRC Career Development Fellow (#1145033). M.R.D. is a University of Melbourne C.R. Roper Fellow.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  • Jake A. Lacey (1)
  • Taylah B. James (2)
  • Steven Y. C. Tong (1, 3, 4)
  • Mark R. Davies (2; corresponding author)

  1. Doherty Department, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
  2. Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
  3. Division of Global and Tropical Health, Menzies School of Health Research, Darwin, Australia
  4. Victorian Infectious Disease Service, The Royal Melbourne Hospital, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
