Genetic Characterization of Vibrio cholerae O1 isolates from outbreaks between 2011 and 2015 in Tanzania
Cholera outbreaks have occurred in Tanzania since 1974. To date, the genetic epidemiology of these outbreaks has not been assessed.
96 Vibrio cholerae O1 isolates from five regions were characterized, and their genetic relatedness assessed using multi-locus variable-number tandem-repeat analysis (MLVA) and whole genome sequencing (WGS).
Of the 48 MLVA genotypes observed, 3 were genetically unrelated to any others, while the remaining 45 genotypes separated into three MLVA clonal complexes (CCs) - each comprised of genotypes differing by a single allelic change. In Kigoma, two separate outbreaks, 4 months apart (January and May, 2015), were each caused by genetically distinct strains by MLVA and WGS. Remarkably, one MLVA CC contained isolates from both the May outbreak and ones from the 2011/2012 outbreak in Dar-es-Salaam. However, WGS revealed the isolates from the two outbreaks to be distinct clades. The outbreak that started in August 2015 in Dar-es-Salaam and spread to Morogoro, Singida and Mara was comprised of a single MLVA CC and WGS clade. Isolates from within an outbreak were closely related differing at fewer than 5 nucleotides. All isolates were part of the 3rd wave of the 7th pandemic and were found in four clades related to isolates from Kenya and Asia.
We conclude that genetically related V. cholerae cluster in outbreaks, and distinct strains circulate simultaneously.
KeywordsWhole Genome Sequencing Cholera Outbreak Cholerae Isolate Thiosulfate Citrate MLVA Genotype
Multilocus variable tandem repeat analysis
single nucleotide variants
Whole genome sequencing
Vibrio cholerae is the causative agent of cholera, a secretory diarrheal disease resulting in high mortality among humans, if untreated ; cholera continues to be a significant public health problem in countries with poor socio-economic conditions. The burden of this disease is estimated to be 3–5 million cases, with 100,000–120,000 deaths annually , and the number of cholera cases reported to the World Health Organization continues to rise. In 2014, 42 countries reported a total of 190,549 cholera cases and 2,231 deaths — an overall case fatality rate of 1.17% . About 85% of the cases occur in Africa and southern Asia .
Since an initial report in 1974, cholera outbreaks have been reported regularly in Tanzania . Outbreaks have been reported in various regions, including Dar es Salaam, Dodoma, Kigoma, Lindi, Mbeya, Morogoro, Mtwara, Pwani and Tanga; case fatality rates (CFR) have ranged from 1.3% to 11.7% . Between 2011 and 2016 seven outbreaks have been reported from six regions, including Dar es Salaam, Kigoma, Morogoro, Singida, Tanga, and Mara.
It has been difficult to determine whether cholera outbreaks in Tanzania are from a single or multiple sources, or whether the outbreak strains are genetically related to each other, because these isolates have not yet been characterized by modern genetic methods. Here, we use two established methods, multi-locus variable-number tandem-repeat analysis (MLVA) and whole genome sequencing (WGS), to determine the genetic relatedness and establish transmission patterns of outbreak strains, which can lead to conclusions on the source(s) of these cholera outbreaks. Such conclusions could form the basis for interventions to better control the spread of future outbreaks.
Study setting and study period
Isolation of Vibrio cholerae
Ninety-six V. cholerae isolates preserved in skim milk at −80 °C were revived by inoculation into alkaline peptone water at 37 °C for 6 hours and cultured aerobically on thiosulfate citrate bile salt sucrose medium at 35 °C overnight. One discrete colony from thiosulfate citrate bile salt sucrose medium was subcultured on tryptone soya agar at 35 °C overnight under aerobic conditions to obtain sufficient growth for biochemical and serological tests.
Biochemical screening and confirmation of the isolates
All isolates were confirmed as positive for oxidase, indole and string tests. The isolates were serogrouped using polyvalent serogroup O1 specific antiserum and confirmed for serotypes using Inaba and Ogawa specific antisera from DENKA SEIKEN Co. Ltd-Japan. Saline controls were included to detect auto agglutination.
DNA was extracted from overnight V. cholerae growth on Luria Bertani (LB) broth culture medium by taking 5 μL and following the Qiagen-DNeasy DNA extraction kit protocol (Qiagen, Germantown, MD).
Multiple-locus variable-number tandem-repeat analysis (MLVA)
Five loci (VC0147, VC0436-7, VC1650, VCA0171, and VCA0283) containing a variable number of tandem repeats were amplified using specific primers . These PCR products were separated, detected, and sized by using a 3730xl automatic sequencer using internal lane standards (Liz 600), and the Gene Scan program (all from Applied Biosystems, Foster City, CA, USA). Genotypes were determined using published formulas to calculate the number of repeats from the length of each allele . Five loci were ordered by their chromosomal location and a genotype (e.g., 9, 4, 14, 22, 17) was interpreted as an isolate having alleles of 9, 4, 14, 22, and 17 repeats at the 5 loci, respectively. The relatedness of isolates was assessed by using eBURST version 3 (http://eburst.mlst.net) [1, 4]. Genetically-related genotypes are defined as those possessing identical alleles at 4 of the 5 loci; groups of related genotypes are called MLVA clonal complexes (CCs).
We sequenced the genomes of 39 V. cholerae O1 isolates using DNA prepared for Illumina sequencing with the KAPA High Throughput Library Preparation Kit (Kapa Biosystems, Wilmington, MA). DNA was fragmented with the Covaris E210. Libraries were prepared using a modified version of manufacturer’s with-bead protocol (Kapa Biosystems, Wilmington, MA). The libraries were enriched and barcoded by ten cycles of PCR amplification with primers containing an index sequence seven nucleotides in length. The libraries were then sequenced using a 100 bp paired-end run on an Illumina HiSeq2500 (Illumina, San Diego, CA).
Whole genome alignment and detection of single nucleotide variants (SNVs)
The quality of the 101-base paired-end reads was confirmed using FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc). High quality reads were assembled with “Spades” software (v.3.6.2) , using the options ‘–careful’ to reduce the number of mis-assemblies and ‘–cov-cutoff auto’ to remove the potentially mis-assembled low coverage contigs. Annotation was performed using the RAST server . The assembled annotated files are in Genbank. Nucleotide variation was identified compared to V. cholerae O1 El Tor strain TEM/25/01-004 (whole genome sequence tag TANZ_56) in order to avoid spurious single nucleotides variants (SNVs). PARSNP (v1.2)  was used to extract and align the variable nucleotides from the core-genome, using the parameter ‘–c’ to constrain the use of all input genomes and generate the ‘.vcf’ variant description file and ‘.ggr’ alignment description file. The ‘.ggr’ file was loaded in Gingr (v1.2)  to visualize the alignments and export the variant nucleotide alignment ‘.mfa’ file. The ‘.vcf’ file was then used to remove all variants from the ‘.mfa’ file detected in the edge of the contigs (less than 1 kb of the contigs edges) using an in-house script. FastTree2 (v2.1.9)  was used with the default parameters to generate the maximum-likelihood newick tree file using the corrected ‘.mfa’ alignment file. Then iTOL (http://itol.embl.de/)  was used to visualize the maximum-likelihood tree. In order to place the isolates into the 7th pandemic phylogeny, we mapped the reads to the Vibrio cholerae 01 El Tor reference N16961 using SMALT (http://www.sanger.ac.uk/resources/software/smalt) . The alignment was then striped of putative recombinant sites via Gubbins . The resulting alignment of 5020 nucleotides was used to infer the phylogenic tree using RAxML  under the GTR model with 100 bootstrap replicates. The pre-seventh pandemic strain M66 (NCBI accession numbers CP001233 and CP001234) was used as outgroup. The whole genome sequences from previous publications are listed in Additional file 1: Table S1.
Multi-locus variable-number tandem-repeat analysis (MLVA)
Our MLVA revealed extensive genetic diversity among the 96 V. cholerae O1 isolates (Additional file 2: Table S2). All loci exhibited substantial variation: VC0147 had four alleles; VC436–7, five alleles; VC1650, three alleles; VCA0171, nine alleles; and VCA0283, thirteen alleles. When each isolate was assigned a genotype (on the basis of and in order of the number of repeat units at each locus), we identified 48 genotypes. Further exploration to determine the genetic relatedness of these genotypes revealed three MLVA-CCs, each comprising genotypes that differed from the other genotypes by a single allelic change. In addition, we detected three singleton genotypes that were unrelated to any other genotype; that is, they differed at >2 of the 5 MLVA loci from all other genotypes.
Whole genome sequences
We analysed the phylogenetic relationships of the whole genome sequences from 39 V. cholerae O1 isolates. The average number of assembled contigs or scaffolds was 136 with an average depth of 386 reads (range 271 to 646). The genomes averaged 4.03 Mb in length and had an average GC content of 47.5%. In order to examine only the highest quality SNVs, we analysed only SNVs that were more than 1 kb from the ends of a contig. The resulting number of SNVs was 196 (Additional file 3: Table S3).
Cholera outbreaks in Tanzania were caused by genetically diverse isolates of V. cholerae O1, although within an outbreak there was very little genetic diversity. Among our set of 96 isolates, MLVA revealed 48 genotypes of which 45 were grouped into three MLVA-CCs of genetically related isolates. Among the 39 isolates analysed by WGS, 196 SNVs were observed. Our maximum likelihood analyses of the genome sequences of the 39 isolates revealed four clades each within the 3rd wave of the 7th pandemic . The isolates in each clade, as defined by WGS corresponded to isolates from one of the MLVA-CCs. CC1 was divided into two clades reflecting the distinct outbreaks in different locations, different years and with different serotypes. The isolates within each outbreak primarily belonged to a single MLVA CC and the WGS clade.
Our MLVA data provided two unusual observations. First, in our data, the allele at locus VCA0171 was strongly correlated with the outbreak. In contrast, most previous MLVA investigations have found that the two small chromosome loci VCA0171 and VCA0283 were highly variable and did not correlate with any variable [1, 14]. Second, our CC1 was not a single clade when analysed by WGS. This contrasts with previous observations that i) isolates from the same CC many miles apart were part of the same clade by WGS  or ii) closely related WGS sequences may have unrelated MLVA genotypes [14, 15]. Whether this is the result of convergence or a slower than expected divergence of the MLVA genotypes cannot be determined from our data.
The epidemiological and genetic data, MLVA and WGS, are all consistent with three separate outbreaks in Tanzania during 2015. The first in January 2015 occurred in Kigoma and was caused by isolates from CC3 and a distinct singleton genotype representative of another genetic lineage of V. cholerae. The second outbreak in May also in Kigoma was caused by isolates from CC1. This indicates that the two outbreaks in Kigoma, occurring 4 months apart, were each caused by independent genetic lineages. Consistent with a single source, the isolates within each outbreak were less than 5 SNVs apart. The genomes from MLVA CC1 & 3 are likely the ongoing spread of the lineage (or lineages) that was imported into Africa between 1993 and 2002 . The third, August 2015 outbreak includes only isolates from MLVA CC2, again a single source for the isolates is consistent with them being fewer than five nucleotides apart. This outbreak has spread from Dar es Salaam in August to Morogoro later August, to Mara and then Singida (see Fig. 1). Genotypes from the two of the three outbreaks have been observed in or related to isolates from Dar es Salaam, we hypothesize that Dar es Salaam may be a reservoir of Vibrio cholerae in Tanzania.
Each cholera outbreak in Tanzania in 2015 was caused by a genetic distinct group of closely related isolates. Each outbreak has a single primary source and interventions should focus on stopping the spread of the disease. These outbreaks are part of the ongoing spread of cholera through Africa, as clearly seen with these strains related to the Zambian, Kenyan and other African lineages. Dar es Salaam is hypothesized to be the hub of cholera outbreaks in Tanzania, controlling cholera in Dar es Salaam will have a major impact on cholera outbreaks in the country.
Funding was provided by NIAID 5R01AI039129 from National Institute of Health (NIH), USA.
Availability of data and materials
The assembled annotated files of the whole genome sequences are in Genbank. Additional file 1: Table S1 contains a list of all the sequences (and their publically available location) used for the phylogenetic tree in Fig. 3. The MLVA data are presented in the Additional file 2: Table S2. Additional file 3: Table S3 contains all of the SNVs assigned to each isolate in this study.
YK, AAM, HL, LU, MM, SYM, FM, MM, RK, JM, Jl, SN, MA, SL, DD, OCS have made substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data; 2) YK, AAM, MA, DD, OCS have been involved in drafting the manuscript or revising it critically for important intellectual content; and 3) all co-authors have given final approval of the version to be published. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
These specimens were collected with consent of the patients, as part of routine surveillance by the Ministry of Health, and were de-identified to protect patient anonymity. A waiver of consent for the use of anonymous data was approved by the Senate Research and Publications Committee of Muhimbili University of Health and Allied Sciences.
- 2.Wash I. World health organization: weekly epidemiological record. Geneva: 2015.Google Scholar
- 3.World Health Organization. Cholera Country Profile : United Republic of Tanzania. 2008.Google Scholar
- 6.Aziz RK, Bartels D, Best AA, Dejongh M, Disz T, Edwards RA, et al. The RAST server : rapid annotations using subsystems technology. BMC Genomics. 2008;9(75):1–15.Google Scholar
- 7.Treangen TJ, Ondov BD, Koren S, Phillippy AM. The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. BMC Genome Biol. 2014;15(524):1–15.Google Scholar
- 10.Harris SR, Feil EJ, Holden MTG, Quail MA, Emma K, Chantratita N, et al. Europe PMC funders group evolution of MRSA during hospital transmission and intercontinental spread. Eur J Infect Dis Epidemiol Prev Control. 2010;327(5964):469–74.Google Scholar
- 13.Mutreja A, Kim DW, Thomson NR, Connor TR, Lee JH, Kariuki S, Croucher NJ, Choi SY, Harris SR, Lebens M, Niyogi SK, Kim EJ, Ramamurthy T, Chun J, Wood JL, Clemens JD, Czerkinsky C, Nair GB, Holmgren J, Parkhill J, Dougan G. Evidence for several waves of global transmission in the seventh cholera pandemic. Nature. 2011;477(7365):462–5.CrossRefPubMedPubMedCentralGoogle Scholar
- 14.Rashid MU, Almeida M, Azman AS, Lindsay BR, Sack DA, Colwell RR, Huq A, Morris Jr JG, Alam M. Stine OC Comparison of inferred relatedness based on multilocus variable-number tandem-repeat analysis and whole genome sequencing of Vibrio cholerae O1. FEMS Microbiol Lett. 2016;363:116.CrossRefGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.