Whole-genome sequencing and comparative genomic analysis of Escherichia coli O91 strains isolated from symptomatic and asymptomatic human carriers
The Shiga toxin–producing Escherichia coli (STEC) O91:H21 strains NCCP15736 and NCCP15737 were isolated during a single outbreak in Korea, NCCP15736 from a symptomatic carrier and NCCP15737 from an asymptomatic carrier. To investigate genomic differences between the two strains, we performed whole-genome sequencing of both strains and conducted a comparative genomic analysis.
Using the Illumina HiSeq 2000 platform and the Rapid Annotation using Subsystem Technology (RAST) server, whole-genome sequences of NCCP15736 and NCCP15737 were obtained and annotated. Phylogenetic analysis of ten E. coli strains showed that NCCP15736 and NCCP15737 are evolutionarily close and that both are closest to E. coli O91:NM str. 2009C-3745. The genomic comparison showed that the fimD gene of NCCP15737 is truncated; this truncation could underlie the defects in infection and pathogenicity of NCCP15737. The two strains showed the same virulence factor profile, with the same 25 virulence factors identified in each. We identified ten and nine phage-associated regions in the NCCP15736 and NCCP15737 genomes, respectively; the two strains share five of these.
NCCP15736 and NCCP15737 differ at the genomic level, even though they share features such as virulence-related genes. NCCP15737 has a deletion in fimD, which may underlie its asymptomatic character. We conclude that complete genome sequencing and integration of other types of omics data are needed to fully reveal the mechanism underlying the asymptomatic character of NCCP15737.
Keywords: Shiga-like toxin-producing Escherichia coli O91; Draft genome; Type I fimbriae; fimD; Truncated protein
CDS: coding DNA sequences
EHEC: enterohemorrhagic Escherichia coli
NCCP: National Culture Collection for Pathogens
RAST: Rapid Annotation using Subsystem Technology
STEC: Shiga toxin-producing Escherichia coli
Escherichia coli is a typical member of the normal microflora of the human gastrointestinal tract. However, some E. coli isolates cause serious disease. E. coli strains can be divided into three major subgroups: commensal or nonpathogenic strains, pathogenic strains that cause intestinal infection, and extraintestinal pathogenic strains. Intestinal pathogenic E. coli include enteroaggregative E. coli, enterohemorrhagic E. coli (EHEC), enteropathogenic E. coli (EPEC), enteroinvasive E. coli, and enterotoxigenic E. coli (ETEC). Shiga toxin-producing E. coli (STEC) O157:H7 in humans was first reported in 1983 [3, 4, 5]. STEC causes a variety of diarrheal diseases and hemolytic uremic syndrome (HUS). EHEC belongs to the STEC group but is associated with a distinctive clinical syndrome, hemorrhagic colitis (HC), caused mainly by E. coli O157:H7 [7, 8]. Shiga toxin (Stx) inhibits protein synthesis by disrupting the 28S rRNA of the 60S ribosomal subunit. Shiga toxins are classified into two groups, Stx1 and Stx2. Stx1 originates from Shigella dysenteriae and has three subtypes (Stx1a, Stx1c, and Stx1d); these genes are highly conserved in STEC. Stx2 is less conserved and includes several variants: Stx2a, Stx2b, Stx2c, Stx2d, Stx2e, Stx2f, and Stx2g. Most outbreaks involve STEC O157:H7, but outbreaks caused by non-O157 STEC have increased recently. Thus, a better understanding of the causes of the asymptomatic character of STEC strains is required. Non-O157 STEC includes the O8:H, O26:H, O26:H11, O91:H21, O103:H2, O111:H, O113:H21, O128:H2, and O145:H serotypes.
Two STEC O91:H21 isolates were used in this study, one from a symptomatic carrier and one from an asymptomatic carrier, both isolated during a recent outbreak in Korea. A previous study used molecular and cellular analyses to investigate differences in pathogenicity between the isolates. A reduced adherence phenotype and transcriptional repression of type I fimbriae genes were identified in the isolates from the asymptomatic carrier; these two factors may explain why the isolates cause no symptoms. However, the mechanism underlying the transcriptional repression of type I fimbriae is not yet understood at the genomic level. To investigate the differences between the O91:H21 isolates from symptomatic and asymptomatic carriers and to explore the genetic basis underlying these differences, whole-genome sequencing and comparative genomic analyses were performed.
Strain, isolation, and serotyping
An outbreak of STEC at an elementary school in Gwangju, Korea was reported in July 2004. A total of 1643 stool samples were obtained from asymptomatic individuals, and all isolates were biochemically characterized using the API20E system (bioMérieux, Marcy l’Etoile, France). A total of 74 isolates from individuals without symptoms were identified as STEC. In addition to the isolates from asymptomatic carriers, one STEC isolate from a symptomatic carrier was characterized. The isolated strains were deposited in the National Culture Collection for Pathogens (NCCP) at the Korean National Institute of Health under accession numbers NCCP15736 and NCCP15737. For the present study, NCCP15736 and NCCP15737 were obtained from the NCCP for whole-genome sequencing. This research was reviewed and approved by the Institutional Review Board of the Korea Centers for Disease Control and Prevention.
Library preparation and whole-genome sequencing
A sequencing library was constructed using the TruSeq Sample Preparation Kit (Illumina, San Diego, CA, USA) following the manufacturer’s instructions. Genomic DNA was end-repaired and ligated with paired-end sequencing adapters. DNA fragments with the desired length of ~500 bp were selected by gel electrophoresis and then amplified by PCR to produce the final library. The Illumina HiSeq 2000 platform was used for whole-genome sequencing.
Genome assembly and annotation
Low-complexity reads, reads with quality scores <Q20, adapter sequences, and duplicate reads were discarded. De novo assembly of high-quality reads was performed with SOAPdenovo (version 1.05). The de novo assembly results were corrected based on alignment of all reads that passed the quality control threshold against the assembly results using SOAPaligner (version 2.21). After correction, scaffolds >500 bp in length were considered for downstream analysis.
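The final filtering step above (keeping only scaffolds longer than 500 bp) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names `parse_fasta` and `filter_scaffolds` are hypothetical, and the sketch assumes the corrected assembly is available as plain FASTA text.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def filter_scaffolds(records, min_len=500):
    """Keep scaffolds strictly longer than min_len bp (">500 bp" in the text)."""
    return {n: seq for n, seq in records.items() if len(seq) > min_len}


if __name__ == "__main__":
    # Toy assembly: one 501-bp scaffold and one 500-bp scaffold (made-up data).
    toy = ">scaffold1\n" + "A" * 501 + "\n>scaffold2\n" + "C" * 500 + "\n"
    kept = filter_scaffolds(parse_fasta(toy))
    print(sorted(kept))
```

The comparison is strict, so a scaffold of exactly 500 bp is dropped; whether the original analysis treated the boundary this way is an assumption based on the ">500 bp" wording.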
Open reading frames (ORFs) were identified and annotated using the Rapid Annotation using Subsystem Technology (RAST version 4.0) server pipeline. The coding sequences (CDSs) of NCCP15736 and NCCP15737 were compared using the sequence-based comparison function of the RAST server, which was also used to compare the type I fimbriae gene clusters of the two strains. To investigate virulence factor genes, all ORFs of NCCP15736 and NCCP15737 were searched with BLAST against the E. coli virulence factor genes listed in VFDB, with an e-value threshold of 1e-5. To select homologous virulence factor genes, the BLAST Score Ratio (BSR) was calculated using in-house scripts, and only genes with a BSR ≥0.4 were used in further analyses. Genes with coverage lower than 60% were excluded, even if they showed high sequence identity. Phage-associated gene clusters in the genome sequences of NCCP15736 and NCCP15737 were identified using the PHAST server. The completeness of each predicted phage-associated region was assigned to one of three classes according to the proportion of genes/proteins of a known phage that the region contained: intact (≥90%), questionable (60–90%), and incomplete (≤60%).
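The in-house BSR scripts are not described in the text, so the following Python sketch only illustrates how the three filters stated above (e-value, BSR, and coverage) could be combined. It assumes the usual definition of the BSR as the bit score of a hit divided by the bit score of the reference gene aligned against itself, and it assumes coverage is measured as alignment length over reference gene length. The `Hit` class, the function names, and the toy values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    orf_id: str       # ORF from the sequenced strain
    vf_id: str        # virulence factor gene in the reference database
    bit_score: float  # BLAST bit score of the ORF-vs-reference alignment
    evalue: float     # BLAST e-value of the alignment
    aln_len: int      # alignment length, in the same units as the reference length


def blast_score_ratio(hit_score, self_score):
    """BSR = hit bit score / bit score of the reference aligned to itself."""
    if self_score <= 0:
        raise ValueError("self-hit bit score must be positive")
    return hit_score / self_score


def select_homologs(hits, self_scores, ref_lengths,
                    min_bsr=0.4, min_cov=0.6, max_evalue=1e-5):
    """Return (orf_id, vf_id, BSR) for hits passing all three thresholds."""
    kept = []
    for h in hits:
        if h.evalue > max_evalue:
            continue
        bsr = blast_score_ratio(h.bit_score, self_scores[h.vf_id])
        coverage = h.aln_len / ref_lengths[h.vf_id]
        if bsr >= min_bsr and coverage >= min_cov:
            kept.append((h.orf_id, h.vf_id, round(bsr, 3)))
    return kept


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    self_scores = {"vfX": 1000.0}
    ref_lengths = {"vfX": 100}
    hits = [
        Hit("orf1", "vfX", 900.0, 1e-50, 95),  # passes every filter
        Hit("orf2", "vfX", 300.0, 1e-20, 90),  # BSR below 0.4
        Hit("orf3", "vfX", 800.0, 1e-40, 50),  # coverage below 60%
    ]
    print(select_homologs(hits, self_scores, ref_lengths))
```

Normalizing by the self-hit score makes the ratio comparable across genes of different lengths, which a raw bit score threshold would not be. The separate coverage filter removes short, high-identity partial matches.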
Phylogenetic analysis and genomic structure comparison
To infer the evolutionary relationships among E. coli O91 strains, including NCCP15736 and NCCP15737, multiple sequence alignments of the whole genomes were performed with Mugsy (version 1.2.3). Maximum-likelihood phylogenetic trees were inferred with FastTree (version 2.1.7) under the generalized time-reversible + CAT model. FigTree (version 1.3.1) (http://tree.bio.ed.ac.uk/software/figtree/) was employed for tree visualization. For comparison of genomic structures between the two strains, the progressive alignment algorithm in Mauve (version 2.3.1) was used. The BLAST algorithm was used to compare phage-associated regions.
Genomic DNA was purified from pure cultures of single bacterial isolates of NCCP15736 and NCCP15737. Potential contamination of the genomic libraries by other microorganisms was assessed using a BLAST search against the non-redundant database. We also checked for contamination by other genomes by examining the coverage distribution.
Results and discussion
Genomic features of NCCP15736 and NCCP15737 strains of Escherichia coli
Total open reading frames
Comparison of genome structure
Type I fimbriae operon
NCCP15736 was isolated from a symptomatic human carrier, whereas NCCP15737 was isolated from an asymptomatic human carrier. To determine the causal mechanisms underlying this difference in pathogenicity, we identified the virulence factors of NCCP15736 and compared them with those of NCCP15737. Using a BLAST search against VFDB, we identified 25 virulence factors in each strain (Additional file 2: Table S2); the 25 virulence genes present in NCCP15736 were also present in NCCP15737. The virulence genes of NCCP15736 and NCCP15737 can be classified into five categories: adherence, invasion, iron uptake, secretion system, and toxins. In the adherence category, E. coli common pilus (ECP)-related genes (ecpA, B, C, D, E, and ecpR), an F1C fimbriae gene (focC), and type I fimbriae genes (fimA, B, C, D, E, F, G, H, and I) were identified. The Tia invasion determinant gene (tia), which belongs to the invasion category and originates from E. coli O1:K1, was identified in both strains. In the iron uptake category, the iron-regulated element gene (ireA) and the salmochelin siderophore-related gene (iroN) were identified in both strains. Neither strain contained the full set of genes in the LEE-encoded TTSS effectors category; each harbored only one secretion gene, escR. In the toxins category, alpha-hemolysin–related genes (hlyA, B, and D) were identified. Alpha-hemolysin is a major virulence factor present in ETEC, STEC, and EPEC strains. It is acquired by horizontal gene transfer via conjugative plasmids. Shiga-like toxin-related genes (stx1A and 1B) were present in both strains and exhibited 100% sequence conservation. In summary, the NCCP15736 and NCCP15737 strains showed the same virulence factors, although NCCP15736 was isolated from a symptomatic carrier and NCCP15737 from an asymptomatic carrier.
In a previous report, the expression of type I fimbriae genes in NCCP15737 was found to be significantly repressed, and this repression was hypothesized to be the main cause of the asymptomatic nature of NCCP15737.
Prophages are mobile genetic elements that can deliver antimicrobial-resistance genes or virulence factors to bacterial hosts and contribute to the diversity of host genomes. We identified ten phage-associated regions (S1–S10) in the NCCP15736 genome and nine phage-associated regions (A1–A9) in the NCCP15737 genome using the PHAST algorithm (Additional file 3: Table S3). Seven of the ten prophages in NCCP15736 were intact, and seven of the nine prophages in NCCP15737 were intact. NCCP15736 and NCCP15737 each contain two incomplete prophages. Only one questionable prophage, in the S6 region (Stx2-converting phage 1717), was identified, and it was found in the NCCP15736 genome. Five phage-associated regions, found to be identical by BLAST search, were shared by the two strains. The prophage-associated regions S2, S6, S8, S9, and S10 were unique to NCCP15736, and the A5, A7, A8, and A9 regions were unique to the NCCP15737 genome.
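The region counts reported above can be cross-checked with simple set arithmetic. The Python sketch below uses only the region labels given in the text; the helper name `region_names` is hypothetical. A questionable-prophage count of zero for NCCP15737 is an inference from the stated totals, and the pairing between individual S and A regions is not given in the text, so it is not modeled.

```python
def region_names(prefix, count):
    """Build labels such as S1..S10 for the predicted phage-associated regions."""
    return {f"{prefix}{i}" for i in range(1, count + 1)}


# Regions predicted in each genome, as labeled in the text.
s_all = region_names("S", 10)  # NCCP15736
a_all = region_names("A", 9)   # NCCP15737

# Regions reported as unique to each strain.
s_unique = {"S2", "S6", "S8", "S9", "S10"}
a_unique = {"A5", "A7", "A8", "A9"}

# Whatever is not unique must belong to the shared set.
s_shared = s_all - s_unique
a_shared = a_all - a_unique

# Completeness classes per strain; the 0 for NCCP15737 is inferred (7 + 2 = 9).
completeness = {
    "NCCP15736": {"intact": 7, "incomplete": 2, "questionable": 1},
    "NCCP15737": {"intact": 7, "incomplete": 2, "questionable": 0},
}

if __name__ == "__main__":
    print(sorted(s_shared), sorted(a_shared))
    for strain, classes in completeness.items():
        print(strain, sum(classes.values()))
```

Both shared sets contain five regions, which agrees with the statement that the two strains share five phage-associated regions, and the completeness classes sum to ten and nine regions, respectively.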
The number of outbreaks caused by non-O157 STEC has increased recently and is causing growing concern. In this study, we performed whole-genome sequencing and comparative genomic analysis of two strains, NCCP15736 and NCCP15737. These analyses revealed that NCCP15736 and NCCP15737 have the same virulence gene profiles, but that fimD of NCCP15737 carries a deletion. Although our results did not reveal the genomic basis of the transcriptional repression of type I fimbriae genes in NCCP15737, they provide a structural basis for the relationship between the deficiency in a gene encoding type I fimbriae and the asymptomatic character of NCCP15737. We suggest that complete genome sequencing and integration of other types of omics data are required to fully reveal the mechanism underlying the asymptomatic character of NCCP15737.
SHC and WK planned and directed the project and interpreted the results. SHC drafted the manuscript. YSB, YBY, JBK, JTC, CHK and YHJ interpreted the results. YHJ performed the MLST database search. YSB characterized the strain and prepared the genomic DNA. TK performed the gene annotation and comparative genomic analysis and wrote the manuscript. All authors read and approved the final manuscript before submission.
The authors declare that they have no competing interests.
Availability of data and material
Nucleotide sequence accession numbers: Whole-genome shotgun sequencing data for the NCCP15736 and NCCP15737 strains have been deposited in DDBJ/EMBL/GenBank under the accession numbers AOUQ00000000 and AOUP00000000, respectively.
Ethics approval and consent to participate
This research has been reviewed and approved by the Institutional Review Board of the Korea Centers for Disease Control and Prevention (Reference No.: 2013-12-04-P).
This work was supported by a grant from the Marine Biotechnology Program (Genome Analysis of Marine Organisms and Development of Functional Applications) funded by the Ministry of Oceans and Fisheries.
- 10. Scheutz F, Teel LD, Beutin L, Pierard D, Buvens G, Karch H, Mellmann A, Caprioli A, Tozzoli R, Morabito S, et al. Multicenter evaluation of a sequence-based protocol for subtyping Shiga toxins and standardizing Stx nomenclature. J Clin Microbiol. 2012;50(9):2951–63.
- 19. Tavaré S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lectures on Mathematics in the Life Sciences. 1986;17:57–86.
- 20. Stamatakis A. Phylogenetic models of rate heterogeneity: a high performance computing perspective. Parallel and distributed processing symposium 2006. 20th International 2006 IPDPS.
- 26. Johnson TJ, Kariyawasam S, Wannemuehler Y, Mangiamele P, Johnson SJ, Doetkott C, Skyberg JA, Lynne AM, Johnson JR, Nolan LK. The genome sequence of avian pathogenic Escherichia coli strain O1:K1:H7 shares strong similarities with human extraintestinal pathogenic E. coli genomes. J Bacteriol. 2007;189(8):3228–36.
- 33. Ventura M, Canchaya C, Bernini V, Altermann E, Barrangou R, McGrath S, Claesson MJ, Li Y, Leahy S, Walker CD, et al. Comparative genomics and transcriptional analysis of prophages identified in the genomes of Lactobacillus gasseri, Lactobacillus salivarius, and Lactobacillus casei. Appl Environ Microbiol. 2006;72(5):3130–46.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.