CRISPR-Cas systems present in the family Vibrionaceae
Using BLAST and comparative genome analyses, we examined species belonging to the family Vibrionaceae available in the NCBI genome database for the presence of CRISPR-Cas systems. We identified eight different system types (type I-C, I-E, I-F, II-B, III-A, III-B, III-D, and IV), as well as variants of these types and hybrid systems, among 70 species (Additional file 1: Figure S1A). These CRISPR-Cas systems were sporadic in their occurrence and distribution within and among species. The majority of the systems were detected on MGEs such as genomic islands, plasmids, and transposon-like elements, suggesting a possible vector for horizontal gene transfer (Additional file 1: Figure S1B). The most predominant type identified was the type I system, which accounted for 81% of the systems identified and encompassed type I-F, type I-E and type I-C systems. Within the type I systems, the type I-F subtype was the most abundant and was found across four genera, 41 species, and 116 strains (Additional file 1: Figure S1C). A type II-B system was present in two Vibrio species and three Salinivibrio species (Additional file 1: Figure S1C). The type III systems were the next most prominent type, making up 14% of the systems identified and consisting of type III-A, type III-B, and type III-D. The rare type IV system was identified on a plasmid in two strains of V. parahaemolyticus. Because CRISPR-Cas systems were present in only a few strains of any particular species, the most parsimonious conclusion is that they are not ancestral to any species within this family.
CRISPR-Cas type I-F systems in V. cholerae
Using the Cas1, Cas3 and Cas6f proteins from the previously identified type I-F system in V. cholerae HC-36A1 as seeds, we examined V. cholerae genomes in the NCBI genome database using BLAST analysis. This analysis revealed the presence of variants of the type I-F system in addition to the canonical system (Fig. 1). A total of 35 distinct non-pathogenic V. cholerae strains contained a type I-F system (Additional file 2: Table S1), with 16 strains containing a canonical type I-F system consisting of six cas genes (cas1cas3cas8fcas5cas7cas6f) followed by a type I-F CRISPR array (Fig. 1a). One strain, 490–93, had an additional gene between cas3 and cas8f that encodes a hypothetical protein (Fig. 1b). A type I-F system composed of five cas genes (cas1cas3cas7fvcas5fvcas6f), which encodes variant Cas7fv and Cas5fv proteins, was present in 13 strains (Fig. 1c). This system shared homology with the type I-Fv system previously described in Shewanella putrefaciens strain CN-32, which was shown to be an active system [41]. One strain, TM 11079–80, contained a CRISPR-Cas system composed of cas1cas3tnpcas3cas7fvcas5fvcas6f, with the cas3 gene and the CRISPR array flanked by transposase genes (Fig. 1d). These type I-F systems were all present in non-choleragenic strains within the previously described region VPI-6, which has all the hallmarks of a genomic island (GI) acquired by horizontal gene transfer (HGT) [38, 39, 42]. VPI-6 is a ~29-kb island with a GC content lower than the overall GC content of the genome and is absent from V. cholerae N16961, the cholera pandemic strain [36]. The island contains an integrase from the tyrosine recombinase family, which mediates site-specific integration and excision of the island from the genome [36, 42, 43]. The ends of the island are flanked by attachment sites, attL and attR, marking the insertion site [36, 42].
Previously, we demonstrated that VPI-6 can excise from the chromosome as a complete unit indicating that the CRISPR-Cas system can be transferred with the rest of the island [36].
To determine whether the CRISPR-Cas system and the VPI-6 island had a similar evolutionary history and were acquired together, phylogenetic analysis of the cas1 gene from the type I-F system and intV gene from VPI-6 was performed. Overall there was congruency between the intV and cas1 gene trees, with four divergent branches within the intV tree, and strains found within these four divergent branches showed a somewhat similar branching pattern in the cas1 gene tree. This would suggest a similar evolutionary history. A few strains, 2012Env-2, HE-48, A325, showed different clustering patterns between the trees. However, the bootstrap values for many branches for the cas1 tree were low indicating the branching patterns are not robust as there were limited polymorphic sites (Additional file 1: Figure S2).
A mini type I-F system within a Tn7-like transposon
Our analysis identified five V. cholerae strains that contained a type I-F CRISPR-Cas region composed of a four-gene cluster, tniQcas5cas7cas6f (Fig. 2; Additional file 2: Table S1). In this system, cas5 is a fusion between cas8f and cas5f, and the system lacks the adaptation proteins Cas1 and Cas2 as well as Cas3, which is required for target cleavage. The system was homologous to a previously described Cas1-less minimal system associated with a Tn7-like transposon in a large number of bacterial species [44]. The canonical Tn7 transposon is composed of five genes, tnsABCDE, which encode a TnsAB transposase, a regulator TnsC, and TnsD and TnsE, which are required for target site insertion at a Tn7-specific attachment site, generating right end (R) and left end (L) att sites [45, 46]. In V. cholerae strains, the Tn7-like region consists of four genes: tnsABC homologs in an operon, and tniQ in an operon with the cas5cas7cas6f genes. TniQ shows homology to TnsD, which targets a sequence-specific site, attTn7, for insertion. No homolog of tnsE, which is usually responsible for directing the element into other MGEs, was identified [47, 48]. The CRISPR array associated with this system was short, containing two to three spacers and a type I-F direct repeat.
Examination of the genome context of these mini type I-F systems in V. cholerae strains identified the presence of three divergent mini type I-F systems. These mini type I-F systems were associated with at least three divergent Tn7-like transposons (Fig. 2a). In V. cholerae strain DRAKES2103, a 27-kb region was identified that was absent from N16961. The region encompassed the Tn7-like transposon and the mini type I-F CRISPR-Cas system, and also contained a restriction modification system (Fig. 2a). To determine whether the entire region was cargo within the Tn7-like transposon, we examined the insertion site for Tn7-like right and left end attachment sites. In V. cholerae DRAKES2103, the element is inserted downstream of the gene for signal-recognition particle RNA (SRP-RNA), a novel Tn7 transposon attachment site that was described previously by Peters and colleagues [44]. By convention, the insertion site closest to tnsA is called the right (R) end and the distal end is named the left (L) end (Fig. 2a). Transposition generates a direct repeat, usually 5 bp, at each end. At the right end, the direct repeat is followed by a 3-bp sequence TGT and a 22-bp TnsB binding site; at the left end, a 3-bp ACA, itself preceded by a TnsB binding site, precedes the direct repeat (Table 1). In DRAKES2103, we identified the R and L end attTn7 sites that flanked the entire region, suggesting mobilization by the element (Fig. 2a; Table 1). A divergent Tn7-like transposon was identified in V. cholerae TP, which encompassed a 35-kb element that included the mini type I-F system and a restriction modification system. We were also able to detect the R and L sites flanking the entire region (Fig. 2; Table 1). Similarly, in strain L15, the element was inserted at the same site; however, the contig was short and the left end could not be determined (Fig. 2a).
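The end structure described above (direct repeat followed by TGT at the R end, and ACA followed by the same direct repeat at the L end) can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions and is not the procedure used in this study: the function name find_tn7_like_ends, the exact-match requirement for the direct repeat, the min_span parameter, and the toy sequence are all hypothetical, and the 22-bp TnsB binding sites are not checked.

```python
def find_tn7_like_ends(seq, dr_len=5, min_span=30):
    """Return candidate (r_start, l_end, direct_repeat) tuples.

    Assumed layout on the given strand (a simplification):
        [direct repeat][TGT] ... element ... [ACA][direct repeat]
    r_start is the index of the first base of the R-end direct repeat;
    l_end is the index just past the L-end direct repeat.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - dr_len - 3 + 1):
        # R end: a dr_len-bp repeat immediately followed by TGT
        if seq[i + dr_len:i + dr_len + 3] != "TGT":
            continue
        direct_repeat = seq[i:i + dr_len]
        # L end: ACA immediately followed by an identical repeat
        left_motif = "ACA" + direct_repeat
        j = seq.find(left_motif, i + dr_len + 3 + min_span)
        while j != -1:
            hits.append((i, j + len(left_motif), direct_repeat))
            j = seq.find(left_motif, j + 1)
    return hits


# Toy sequence (hypothetical): 2-bp flank, repeat CCGAT, TGT,
# a 40-bp stand-in for the element, ACA, repeat CCGAT, 2-bp flank.
toy = "GG" + "CCGAT" + "TGT" + "ACGTTGCA" * 5 + "ACA" + "CCGAT" + "GG"
candidates = find_tn7_like_ends(toy)
```

On real contigs such a scan would return many chance matches, so candidates would still need to be checked against the TnsB binding sites and the known insertion loci listed in Table 1.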
Table 1 Mini type I-F-carrying Tn7 R-end and L-end attachment sites
In V. cholerae HE-45, the Tn7-like transposon was inserted at another novel Tn7 insertion site downstream of inosine-5′-monophosphate dehydrogenase (IMPDH), also annotated as guanosine 5′-monophosphate oxidoreductase (guaC) (Fig. 2b; Table 1). The Tn7-like element encompassed a 36-kb region containing a mini type I-F system and a restriction modification system. In strain 490–93, at the same genomic location in which the Tn7-like element for HE-45 was inserted, we identified a region with a mini type I-F system and a xylulose metabolism gene cluster; however, because the contig was short, we were unable to locate a Tn7-like element or R and L sites (Fig. 2b). This is the only V. cholerae strain in the NCBI genome database (> 900 genomes sequenced) that contains this metabolic cluster. Overall, it appears that these Tn7-like elements have captured not only CRISPR-Cas defense systems but also restriction modification systems.
BLAST analysis using TniQ identified at least 40 additional species from the family Vibrionaceae that contained a copy of this protein and an associated Cas6f. Interestingly, a large number of V. parahaemolyticus strains (> 800 sequenced genomes) in the NCBI genome database contained a Tn7-like associated mini type I-F system. In this species, four divergent mini type I-F systems were identified that were carried within five different Tn7-like transposons, which had three different attachment sites: SRP-RNA on chromosome 1, and IMPDH and YciA (acyl-CoA thioester hydrolase) on chromosome 2 (Fig. 3). In all V. parahaemolyticus strains that contained a type three secretion system-2α (T3SS-2α) on chromosome 2, the Tn7-like CRISPR-Cas region was directly upstream of the T3SS-2α gene cluster at the yciA R end attachment site (Fig. 3a). T3SS-2α is a contact-dependent secretion system that delivers effector proteins directly into target eukaryotic cells and is the primary virulence mechanism of pandemic V. parahaemolyticus strains [49, 50, 51]. T3SS-2α was previously shown to be present on an 80-kb pathogenicity island named VpaI-7 that is only present in pathogenic isolates [49, 52, 53]. To investigate whether T3SS-2α is within the Tn7-like element, we examined the region for an attTn7 L end site and identified the L end at the 3′ end of the island (Fig. 3a). Vibrio parahaemolyticus strains that contained the non-homologous T3SS-2β system also contained an associated mini type I-F CRISPR-Cas system at the Tn7-like insertion site yciA (Fig. 3a). This entire region was flanked by R end and L end attTn7 sites (Fig. 3a and Table 1). In strain MAVP-Q, a variant of the T3SS-2β gene cluster named T3SS-2γ is also present at this site, but the region contains a highly divergent Tn7-like system with unique attTn7 sites flanking the 107-kb region. These data suggest a possible mechanism of mobilizing T3SS-2 within this species.
In strain ISF-25-6, which contains a T3SS-2β on chromosome 1, a variant Tn7-associated mini CRISPR-Cas type I-F system was present on chromosome 2 at the IMPDH locus between VPA1158 and VPA1159 relative to strain RIMD2210633 (Fig. 3a; Table 1). This region also contains a restriction modification system, and the entire 35-kb region is flanked by attTn7 sites (Table 1). In non-human pathogenic strains, the Tn7-associated mini type I-F system is located on chromosome 1 at the SRP-RNA insertion site between VP0953 and VP0954 relative to RIMD2210633 (Fig. 3b). At this site, depending on the strain, two divergent CRISPR-Cas systems were present, associated with two divergent Tn7-like transposons. In one strain, CDC_K4762, the region also contained a type IV toxin-antitoxin system. We were able to identify the R site; however, because the contig was short, the L site could not be determined (Fig. 3b; Table 1). Comparative genomic analysis indicated that the Tn7-associated CRISPR-Cas mini type I-F system was acquired at least four times in this species.
To investigate further the evolutionary history of the Tn7-like transposons and the mini type I-F systems, we performed phylogenetic analysis on a select number of strains and species for which we knew that TniQ (TnsD-like), Cas6f, and TnsA were co-located on the same contig (Fig. 4). For the TniQ and Cas6f trees, at least five highly divergent clades, named A, B, C, D, and E, are present, and within each clade are several divergent lineages. For the most part, the TniQ and Cas6f trees showed overall congruency, suggesting a shared evolutionary history, with a few exceptions. For the TniQ and Cas6f proteins present in V. cholerae, three strains clustered together in clade A and two strains clustered together within clade E in both trees. Similarly, pathogenic strains of V. parahaemolyticus clustered together within clade C, whereas nonpathogenic strains clustered mainly within clade A on both trees. In both trees, TniQ and Cas6f from V. anguillarum and V. fluvialis clustered within clades D and E (Fig. 4). These data indicate that both proteins share a similar evolutionary history in these species. In strains of both V. anguillarum and V. fluvialis, two non-homologous mini type I-F systems were present, and the second system is present at the SRP-RNA locus attTn7 site within clades A and B, respectively. Similarly, in the TnsA tree, for the most part, the clustering patterns hold, indicating that both the CRISPR systems and Tn7-like transposons share a similar evolutionary history and suggesting they were acquired together as an evolutionary unit. However, some key discrepancies are present between the CRISPR protein trees and the Tn7 TnsA tree in a few species (Fig. 4). For example, V. anguillarum strain HI610, which clusters with V. anguillarum ATCC14181 in the TniQ and Cas6f trees, is present on a divergent branch in clade A on the TnsA tree. Similarly, two V. ordalii strains cluster together on the CRISPR protein trees but are present on two divergent clades within the TnsA tree.
This indicates that two different Tn7-like transposons are present at different attachment sites, suggesting recent horizontal transfer (Fig. 4). A similar pattern is found for V. cholerae DRAKES2013 and V. mimicus VVM223, whose CRISPR regions cluster together with those of other V. cholerae and V. mimicus strains but which contain two divergent Tn7-like transposons; in this case, however, the transposons are present at the same chromosomal insertion site (Fig. 4).
Putative hybrid CRISPR-Cas type III-B/I-F system
We identified eight V. cholerae strains that contained a putative type III-B/I-F hybrid system that consisted of type III-B operon cas7cas10cas5cas7cmr5cas7 followed by a resolvase and cas6f genes (Additional file 2: Table S7). One exception was strain BRV8, which had the same type III-B system genes followed by a gene encoding a hypothetical protein and then cas6f. The type III-B/I-F system in all 8 strains was associated with a 62-kb region inserted within chromosome 2 at ORF VCA0885 relative to the El Tor reference genome N16961, which lacked the region (Fig. 5a). Analysis of this 62-kb region using Phaster [54], a phage identification tool, showed that the region had homology to a Shigella phage Ss-VASDF (Fig. 5a). A 156-bp direct repeat was identified at the ends of the region suggesting the CRISPR-Cas system could be mobilized within the prophage (Fig. 5a).
The CRISPR array associated with the type III-B/I-F hybrid system contained a type I-F direct repeat and a type I-F PAM (Fig. 5b and c). In addition, in two V. metoecus strains, YB4D01 and RC341, a highly similar type III-B/I-F hybrid system was identified within a prophage highly homologous to that present in V. cholerae HE-45 (Fig. 5a; Additional file 2: Table S7). Phylogenetic analysis of an integrase gene, cmr1, and cas6f among five strains showed no congruency, suggesting no shared evolutionary history. However, there was a limited number of polymorphic sites among the strains examined (Additional file 1: Figure S3).
Next, we examined the CRISPR arrays associated with the type I-F system and the putative type III-B/I-F hybrid system identified in V. cholerae. The CRISPRMap program classified the direct repeat sequence as a type I-F system repeat in all strains analyzed (Additional file 2: Tables S1 and S7). The arrays analyzed ranged in size from 2 to 83 spacers, and a total of 1504 spacers were identified. Using the CRISPRTarget program to identify spacer homology in the plasmid and phage databases, we found that 356 of the 1504 spacers matched protospacers. A total of 215 spacers matched regions within the same sequences of phages X29/phi-2 (accession number KJ572845) and Kappa, as well as three filamentous phages, fs2, fs-1 and KSF1 (Additional file 1: Figure S4). In V. cholerae strain 984–81, spacers had four targets in CTXphi. Several spacers were also found to target the Vibrio phages pYD21-A, YFJ, CP-T1, and Martha 12B12.
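As an illustration of how spacers relate to the direct repeat in an array, the following Python sketch pulls out the sequences lying between consecutive copies of a repeat. It is a simplified stand-in and not the method used by CRISPRMap or CRISPRTarget: it assumes exact, non-degenerate repeats, and the function name extract_spacers, the 6-bp toy repeat and the toy spacers are hypothetical.

```python
def extract_spacers(array_seq, repeat):
    """Return the sequences between consecutive exact copies of `repeat`."""
    starts = []
    pos = array_seq.find(repeat)
    while pos != -1:
        starts.append(pos)
        # Resume the search after the current repeat copy
        pos = array_seq.find(repeat, pos + len(repeat))
    # A spacer runs from the end of one repeat to the start of the next
    return [array_seq[a + len(repeat):b] for a, b in zip(starts, starts[1:])]


# Toy array (hypothetical sequences): flank, then repeat-spacer units
toy_repeat = "GTTCAC"
toy_array = "TT" + toy_repeat + "AAATTT" + toy_repeat + "CCGGCC" + toy_repeat + "AA"
toy_spacers = extract_spacers(toy_array, toy_repeat)
```

Real arrays often carry a degenerate terminal repeat, so an exact-match scan like this one would miss the last spacer in those cases.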
CRISPR-Cas type I-F systems within mobile genetic elements (MGEs)
We determined that 97% of the type I-F systems identified in this study were associated with MGEs, based on the presence of signature genes in the vicinity of the CRISPR-Cas genes and on comparative genome analysis. For example, the type I-F system in V. metoecus YB5B04 was present within an 18-kb island integrated between a gene encoding a hypothetical protein and trmA, with respect to V. metoecus OYP8G12, which lacked the island (Additional file 1: Figure S5A). The 5′ end of the island was marked with int, which encoded a putative integrase required for site-specific recombination. We also identified attL and attR sites flanking the island, indicating that the 18-kb region was likely acquired as a unit by site-specific recombination (Additional file 1: Figure S5A). The GC content of this region was 43% compared to the overall genome GC content of 47%, suggesting it is not ancestral to the genome. In V. parahaemolyticus A4EZ703, a type I-F system was present within an island inserted between VPA0712 and VPA0713 with respect to V. parahaemolyticus RIMD2210633, which lacked the region (Additional file 1: Figure S5B). The 63-kb island had a GC content of 41%, compared to 45% across the genome. This island contained an integrase at its 3′ end and was flanked by attL and attR sites (Additional file 1: Figure S5B). The type I-F system in V. vulnificus 93 U204 was present within a 25-kb genomic island inserted between VV1_0634 and the tRNA-Met locus with respect to V. vulnificus CMCP6, which lacked the region (Additional file 1: Figure S5C). The type I-F system was also present within a genomic island region in V. fluvialis that contained an integrase gene (Additional file 1: Figure S5D). Although the CRISPR-Cas systems are identified within different genomic islands in these strains, it is not possible to determine whether they were acquired with the island or whether they are a recent addition to the island.
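The GC content comparison between an island and its host genome can be sketched in a few lines of Python. This is only an illustration of the calculation and not the tool used in this study; the function name gc_percent is hypothetical, and ignoring ambiguous bases such as N is an assumption of this sketch.

```python
def gc_percent(seq):
    """Percent G+C among unambiguous bases (A, C, G, T) of a DNA sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy example (hypothetical sequences): 4 of 8 bases are G or C
toy_gc = gc_percent("GGCCAATT")
```

Applied to an island sequence and to the whole genome, the two values can be compared directly, as with the 43% versus 47% reported for the V. metoecus YB5B04 island.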
Phylogenetic analysis of the Cas6f proteins
All Cas6f proteins identified in this study (Additional file 2: Tables S1, S2 and S7) were aligned using ClustalW and a phylogenetic tree was constructed by the neighbor-joining method [55, 56] (Fig. 6). The Cas6f proteins clustered into 10 major clades that had strong bootstrap values. The Cas6f proteins clustered together based on the type I-F system in which they were found. Clade I contained 13 V. cholerae strains and 1 V. parillis (formerly V. cholerae) strain, recovered between 1978 and 2009 from five continents, and was associated with the variant type I-F (type I-FV2) systems containing five cas genes. Clade II contained Cas6f from 10 strains with the type III-B/I-F hybrid system, which were isolated mainly in the USA and Haiti, with single strains from both the Ukraine and the UK. Clade III contained six strains containing the canonical type I-F system from three genera, Aliivibrio, Photobacterium and Vibrio. Clade IV contained 17 V. cholerae strains that were recovered between 1981 and 2012 from four continents and was associated with the canonical type I-F system. Within this clade and closely related to V. cholerae were Cas6f from V. metoecus, V. fluvialis, V. navarrensis and five strains of Salinivibrio (Fig. 6; Additional file 2: Table S1). Salinivibrio is a genus that is highly divergent from Vibrio species in general [57]. Thus, this cluster within clade IV represents another example of horizontal transfer between two distantly related genera. Further evidence of horizontal gene transfer includes the presence of a Cas6f protein from V. navarrensis that was nested with V. cholerae Cas6f proteins within clade IV (Fig. 6). It was previously shown that V. navarrensis is distantly related to V. cholerae [58]. Clade IV also contains a lineage composed of Cas6f from 10 V. parahaemolyticus strains, which all contained a canonical type I-F system present within a genomic island at the same genome location that varied in size from 75 kb to 135 kb depending on the strain.
Clade V consists of Cas6f proteins entirely from Salinivibrio species and was distantly related to those present in other Vibrionaceae genera (Fig. 6). Clade VI contained Cas6f from two Vibrio species, V. rhizosphaerae and V. spartinae (Fig. 6).
The Cas6f associated with the Tn7-like transposon mini type I-F system formed four highly divergent branches (clade VII-clade X) within which were highly variant Cas6f proteins, with some species present on multiple distantly related branches indicating in some species the system was acquired multiple times from diverse sources (Fig. 6).
Type I-E CRISPR-Cas systems in V. cholerae
A total of 29 V. cholerae strains contained a type I-E system (18 strains were previously described as classical biotype V. cholerae strains) (Additional file 2: Table S3) [34, 40]. In all strains, the type I-E system was carried on the genomic island GI-24, which is absent from N16961 (Fig. 7a) [34, 40]. It was demonstrated that the type I-E system present in the classical biotype strains was functional [34]. All strains contained the cas3cas8ecse2cas6cas7cas5cas1cas2 gene cluster followed by a CRISPR array and a canonical type I-E repeat (Fig. 7c). This is a variant cas gene arrangement for type I-E systems, as the canonical cas gene arrangement is described as cas3cas8ecse2cas7cas5cas6cas1cas2 [15]. CRISPR arrays in these strains contained between 2 and 80 spacers, and 44 of the 330 spacers targeted the Vibrio phage X29/phi2. Analysis of the protospacers identified allowed us to determine the PAM of these systems, which we found to be 3′ NTT 5′, as previously described for type I-E systems (Fig. 7b) [34].
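Deriving a PAM from a set of protospacers amounts to aligning the short sequences that flank each protospacer and looking for conserved positions. The Python sketch below shows one simple way to do this; it is not the analysis pipeline used in this study. The function name pam_consensus, the 0.8 conservation threshold and the toy flanking sequences are hypothetical, and the caller is assumed to supply equal-length flanks that are already in a consistent strand and orientation.

```python
from collections import Counter


def pam_consensus(flanks, min_fraction=0.8):
    """Column-wise consensus of equal-length protospacer flanking sequences.

    A position is reported as its most common base when that base occurs in
    at least `min_fraction` of the flanks, and as N otherwise.
    """
    if not flanks:
        raise ValueError("no flanking sequences supplied")
    length = len(flanks[0])
    if any(len(flank) != length for flank in flanks):
        raise ValueError("flanking sequences must have equal length")
    consensus = []
    for column in zip(*flanks):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(flanks) >= min_fraction else "N")
    return "".join(consensus)


# Toy flanks (hypothetical): first position variable, last two always T
toy_pam = pam_consensus(["ATT", "CTT", "GTT", "TTT", "ATT"])
```

With few protospacers the consensus is sensitive to the threshold chosen, which is one reason a sequence logo such as that in Fig. 7b is a more informative summary.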
Type I-E CRISPR-Cas systems in Vibrionaceae
A total of 28 strains encompassing ten species of Vibrio, four species of Photobacterium, and nine species of Salinivibrio contained a type I-E system (Additional file 2: Table S4). In V. metoecus YB5B06, the type I-E system was present within GI-24, similar to V. cholerae classical strains (Fig. 7a). The type I-E system was also identified in V. albensis strains ATCC 14547 and VL426 and was present in a 12-kb genomic region inserted at the same genomic location as GI-24 with respect to N16961; however, no integrase was identified (Fig. 7a; Additional file 2: Table S4).
Vibrio azureus LC2–005 and NBRC 104587 also contained the type I-E system, each with two CRISPR arrays. We did not identify any spacer hits for either of these two strains. A canonical type I-E system consisting of cas3cas8ecse2cas7cas5cas6ecas1cas2 was present in two strains of V. gazogenes, CECT 5068 and DSM 21264. The associated type I-E arrays consisted of 15 total spacers (13 and 2 spacers, respectively); however, no protospacer matches were identified. We identified a canonical type I-E system in V. harveyi ATCC 43516 with 37 spacers. In S. sharmensis DSM 18182, the type I-E system had 79 spacers with protospacer targets in Salinivibrio phage SMHB1 (Additional file 2: Table S4).
CRISPR-Cas type I-E system present within an excisable genomic island GI-24
Based on our in silico analysis, 88% of the type I-E systems present in Vibrionaceae were carried on an MGE, including those present in GI-24, which was present in V. cholerae, V. metoecus and V. albensis (Fig. 7a). GI-24 contained an integrase required for site-specific integration and conserved attL and attR sites that mark integration at the ends of GI-24 (Fig. 8a). The presence of an integrase also suggests that the island can excise from the chromosome and be mobilized as a unit. To further investigate this, we performed a GI-24 excision assay that we have used previously to detect excision of several genomic islands in V. cholerae [36, 42, 43, 59]. A two-stage nested PCR approach was used to examine excision of GI-24 in V. cholerae classical strain O395 by detecting both a circular intermediate (attP) of the excised GI-24 and an empty GI-24 chromosomal insertion site (attB) after excision (Fig. 8b). We used V. cholerae N16961, which does not contain GI-24, as a control strain for the attB excision assay, which detects an empty insertion site in the chromosome. Using genomic DNA isolated from overnight cultures as template, we detected a PCR product of the expected size in the attB assay for N16961 in the first PCR reaction, but no product was detected for O395. This could suggest that in O395 excision does not occur or that it occurs at very low rates. Therefore, using the PCR reaction from round 1 as template, we performed a second attB PCR reaction with a second primer pair. In this assay, we detected an attB PCR product from O395, indicating that GI-24 can excise but does so at low rates (Fig. 8c). To detect the GI-24 circular intermediate attP, we performed PCR using attP primers. After the first round of PCR, no product was detected, but after the second round of PCR, using the PCR cocktail from the first round as template, the expected attP PCR band was present for O395, demonstrating excision of GI-24 (Fig. 8d).
These data show that the type I-E system is part of GI-24 and can be excised with the entire region, a likely first step in its transfer. Phylogenetic analysis was performed to determine whether the integrase and cas8e genes shared a common evolutionary history. There were a limited number of polymorphic sites among the strains examined; however, in both trees V. metoecus formed a divergent branch from V. cholerae strains, whereas eight V. cholerae strains shared identical clustering patterns in both trees (Additional file 1: Figure S6).
The type I-E system present in V. harveyi ATCC 43516 was carried on an 85-kb region inserted between LA59_08695 and LA59_08700, with respect to V. harveyi ATCC 33843, which lacked the region (Fig. 7d). The 85-kb region had a GC content of 40%, compared to 45% for the whole genome, however no integrase or transposase genes were identified. The region also contained genes for a type three secretion system (T3SS) (Fig. 7d). In P. profundum SS9 and V. halioticoli NBRC 102217, the type I-E system was identified within a homologous conjugative plasmid suggesting horizontal transfer between these distantly related species.
Phylogenetic analysis of Cas8e proteins
The Cas8e protein sequences were aligned and a neighbor-joining tree was constructed. The branching patterns demonstrate the presence of six major clades designated I to VI (Additional file 1: Figure S7). We identified 12 V. cholerae biotype classical strains that contained highly homologous Cas8e proteins that clustered in clade I with a Cas8e protein from V. metoecus strain YB5B06 and two Cas8e proteins from two V. albensis strains (Additional file 1: Figure S7). Divergent from but related to this group were Cas8e proteins from two strains of V. azureus in clade II. The next three divergent clades, III, IV and V, grouped Cas8e proteins based on the genus and species in which they were present. Clade III consisted of Cas8e proteins from four strains of V. gazogenes and one strain each of V. spartinae and V. ruber that were all highly related. Clade IV was composed of Cas8e from V. parahaemolyticus and V. harveyi, which clustered together, and branching with these were Cas8e from two Photobacterium species (Additional file 1: Figure S7). Clade V was composed of Cas8e from eight strains of Salinivibrio and one strain of Photobacterium galatheae. Finally, clade VI consisted of the two most divergent Cas8e proteins, from V. halioticoli NBRC 102217 and P. profundum SS9, which contained a variant type I-E system carried on a plasmid.
Type I-C CRISPR-Cas systems in Vibrionaceae
Previously, we identified a type I-C system in V. metschnikovii CIP 69.14 [36]. We used the Cas proteins from this species as seeds in BLAST searches to identify putative systems in the Vibrionaceae. This analysis identified type I-C CRISPR-Cas systems in 12 species: V. metschnikovii, V. cidicii, V. hangzhouensis, V. navarrensis, P. aquimaris, P. marinum, V. anguillarum, V. salilacus, V. fujianensis, Vibrio sp. V03-P4A6T147, Salinivibrio sp. DV, and Photobacterium sp. CECT 9192 (Additional file 2: Table S5). All type I-C systems identified, with the exception of the one present in V. anguillarum, contained the canonical CRISPR-Cas type I-C cas gene arrangement and a type I-C 32-bp direct repeat (Additional file 2: Table S5).
In Vibrio sp. V03-P4A6T147 and V. hangzhouensis CGMCC 1.7062, we were unable to identify CRISPR arrays due to short contig sequences. Across the remaining species there were a total of 491 spacers identified, and each had a conserved type I-C PAM. In P. marinum, there were two CRISPR arrays flanking the type I-C cas gene cluster each with a type I-C repeat. The CRISPR arrays ranged in size from 2 spacers up to 179 spacers present in Salinivibrio sp. DV, the largest array identified in this study (Additional file 2: Table S5). Protospacer targets were identified for 31 spacers from a total of 491 and of these 31 targets 16 were hits to the Salinivibrio phage SMHB1.
Type I-C CRISPR-Cas system present within a Tn7-like transposon
The type I-C system in V. navarrensis ATCC 51183 was present within a 53-kb region that was inserted within a Tandem-95 repeat protein, with respect to V. navarrensis 08–2426, which lacked the region (Fig. 9a). This 53-kb region in ATCC 51183 contained at least 11 different transposase genes, which flanked three different modules within the region: a restriction modification (RM) system, a mini type I-F system tniQcas5cas7cas6, and a complete type I-C system with a CRISPR array. A Tn7-like transposon (tnsABC) was present at the 5′ end of the island at an SRP-RNA insertion site, as described in V. cholerae and V. parahaemolyticus. We identified attTn7 sites that encompassed an RM system, the mini type I-F system and the complete CRISPR-Cas type I-C system (Table 1). A region within the element also contained two copies of a reverse transcriptase (RT) with group II intron origin (RT-G2_intron), a P-loop NTPase (TniB), and a protein with a TIR-like domain, which were also flanked by transposase genes. The type I-C CRISPR-Cas system contained a CRISPR array with a type I-C repeat, and the PAM motif was also identified (Fig. 9b and c). Of the four sequenced V. navarrensis genomes, only ATCC 51183 contained this region.
In V. cidicii 1048–83, the type I-C system is present within a 25-kb region that contains three transposase genes and has a GC content of 40%, compared to 48% GC content for the entire genome (Fig. 9e). In V. anguillarum PF7, V. hangzhouensis and P. aquimaris, the type I-C system was within a region that contained both transposase and integrase genes. However, several strains of Vibrio, Photobacterium and Salinivibrio contained only a complete type I-C system integrated within the genome with no additional genes present, suggesting it was the sole acquisition at the insertion site (Additional file 1: Figure S8). Overall, it appears that the CRISPR-Cas systems in these species were acquired as distinct units or modules and not within any identifiable MGE.
Phylogenetic analysis of the Cas8c proteins present in Vibrionaceae showed that V. metschnikovii and V. navarrensis Cas8c proteins were closely related to each other but were the most divergent Cas8c proteins and formed a separate, highly divergent branch (Fig. 9d). In V. metschnikovii CIP69.14, the type I-C system was not associated with any MGE or signature MGE genes. In V. anguillarum PF7, two divergently transcribed cas gene clusters are present, cas3cas5cas8cas7 and cas3cas5cas8cas7cas4cas1cas2. The Cas8c proteins from this species clustered within two distinct lineages, one with Cas8c from V. salilacus, Vibrio sp., V. fujianensis and Salinivibrio sp. DV, and the second with Cas8c proteins from V. cidicii and V. hangzhouensis. The Cas8c proteins from three Photobacterium species clustered together with long branch lengths, indicating they are not closely related to each other (Fig. 9d).
Type II-B CRISPR-Cas system in Vibrionaceae
Currently, no type II CRISPR systems have been characterized in Vibrionaceae. A recent study showed that Legionella pneumophila contains a type II-B CRISPR-Cas system; we therefore used its Cas9 as a seed to examine Vibrionaceae [60]. Five species were identified that contained a homolog of Cas9: V. natriegens CCUG 16373, V. sagamiensis NBRC 104589, S. sharmensis CBH463, two strains of S. kushneri, and Salinivibrio sp. ML323. All six strains were found to have the complete type II-B cas gene cluster of cas9cas1cas2cas4 (Additional file 2: Table S6; Fig. 10).
Analysis of the CRISPR arrays identified a type II-B repeat sequence of 37 bp in both the Vibrio and the Salinivibrio strains (Fig. 10b). We used CRISPRone to detect the trans-activating crRNA (tracrRNA), which is usually located between the cas genes and the CRISPR array and is complementary to the repeat sequence of the type II-B system, allowing it to pair with the repeat fragment of the pre-crRNA for interference [61]. We identified the tracrRNA downstream of cas1 in three of the six strains analyzed, as shown for S. kushneri IC202 and S. sharmensis CBH463 (Fig. 10e). The inability to detect the tracrRNA in the other three strains could be due to the program's threshold of a 15-nucleotide match with at most two mismatches for the pairing length [62]. Spacer analysis identified from 3 to 51 spacers among the strains, with a total of 130 spacers and 15 putative protospacers (Additional file 2: Table S6). Using these protospacers, we were able to identify the PAM sequence for these type II-B systems and found it to be 3′ NGG 5′ (Fig. 10c), which is in agreement with what was previously shown in Francisella novicida [63].
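To make the effect of the detection threshold concrete, the following Python sketch shows one simple way a 15-nucleotide window with at most two mismatches could be applied when searching for a tracrRNA anti-repeat. It is a deliberately simplified illustration and not the CRISPRone implementation: the function names, the ungapped Hamming-distance comparison, the restriction to a single strand, and the omission of G-U wobble pairing are all assumptions of this sketch.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence; ambiguous bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def find_anti_repeats(region: str, repeat: str,
                      min_len: int = 15, max_mismatches: int = 2):
    """Find ungapped windows in `region` that could pair with `repeat`.

    Every `min_len`-nt window of the reverse complement of the repeat is
    slid along the region (given strand only). A hit is reported as
    (region_start, window_start_in_reverse_complement, mismatches)
    whenever the number of mismatches does not exceed `max_mismatches`.
    """
    target = reverse_complement(repeat)
    region = region.upper()
    hits = []
    for i in range(len(target) - min_len + 1):
        window = target[i:i + min_len]
        for j in range(len(region) - min_len + 1):
            candidate = region[j:j + min_len]
            mismatches = sum(a != b for a, b in zip(window, candidate))
            if mismatches <= max_mismatches:
                hits.append((j, i, mismatches))
    return hits


if __name__ == "__main__":
    # Hypothetical 20-nt repeat, not a repeat reported in this study.
    toy_repeat = "ACGTTGCAAGCTTGACCTGA"
    toy_region = "TTTT" + reverse_complement(toy_repeat) + "TTTT"
    for hit in find_anti_repeats(toy_region, toy_repeat):
        print(hit)
```

Under such a rule, an anti-repeat whose longest ungapped pairing segment is shorter than 15 nucleotides, or that carries a third mismatch within the window, would go unreported, which is consistent with the explanation offered above for the three strains in which no tracrRNA was detected.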
Phylogenetic analysis based on the Cas9 proteins obtained from the six strains demonstrated two major clades. Clade separation was genus specific: clade I contains species belonging to Salinivibrio and showed highly related Cas9 proteins among three species. Divergent from these were the Cas9 proteins in clade II from two Vibrio species (Fig. 10d).
CRISPR-Cas type II-B systems present within MGEs
The type II-B system in V. natriegens CCUG 16373 was present within a 30-kb region that was inserted adjacent to a tRNA-Met locus and was absent from V. natriegens CCUG 16374 (Fig. 10a). The 30-kb region contained a restriction modification system and three integrases, one of which was adjacent to the tRNA locus, suggesting site-specific integration (Fig. 10a). Within two Salinivibrio species, the type II-B system is also present within a genomic island that contains an integrase and is inserted at a tRNA locus (Fig. 10e).
Type III CRISPR-Cas systems in Vibrionaceae
We used the Cas10 protein from the putative hybrid type III-B/I-F system to determine whether other species contained type III systems within Vibrionaceae. We identified 15 species that contained a type III system (Additional file 2: Table S7). Based on cas gene arrangement and cas gene homology, three subtypes were identified: type III-A, type III-B, and type III-D (Additional file 2: Table S7). In addition to these subtypes, we also uncovered a hybrid type III-B/I-F system in V. palustris CECT 9027 and Salinivibrio sp. DV (Additional file 2: Table S7). Interestingly, the type I-F direct repeat in Salinivibrio sp. DV was identical to the repeat present in V. metoecus YB4D01 and V. cholerae (Additional file 2: Table S7). This suggests a common origin in distantly related species and recent horizontal transfer between these genera. In addition, we identified a type III-B system in V. spartinae CECT 9026 with three type I-F CRISPR arrays (Additional file 2: Table S7). In Salinivibrio sp. MA351, we identified a type III-B system followed by a type I-F array, but this system also clustered with a complete type I-F system (Additional file 2: Table S7).
The genome sequences of four V. gazogenes strains, ATCC 43941, ATCC 43942, CECT 5068 and DSM 21264, each contained at least one type III system. Vibrio gazogenes ATCC 43941 and ATCC 43942 harbored identical type III-B systems on chromosome 1, with cas2cas1 divergently transcribed from hphpcmr1cas10cmr3cmr4cmr5cmr6 and with two CRISPR arrays, one at each end of the cas gene cluster (Additional file 1: Figure S9A). Strains CECT 5068 and DSM 21264 harbored a homologous type III-B system with two CRISPR arrays, which is found on chromosome 2 (Additional file 1: Figure S9B). These strains also contained a type III-A system with two arrays, one at each end of the cas locus (Additional file 2: Table S7; Additional file 1: Figure S9C).
We identified an additional five strains with a type III-A system containing the cas gene arrangement of cas10cas7cas5cas7cas1cas2 (Additional file 2: Table S7). The type III-A system in these strains contained one type III-A CRISPR array, with the exception of P. aphoticum JCM 19237, which contained two type III-A CRISPR arrays. We also identified seven strains with a type III-D system consisting of cas10csm3csx10csm3csx19cas7cas6, with cas1cas2 genes in close proximity (Additional file 2: Table S7).
In 18 of the strains with a type III system characterized in this study, cas1cas2 were present. Of note was the presence of a reverse transcriptase (RT) domain in 14 of the 18 Cas1 proteins identified. In the type III-A system of V. gazogenes CECT 5068 and DSM 21264, the Cas1 protein is fused with RT and Cas6 domains. In 11 strains with either a type III-A, III-B or III-D system, only the RT and Cas1 domains are fused. In P. aphoticum JCM 19237, the RT-encoding gene is adjacent to cas1. These RT-containing Cas1 proteins have been shown previously to be primarily found in proximity to type III systems, are not specific to any subtype, and function autonomously [64]. In addition, the reverse transcriptase activity of the RT-Cas1 domain is required for spacer acquisition from RNA [65].
Neighbor-joining trees were constructed from the Cas1 domain sequences and the Cas10 proteins to determine the evolutionary history of these proteins. In clade I of the Cas1 domain tree, the seven strains containing a type III-A system are clustered (Additional file 1: Figure S10A). This clade contained two V. gazogenes strains whose cas1 gene carries cas6 and retron domains; these are distantly related to the cas1 genes from Photobacterium and Vibrio species. In clade II, the Cas1 proteins from the four V. gazogenes strains containing a type III-B system cluster together (Additional file 1: Figure S10A). In these strains, cas1 is directly next to but transcribed divergently from the type III-B system cas genes and contains a retron domain. Clade III consists of three Vibrio species with a type III-D system. The Cas1 from these three strains has a fused RT domain (Additional file 1: Figure S10A). Clade IV contains four species with a type III-D system that formed the most divergent cluster, with a Cas1-only domain. This clade is highly divergent, characterized by long-branch lengths (Additional file 1: Figure S10A).
In the Cas10 tree from the strains containing Cas1, the proteins are separated based on subtype, with all type III-A clustered together, all type III-B clustered together and all type III-D clustered together (Additional file 1: Figure S10A-B). The Cas10 proteins from the seven strains containing a type III-D system are much more closely related and cluster in a single clade (Additional file 1: Figure S10B). In the Cas10 tree, proteins from the type III-B systems are the most divergent (Additional file 1: Figure S10A-B). These data suggest that the systems within each type III subtype share a similar evolutionary history, which is not the case in the Cas1 domain tree.
CRISPR-Cas type III systems within MGEs
As described above, the CRISPR-Cas type III-B/I-F putative hybrid system was associated with a prophage in both V. cholerae and V. metoecus. In V. metoecus 07–2435, a type III-A system was carried on a 26-kb island, which contains an integrase at the 3′ end of the island and a transposase towards the 5′ end of the island (Fig. 11a). The island was inserted at a tRNA-Leu locus and the region has a GC content of 44%, compared to 47% for the genome. In V. sinaloensis T08, a type III-A system was present on a 46-kb island that is flanked at the 3′ end by an integrase and is inserted at an L-threonine 3-dehydrogenase with respect to strain AD048, which lacked the island (Fig. 11b). The region had a GC content of 43% compared to 46% across the entire genome. In V. vulnificus YJ016, a type III-D system is carried on a 22-kb island and is inserted between VV2_1039 and VV2_1038 with respect to V. vulnificus CMCP6, which lacked the region (Fig. 11c). We also identified a transposase associated with this island. Finally, in V. breoganii ZF-29, the type III-D system is present on a 22-kb island between A6E01_18830 and A6E01_18835 relative to strain FF50, which lacks the entire region (Fig. 11d). The data suggest that these CRISPR-Cas systems are present within recently acquired regions but do not indicate whether they were acquired with the element or were added later.
Phylogenetic analysis of all the Cas10 proteins identified demonstrated that these systems are highly divergent from one another. One exception is the Cas10 from the hybrid type III-B/I-F systems associated with a prophage, which all clustered together (Additional file 1: Figure S11). Branching from these type III-B/I-F hybrid systems were V. palustris CECT 9027, V. spartinae CECT 9026 and Salinivibrio sp. MA351, which all have a type III-B system and I-F arrays. In clade II, Cas10 proteins from V. gazogenes strains cluster together, demonstrating homologous type III-B systems. Clade III comprises the seven strains containing a type III-A system, which are homologous to each other and encompass Photobacterium and Vibrio species. Clade IV contains the diverse type III-D systems and comprises Cas10 proteins from Vibrio, Salinivibrio and Photobacterium species (Additional file 1: Figure S11).
Type IV CRISPR-Cas systems in Vibrionaceae
Type IV systems are rare and all have been discovered to be present on plasmids [15, 26]. In our analysis of V. parahaemolyticus, we identified two strains containing type IV systems that were also carried on plasmids (Fig. 12a). These systems were homologous to the type IV system on a plasmid in Shigella sp. FC 130 and consisted of a csf4cas6-likecsf1csf2csf3 gene arrangement (Fig. 12b). In V. parahaemolyticus MAVP-21, the type IV system has an associated CRISPR array consisting of 24 spacers and a direct repeat sequence of 5′ ACTCTTTAACCCCCTTAGGTACGGG 3′. A sequenced plasmid, S91, also carried a type IV system associated with an array with 19 spacers and a direct repeat sequence of 5′ TTAACCCCCGTACAAACGGGGAAGAC 3′. Between the two CRISPR arrays, we did not identify spacer targets.