The impact of the neisserial DNA uptake sequences on genome evolution and stability
Efficient natural transformation in Neisseria requires the presence of short DNA uptake sequences (DUSs). Doubts remain whether DUSs propagate by pure selfish molecular drive or are selected for 'safe sex' among conspecifics.
Six neisserial genomes were aligned to identify gene conversion fragments and to analyze DUS distribution, spacing, and conservation. We found a strong link between recombination and DUSs: DUS spacing matches the size of conversion fragments; genomes with shorter conversion fragments have more DUSs and more conserved DUSs; and conversion fragments are enriched in DUSs. Many recent and singly occurring DUSs diverge too much from homologous sequences in other genomes to have arisen by point mutation, suggesting that they appeared by recombination. DUSs are over-represented in the core genome, under-represented in regions under diversification, and absent in both recently acquired genes and recently lost core genes. This suggests that DUSs are implicated in genome stability rather than in generating adaptive variation. DUS elements are most frequent in the permissive locations of the core genome but are themselves highly conserved, undergoing mutation selection balance and/or molecular drive. Similar preliminary results were found for the functionally analogous uptake signal sequence in Pasteurellaceae.
As do many other pathogens, Neisseria and Pasteurellaceae have hyperdynamic genomes that generate deleterious mutations by intrachromosomal recombination and by transient hypermutation. The results presented here suggest that transformation in Neisseria and Pasteurellaceae allows them to counteract the deleterious effects of genome instability in the core genome. Thus, rather than promoting hypervariation, bacterial sex could be regenerative.
Keywords: Additional Data File; Natural Transformation; Core Genome; Gene Conversion Event; Clonal Interference

Abbreviations: DUS, DNA uptake sequence; HGT, horizontal gene transfer; USS, uptake signal sequence.
The act of combining genetic information from two different individuals is ubiquitous among living organisms. Genetic exchange can take the form of sexual reproduction in some eukaryotes, whereas in most prokaryotes it results from horizontal transfer of DNA from a donor to a recipient cell. Horizontal transfer may introduce new and radically different genetic information or lead to allelic replacement of existing genetic loci by homologous recombination. Among the three mechanisms that facilitate horizontal gene transfer (HGT), natural transformation is often referred to as the bacterial equivalent of meiotic sex in eukaryotes. This is because it reassorts genetic information among members of the same species and, unlike transduction and conjugation, is under the direct control of the recipient cell. Because maintenance of the capacity for transformation depends strictly on its having positive effects on the fitness of the recipient bacteria, it might be regarded as the mechanism of choice for elucidating the advantages of sex in prokaryotes.
Many bacterial species are naturally competent for transformation. Some are constitutively competent, whereas others become competent in response to specific environmental conditions. Naturally competent bacteria have evolved mechanisms or strategies to avoid entry of heterologous and potentially harmful DNA. As with reproductive barriers between eukaryotic species, a range of competent species show a preference for homologous over heterologous DNA. This preference arises through adaptations including induction of competence by quorum sensing, the presence of restriction modification systems, and blockage of heterologous recombination by stringent RecA function or mismatch repair.
Transformation in Neisseria spp. and members of the family Pasteurellaceae requires the presence of a specific DNA uptake sequence (DUS) or uptake signal sequence (USS) [6, 7], respectively, in the incoming DNA. These signals allow discrimination between DNA from closely related strains or species and foreign or unrelated DNA. The DUS of Neisseria spp. is a short 10-nucleotide signal: 5'-GCCGTCTGAA-3'. It is present in approximately 2,000 copies occupying 1% of the sequenced neisserial genomes. This is far more than expected given the sizes and composition of the genomes, and such a copy number can only be maintained by strong counteraction to drift [9, 10]. Transformation is more efficient if the 10-mer DUS is preceded by an A and a T. The 10-nucleotide signal is both required and sufficient for transformation and is the one considered in this study. However, 75% of 10-nucleotide DUSs are also part of extended 12-nucleotide DUSs, so this choice should not affect our conclusions. DUSs often appear as closely spaced inverted repeats that function as rho-independent transcription terminators [11, 12, 13]. This local arrangement of inverted repeats does not change the efficacy of transformation, which depends only on the presence of a single DUS.
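Locating DUS copies in a sequence is a simple motif search. The following minimal Python sketch assumes a plain A/C/G/T string. It reports every occurrence of the 10-mer 5'-GCCGTCTGAA-3' on both strands, since a DUS on the reverse strand reads as its reverse complement on the forward strand. The function names and the toy sequence are hypothetical, and this is not the software used in the study.

```python
import re

# 10-mer DUS given in the text (the extended 12-mer variant is not handled here).
DUS = "GCCGTCTGAA"


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_dus(genome: str, motif: str = DUS) -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for every motif occurrence.

    A zero-width lookahead regex is used so that overlapping hits are also found.
    Reverse-strand copies are detected as the reverse complement on the + strand.
    """
    genome = genome.upper()
    hits = [(m.start(), "+") for m in re.finditer(f"(?={motif})", genome)]
    rc = revcomp(motif)
    if rc != motif:  # skip the second scan for palindromic motifs
        hits += [(m.start(), "-") for m in re.finditer(f"(?={rc})", genome)]
    return sorted(hits)


toy = "AAGCCGTCTGAACCCTTCAGACGGCAA"  # hypothetical sequence with one DUS per strand
print(find_dus(toy))
```

On the toy sequence, the forward copy starts at position 2 and the reverse-strand copy (TTCAGACGGC) starts at position 15.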
Transformation has traditionally been studied and conceptualized as a succession of distinct stages. These are surface binding and entry through an outer membrane pore, transit across the periplasm and the inner membrane, and integration into the genome. However, recent studies have demonstrated that these processes, at least in Bacillus subtilis, are tightly linked in both space and time, and the term 'transformation complex' has been coined. By definition, DNA has been taken up when it is no longer degradable by DNase. More research is needed to fully appreciate the physical implications of the DNase-protected state and to determine exactly where DUS specificity acts.
Two major theories have been proposed to account for the origin and maintenance of DUS signals. Classically, DUSs have been regarded as cellular guardians that prevent the entry of potentially damaging sequences lacking DUSs, such as naked DNA from phages, plasmids, or transposable elements. Indeed, DUS specificity effectively disfavors DNA originating from distantly related species because these lack DUSs. It has also been suggested that DUS-specific transformation may lead to molecular drive. If the DNA uptake machinery by some physical means has a preference for DUSs, then sequences containing a DUS are more likely to be transferred and, consequently, to accumulate in the genomes of Neisseria. At the extreme end of this concept, it has been suggested that DUSs increase in frequency purely because of molecular drive, independently of any putative positive effect on fitness (selfish DUS hypothesis) [15, 16].
Molecular drive is a model of evolution that offers an alternative explanation to natural selection and is based on purely stochastic preferential uptake of DUS-containing DNA. Preferential DNA uptake is a biological mechanism and should not be confused with molecular drive, which is a model of evolution and might be one of its consequences. Preferential DNA uptake may also contribute to DUS/USS fixation by classic natural selection. In that case, selection is driven by the advantage of taking up conspecific DUS-containing DNA or of preventing the entry of alien sequences. The Darwinian model generally seeks the potential selective advantages of sex, and particularly of 'safe sex'. The molecular drive model instead seeks to explain how these genomes can tolerate such large amounts of an 'intrusive' repetitive sequence without discretion. In essence, it asks how DUSs/USSs can accumulate without being positively selected for by forces affecting the fitness of the organism.
Competent bacteria have invested extensively in complex machineries to facilitate transformation, involving a comprehensive range of competence and recombination proteins. Neisseria spp. express type IV pili that are required for transformation. Furthermore, a type IV secretion system that exports DNA into the environment has been described in most gonococci and some strains of meningococci [19, 20]. Thus, neisserial sex is an active process mediated by specific machineries that can import and export genetic information. Competence for transformation in Neisseria is constitutive throughout its growth cycle and does not depend on environmental conditions. In nature, population structures may range from complete clonality to panmixia. Studies of these structures have shown that transformation in the pathogenic Neisseria has fuelled high rates of recombination. It has been estimated that an allele of the Neisseria meningitidis genome is ten times more likely to change by recombination than by point mutation.
Such intense recombination has often been associated with the lifestyle of Neisseria spp., and in particular with virulence in humans. Members of the genus Neisseria and the family Pasteurellaceae populate the mucosal surfaces of humans and animals. Of particular clinical significance are N. meningitidis and Haemophilus influenzae, which are leading causes of bacterial meningitis and septicemia worldwide. Neisseria gonorrhoeae causes the sexually transmitted disease gonorrhea. The commensal Neisseria lactamica is commonly found in the upper respiratory tract of young children and teenagers and may contribute to immunity to meningococcal disease. Analyses of the four published neisserial genomes revealed high densities of repeated elements [27, 28, 29, 30]. Intrachromosomal recombination between these repeats is a major source of variability in Neisseria, resulting in frequent adaptive changes in gene expression profiles [31, 32] and even recurring states of hypermutability [33, 34].
Given the role of HGT in genome fluidity, elucidating the evolutionary role of natural transformation is pivotal to our understanding of prokaryotic adaptation. The abundant DUS and USS elements are required for efficient natural transformation in Neisseria and Pasteurellaceae, respectively. These repeat sequences are commonly believed to be markers of selection for transformation. If so, their differential presence and conservation across a genome may also contribute to our understanding of the advantages of sex, a longstanding question in evolutionary biology [35, 36]. Here, we used the availability of six complete neisserial genomes to align the core genome globally. We also defined the sets of genes that are ubiquitous and those that were recently acquired or recently lost in each group. These multiple genome alignments provide a new and powerful approach to the origin and fate of DUSs in these genomes and to the association between these signals and recombination events. In this work we use the term 'recombination' for homologous recombination between the chromosome and DNA from other cells. We found a striking correlation between the average distance between DUSs and the length of conversion fragments. This indicates that the process of transformation is tightly linked to, and even shaped by, recombination. The presence of unique DUSs that interrupt otherwise conserved regions in neisserial alignments further emphasizes the role of recombination in DUS evolution. Within the limits of the available data, we find similar results when analyzing the genome of H. influenzae. These findings highlight the importance of allelic replacement as the bacterial equivalent of sex and the role of transformation in genome maintenance.
Global genome alignments
Despite the close phylogenetic proximity of these genomes, several rearrangements have accumulated since their divergence. This is a consequence of the high numbers of repeats that these genomes contain. It requires a multiple alignment method that handles rearrangements and duplications, such as M-GCAT. The global multiple alignment was composed of 79 co-linear regions, with breakpoints induced by the chromosomal rearrangements, and covered approximately 82.5% of each genome (Additional data file 1). Within these common regions there was a high percentage of monomorphic sites. Overall, the multiple alignment contained homologous regions with high sequence identity present in all of the neisserial genomes. We also conducted a global alignment of four H. influenzae genomes, which resulted in 53 co-linear alignment regions (Additional data file 6). The genomes of the other members of the Pasteurellaceae could not be globally aligned because of the relatively large phylogenetic distances involved. Because the multiple genome alignment in this clade includes only four closely related strains of a single species, the range of possible analyses was much more limited.
The distribution of neisserial DUS corresponds to the length of conversion fragments
Because DUSs are required for transformation, nonrandom positioning or conservation of this repeat (or of the USS) could pinpoint distinct effects on recombination and thus shed some light on the role(s) of bacterial sex. Within certain limits, a high local density of DUSs is expected to increase the probability of transfer of the corresponding chromosomal region. A competitive assay has indeed demonstrated a linear relationship between the affinity for DNA in transformation and the frequency of DUSs on the segment. However, only a single DUS is required for efficient transformation, and two very closely spaced motifs do not increase transformation efficiency. The usual interpretation of these results is that one DUS is enough for transformation. Because DNA is sheared in the environment, however, a higher density of DUSs increases the probability that a given fragment will contain a DUS and thus enter the cell and recombine. The positive effect of DUS density on conversion diminishes as DUS density increases. Eventually the selective effect becomes too weak to counterbalance drift. Selection for high DUS density will also depend on the size of the DNA fragments taken up by the cell. If fragments are smaller in a given species, there should be compensatory selection for higher DUS density. The size distribution of conversion fragments should therefore set the limits of selection or molecular drive for increased DUS density. If conversion fragments are large, then a high DUS density would not be maintained. If, on the other hand, these fragments are small, then DUSs could be more tightly packed in the chromosome. If selection or molecular drive varies along the chromosome, then DUSs should not be homogeneously distributed throughout the genome, and regions that recombine more should contain more of these elements.
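A toy model can make the diminishing-returns argument concrete. Purely for illustration, the sketch below assumes that DUSs are scattered along the chromosome as a Poisson process with density d per kb. Under that assumption, a sheared fragment of length L kb carries at least one DUS with probability 1 - exp(-dL). The paper does not state this model, and the fragment sizes and densities are hypothetical.

```python
import math


def p_fragment_has_dus(fragment_kb: float, dus_per_kb: float) -> float:
    """P(fragment carries >= 1 DUS), ASSUMING DUSs form a Poisson process."""
    return 1.0 - math.exp(-dus_per_kb * fragment_kb)


def marginal_gain(fragment_kb: float, dus_per_kb: float, step: float = 0.1) -> float:
    """Increase in that probability when density rises by `step` DUSs per kb."""
    return (p_fragment_has_dus(fragment_kb, dus_per_kb + step)
            - p_fragment_has_dus(fragment_kb, dus_per_kb))


# Hypothetical fragment sizes (kb) and DUS densities (per kb).
for frag in (0.5, 2.0):
    gains = [round(marginal_gain(frag, d), 4) for d in (0.2, 1.0, 2.0)]
    print(f"fragment {frag} kb: marginal gains at d = 0.2, 1, 2: {gains}")
```

In this model the benefit of each extra DUS shrinks as density grows. At high density it shrinks faster for long fragments than for short ones, which is consistent with the expectation that species taking up smaller fragments can sustain selection for more tightly packed DUSs.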
Genome size, number of genes, and DUS 10-mer distribution in the genomes of Neisseria (columns: genome size (kb), number of genes, % in genes; rows include N. meningitidis Z2491, N. meningitidis MC58, N. meningitidis FAM18, and N. meningitidis 8013)
The stringent conservation of DUS
The global multiple genome alignment allows identification of DUSs located in regions that can be aligned, namely in the core genome, and study of how they changed over time. For this purpose, it suffices to identify the location of DUSs in the alignment and analyze the corresponding sequence columns. Previous studies have shown that some very distant orthologous genes maintain USSs in their sequences, but those studies did not precisely examine the conservation of the motif sequences. In this study, the M-GCAT global genome alignment allowed us to follow precisely the evolution of these elements at each aligned position. DUSs were found to be highly conserved, exhibiting on average 97% sequence identity. This is much higher than the average identity of the aligned sequences (about 85%). Additionally, 71% of the DUSs in the multiple alignments were exactly conserved in all genomes. In the four N. meningitidis genomes, which are the most closely related, more than 90% of DUSs were exactly conserved.
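The sketch below illustrates reading DUS conservation off alignment columns. It takes a toy multiple alignment as a dict of equal-length rows and finds gap-free, forward-strand DUSs in a reference row. It then scores identity to the motif in the same columns of every other row. The function name, the toy strains, and the choice to ignore reverse-strand DUSs are assumptions for illustration. This is not M-GCAT output or the study's scoring scheme.

```python
DUS = "GCCGTCTGAA"


def dus_conservation(alignment: dict[str, str], ref: str,
                     motif: str = DUS) -> list[tuple[int, float, bool]]:
    """For each forward-strand DUS in the reference row, compare the same columns
    in every other row.

    Returns (start_column, mean identity to the motif in the other rows,
    exactly conserved in all rows). Gap characters simply count as mismatches.
    """
    ref_row = alignment[ref]
    k = len(motif)
    results = []
    for start in range(len(ref_row) - k + 1):
        if ref_row[start:start + k] != motif:
            continue  # no DUS starting at this column of the reference
        identities = []
        for name, row in alignment.items():
            if name == ref:
                continue
            window = row[start:start + k]
            identities.append(sum(a == b for a, b in zip(window, motif)) / k)
        mean_id = sum(identities) / len(identities)
        results.append((start, mean_id, all(x == 1.0 for x in identities)))
    return results


# Hypothetical three-strain alignment with two DUSs in strainA.
aln = {
    "strainA": "TTGCCGTCTGAATTACGCCGTCTGAAT",
    "strainB": "TTGCCGTCTGAATTACGCCGTCTGATT",
    "strainC": "TAGCCGTCTGAATAACGCCGTCTGAAT",
}
print(dus_conservation(aln, "strainA"))
```

Here the first DUS is exactly conserved. The second carries a single substitution in strainB, so its mean identity drops to 0.95 and it is not counted as exactly conserved.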
It might be argued that the frequency of DUS-1 elements (10-mers differing from the DUS at exactly one position) suggests that, if DUSs are positively selected, selection is very weak. However, there are 30 different DUS-1 sequences (10 positions × 3 alternative nucleotides) and only one DUS sequence. Under mutation selection balance, the ratio of DUS to DUS-1 frequencies is given by $\mathrm{DUS}/\mathrm{DUS\text{-}1} = m\,e^{2N_e s}$. Here $m$ is thus 1/30, $N_e$ is the effective population size, and $s$ is the selection coefficient (the fractional advantage of DUS over DUS-1). The effective population size in N. meningitidis was estimated at $10^5$ from two values. The first is the population mutation rate ($2N_e\mu$) of $3 \times 10^{-2}$, as consistently found by Jolley and coworkers. The second is the average wild-type mutation rate ($\mu$) of about $1.5 \times 10^{-7}$, as found by Bucci and colleagues. The observed DUS/DUS-1 ratio of about 2.4 in the aligned regions leads to a selection coefficient of about $2 \times 10^{-5}$. A mutation is usually considered to escape drift if $2N_e s$ is greater than 1. Here, $2N_e s$ is about 4, which is sufficiently large for purifying selection to be effective and for DUSs to be under mutation selection balance.
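The arithmetic of this estimate can be reproduced in a few lines. This sketch assumes the mutation selection balance relation DUS/DUS-1 = m·exp(2Ne·s) with m = 1/30. It uses only the values quoted in the text and inverts the relation for 2Ne·s and s.

```python
import math

theta = 3e-2   # population mutation rate 2*Ne*mu (value quoted from Jolley and coworkers)
mu = 1.5e-7    # average wild-type mutation rate (value quoted from Bucci and colleagues)
m = 1 / 30     # one DUS sequence versus 30 possible DUS-1 sequences
ratio = 2.4    # observed DUS/DUS-1 ratio in the aligned regions

Ne = theta / (2 * mu)            # effective population size
two_Ne_s = math.log(ratio / m)   # invert DUS/DUS-1 = m * exp(2*Ne*s)
s = two_Ne_s / (2 * Ne)          # selection coefficient

print(f"Ne ~ {Ne:.0e}, 2Ne*s ~ {two_Ne_s:.2f}, s ~ {s:.1e}")
```

With these inputs, ln(2.4 × 30) = ln 72 ≈ 4.28. This gives 2Ne·s ≈ 4.3 and s ≈ 2.1 × 10^-5, in line with the rounded values in the text.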
DUSs arise by recombination
If DUSs are under mutation selection balance, and mutation rates are the same in every genome, then genomes with more DUSs result from stronger purifying selection on DUSs (counter-selection of cells containing degenerated DUSs). If DUSs are under molecular drive, then more frequent conversion would lead to more DUSs. In both cases, DUSs should be more conserved in such a genome, with fewer DUS-1 elements for every DUS. This is because, as mentioned above, DUS-1 elements have been found not to increase transformation rates relative to no DUS at all. The N. lactamica genome contained the highest number of DUSs in the aligned regions and yet the lowest number of DUS-1 elements (Table 1). We detected 308 DUSs in the conversion fragments identified in N. lactamica. This is significantly more than the 270 expected from the size of these regions and the DUS density in the multiple alignment (P < 0.01, χ² test). Thus, the genome with the highest density of DUSs has the most conserved DUSs and the smallest gene conversion fragments, and DUSs are over-represented in these fragments. Collectively, this evidence points toward DUS integration in the genome by recombination and subsequent selection to allow conspecific natural transformation.
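The conservation argument compares exact DUSs with DUS-1 elements. The hypothetical helper below tallies both classes in a raw sequence, assuming that a DUS-1 is any 10-nt window differing from the DUS at exactly one position on either strand. The study itself counted these elements in aligned regions, which this sketch does not attempt.

```python
DUS = "GCCGTCTGAA"
_COMP = str.maketrans("ACGT", "TGCA")


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_dus_and_dus1(seq: str, motif: str = DUS) -> tuple[int, int]:
    """Count exact DUSs (0 mismatches) and DUS-1 elements (exactly 1 mismatch),
    sliding a 10-nt window and comparing it with the motif and its reverse complement.

    The DUS and its reverse complement differ at 8 of 10 positions, so no window
    can be within one mismatch of both and none is counted twice.
    """
    seq = seq.upper()
    rc_motif = motif.translate(_COMP)[::-1]
    k = len(motif)
    n_dus = n_dus1 = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for target in (motif, rc_motif):
            d = mismatches(window, target)
            if d == 0:
                n_dus += 1
            elif d == 1:
                n_dus1 += 1
    return n_dus, n_dus1


# Hypothetical sequence: a forward DUS, a DUS-1 (last A -> T), and a reverse-strand DUS.
toy = "GCCGTCTGAA" + "TTTT" + "GCCGTCTGAT" + "TTTT" + "TTCAGACGGC"
n_dus, n_dus1 = count_dus_and_dus1(toy)
print(f"DUS: {n_dus}, DUS-1: {n_dus1}, ratio: {n_dus / n_dus1:.1f}")
```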
DUSs do not promote genetic diversity in Neisseria
Because it is widely believed that the evolutionary role of natural transformation is to generate diversity in genomes, we were surprised to discover that predicted horizontally transferred regions contain very few DUSs. Laterally acquired genes of N. meningitidis Z2491 and MC58 were collected from the HGT-DB database and scanned for the presence of DUSs. These predicted recently transferred genes with low %G+C amounted to 5.6% of the Z2491 genome and 7.1% of the MC58 genome. However, they contained only 2% (38) and 2.6% (51) of the total number of DUSs, respectively (P < 0.001, χ² test). These horizontally transferred regions thus held significantly fewer DUSs than the genome average. This suggests that DUS-mediated transformation is not associated with gene flux, which may arise by other means such as transduction.
The previous analysis was conducted on genes with peculiar sequence composition, which typically represent genes horizontally transferred from distant species. However, because these sequences are often A+T rich, they might be expected to lack the G+C-rich DUS sequences. Furthermore, methods based on atypical sequence composition miss transfers from genomes of similar oligonucleotide composition. Therefore, we conducted a more rigorous and conservative analysis of transfer based on the presence and absence of sequences among the six genomes. We first counted the number of DUSs in the regions not aligned by M-GCAT (the regions containing genes that are not ubiquitous in the clade). These genomes diverged recently and exhibit few substitutions (average 85% identity in DNA sequences), so point mutations cannot account for the divergence of unaligned regions. These regions can thus be assumed to contain the horizontally transferred genes and the genes that were lost in some, but not all, genomes. Only about 10% of the total DUSs were located there, although these regions account for 17.5% of the sequence (P < 0.001, binomial test).
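A one-sided binomial test of this kind can be sketched with the standard library alone. The counts below are illustrative round numbers, not the study's exact counts. They assume about 2,000 DUSs in total (the approximate copy number given in the introduction), about 10% of which fall in unaligned regions covering 17.5% of the sequence. The helper name is hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1, summed in log space to avoid underflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k + 1)
    ]
    top = max(log_terms)
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)


# ASSUMED illustrative counts (see lead-in), not the study's data.
n_total = 2000     # DUSs in the genome
observed = 200     # DUSs found in unaligned regions (about 10%)
fraction = 0.175   # share of the sequence that is unaligned

expected = n_total * fraction
p_value = binom_cdf(observed, n_total, fraction)  # one-sided: fewer DUSs than expected
print(f"expected {expected:.0f}, observed {observed}, one-sided P = {p_value:.1e}")
```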
The results described above suggest that selection for genetic novelty is not associated with selection for DUSs, because recent acquisitions under-represent these motifs. We thus conducted a strict test of whether DUSs are under-represented in two sets of sequences. The first is new laterally transferred sequences. The second is sequences that were present in the ancestral genome but were recently lost in an extant one. The N. meningitidis Z2491 genome was used as a reference, and we analyzed the 34 genes with more than 100 codons that were absent from all other genomes. The length threshold was set to avoid the uncertainties associated with the annotation of small genes. Most of these recently acquired genes have no known function, and all of them are devoid of DUS elements. The probability of finding no DUSs in such a large set of genes by chance is very small (P < 0.001, χ² test), both when controlling for the number and for the length of these genes. In comparison, the ubiquitous genes contain an average of 0.4 DUSs per gene. We then identified the genes that were present in N. meningitidis Z2491 and in all other genomes except N. meningitidis MC58. The 29 genes found were present in the ancestral genome and recently lost in N. meningitidis MC58. These genes were also totally devoid of DUSs, which is significantly different from the expected number (P < 0.001, χ² test). We obtained the same results when performing the same analysis with the N. meningitidis MC58 genome as a reference (data not shown). DUSs were completely absent from very recently gained genes, suggesting that DUSs are of minor importance in gene acquisition. Lost genes also lacked DUSs, indicating that the absence of DUSs may make a sequence more prone to loss than DUS-containing sequences.
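Here is a back-of-the-envelope version of the expected-zero calculation. It assumes that DUS counts per gene follow a Poisson distribution with the ubiquitous-gene mean of 0.4. This is a simplification not made in the text, which also controlled for gene length. The function name is hypothetical.

```python
import math


def p_no_dus(n_genes: int, mean_per_gene: float = 0.4) -> float:
    """P(no DUS in any of n genes), ASSUMING independent Poisson counts per gene."""
    return math.exp(-mean_per_gene * n_genes)


for label, n in (("recently acquired", 34), ("recently lost", 29)):
    print(f"{label}: {n} genes, P(no DUS) = {p_no_dus(n):.1e}")
```

Under this crude model, 34 genes give exp(-13.6), roughly 1 × 10^-6, and 29 genes give exp(-11.6), roughly 9 × 10^-6. Both are well below 0.001.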
Distribution of 10-mer DUSs in the genes showing phase variation and/or containing signal peptides (columns: number of genes, number of DUSs; rows include N. meningitidis Z2491, N. meningitidis MC58, N. meningitidis FAM18, and N. meningitidis 8013)
DUSs and USSs are located in permissive regions of their core genomes
For a preliminary comparison, we conducted a similar analysis on the alignment of the four H. influenzae genomes (Additional data files 6 and 7). The results were indeed similar. The regions flanking USSs were less conserved than USS-free regions of the genomes, whereas USSs themselves, like DUSs, were highly conserved. Thus, DUSs and USSs are associated with regions conserved in all genomes. Within these regions, they are located in the parts that were permissive to substitutions.
Modeling the effect of DUS proximity
We then checked whether positions in genes putatively under selection for diversification had an average DUSp lower than the rest. In N. meningitidis Z2491, DUSp in these genes averaged 0.39 versus 0.49 in the remaining genes (P < 0.001, t-test). In N. lactamica the results were similar, with average DUSp values of 0.26 and 0.30, respectively (P < 0.001, t-test). The DUSp values are lower overall in N. lactamica than in the other genomes because of the smaller conversion fragments and in spite of higher DUS density. The use of our DUSp index confirms that selection for diversification in rapidly evolving genes related to fitness and virulence is not associated with selection for recombination by natural transformation.
To quantify the association between the distribution of DUSs and sequence conservation, we also computed the correlation between DUSp and sequence identity. This analysis is somewhat delicate because most columns in the multiple alignment are strictly identical among all genomes, whereas the remaining ones are bi-allelic (Additional data file 4). We therefore focused on the average DUSp for all nucleotides in N. meningitidis Z2491 (Figure 7). Each aligned position in N. meningitidis Z2491 was classified as changed if it differed from the consensus and as unchanged if it agreed with the consensus. The average position in this genome had a DUSp of 0.51 for changed sites and 0.49 for the others. The difference is small, but it means that changed sites across the aligned N. meningitidis Z2491 genome are on average 40 nucleotides closer to DUSs than the others. This difference is highly significant (P < 0.001, analysis of variance). Thus, DUSs are associated with permissive regions that exhibit higher sequence diversity, even though DUSs themselves are highly conserved.
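The DUSp index is not defined in this section, so the sketch below does not reproduce it. Instead it illustrates the underlying comparison with raw distances: the mean distance to the nearest DUS for changed versus unchanged positions. It assumes that positions and DUS starts share one coordinate system and that each DUS spans 10 nt. The function names and the toy data are hypothetical, and the analysis of variance is not reproduced.

```python
import bisect


def distance_to_nearest(pos: int, dus_starts: list[int], k: int = 10) -> int:
    """Distance (nt) from pos to the closest DUS; 0 if pos lies inside one.

    dus_starts must be sorted and non-empty; each DUS covers [start, start + k - 1].
    """
    if not dus_starts:
        raise ValueError("need at least one DUS")
    i = bisect.bisect_right(dus_starts, pos)
    best = None
    for j in (i - 1, i):  # last DUS starting at or before pos, and the next one after it
        if 0 <= j < len(dus_starts):
            start, end = dus_starts[j], dus_starts[j] + k - 1
            if pos < start:
                d = start - pos
            elif pos > end:
                d = pos - end
            else:
                d = 0
            best = d if best is None else min(best, d)
    return best


def mean_distance_by_class(changed: list[bool], dus_starts: list[int]) -> tuple[float, float]:
    """Mean distance to the nearest DUS for (changed, unchanged) positions."""
    dists = {True: [], False: []}
    for pos, is_changed in enumerate(changed):
        dists[is_changed].append(distance_to_nearest(pos, dus_starts))
    return (sum(dists[True]) / len(dists[True]),
            sum(dists[False]) / len(dists[False]))


# Hypothetical 30-nt region with one DUS at 10-19 and two changed sites near it.
changed = [False] * 30
changed[8] = changed[21] = True
print(mean_distance_by_class(changed, [10]))
```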
The use of multiple genome alignments made it possible to elucidate the role played by DUSs in genome evolution in greater detail than in previous studies. We have thus been able to show that DUSs are associated with recombination hotspots (regions of increased recombination rates). First, their spacing matches the length of conversion fragments. Second, the analysis of recently acquired DUSs identifies many cases in which the motif region matches homologous regions lacking any motif resembling a DUS. This suggests insertion by recombination and not by point mutation. Third, N. lactamica has smaller conversion fragments and more tightly spaced DUSs, and these conversion fragments over-represent DUSs. This association between recombination and DUS distribution may be caused by selection for recombination, by selfish molecular drive, or by both.
A simple preliminary analysis of the degeneracy of DUSs yields selection coefficients compatible with DUSs being under mutation selection balance. N. lactamica provides an interesting case. It has a higher DUS/DUS-1 ratio, suggesting that smaller conversion fragments result in stronger selection for DUSs. This is in accordance with the experimental observation that natural competence in N. lactamica is more specific (as opposed to genus specific) than competence in N. meningitidis and N. gonorrhoeae. These two species exhibit only DUS dependency, irrespective of the source of DNA. It is thus tempting to speculate that more discriminating transformation and smaller conversion segments cause selection or molecular drive for a higher density of DUSs.
Because DUSs are markers of recombination resulting from natural transformation, their role must be understood in the light of the multiple theories for the evolution of transformation [50, 51, 52, 53, 54]. These theories propose:

- sex for the acquisition of heterologous sequences (horizontal transfer sensu stricto);
- sex to allow diversification of rapidly evolving functions (for example, virulence factors);
- sex as a source of food;
- sex as a source of templates for DNA repair;
- sex as a mechanism allowing allelic (homologous) recombination to purge deleterious mutations and avoid clonal interference.

Some of these hypotheses are difficult to distinguish from the point of view of DUS distribution and evolution. However, the first two can easily be distinguished from the others because they lead to very different expected distributions of DUS elements.
Horizontally transferred regions sensu stricto have very few DUS elements, and recent insertions have no DUSs at all, suggesting that these sequences had no DUSs at the time of transfer. In addition, we find that conversion fragments are small, which should severely limit the extent of co-transfer of heterologous sequences with DUSs. Therefore, our data are in clear disagreement with the idea that the primary role of transformation is to mediate the horizontal transfer of new genetic information. Transformation had been thought to be the major vehicle of lateral transfer in Neisseria. However, recent data show that extensive genetic variation originates from phages and other mobile elements [56, 57]. In fact, most well-documented cases of HGT in Neisseria result from illegitimate rather than homologous recombination [51, 57, 58]. This does not mean that transformation never leads to HGT. For example, some recently horizontally transferred elements flanked by DUSs have prompted speculation that they arose by natural transformation [59, 60].
It has often been suggested that transformation allowed quick diversification of genetic information involved in virulence. However, here we find that even the core genes known to be under selection for diversification contain fewer DUSs than expected. This further argues against a role of DUSs and natural transformation in selection for genetic diversification, either through horizontal transfer of new functions or through variation in extant genes under selection for diversification.
If the purpose of bacterial sex were to feed on DNA, then DUS specificity would clearly be deleterious, because it prevents most DNA from entering the cell. A chance occurrence of a DUS would be more frequent in G+C-rich genomes, because DUSs are G+C rich. Yet the nucleotide most needed by cells is A, because of energy metabolism. As a result, and because degeneracy in protein-DNA interactions is the norm, bacteria with lower DUS specificity should arise and quickly out-compete DUS-specific Neisseria. It follows either that DUS specificity is highly deleterious, in which case one might wonder why degeneracy has not evolved, or that nutrient acquisition is not the main purpose of sex in Neisseria. In light of the observed association between DUSs and recombination, the second hypothesis seems more plausible.
We show that DUS regions have the same average evolutionary history as the core genome. Nevertheless, DUSs are highly conserved despite being located in regions slightly more divergent than the average core genome. These data and the link between DUSs and conversion fragments are thus consistent with three scenarios: sex for repair, sex for allelic reassortment, or DUSs being selfish sequences under pure molecular drive. Because recombination is mutagenic, one would expect regions close to DUSs to evolve faster than the average core genome, as observed. Because many DUSs arise by recombination, one would also expect their concentration to be higher in more plastic regions of the core genome, as observed. Thus, in all three evolutionary scenarios, one would expect DUSs to be highly conserved but more frequent in the permissive regions of the core genome, while rare in the accessory genome.
Recent findings show that competence for natural transformation in Bacillus subtilis stops growth and is associated with the expression of proteins involved in recombination and repair. These proteins co-localize with transformation proteins at the cellular poles [64, 65]. This suggests a strong link between transformation, recombination, and repair. We previously showed that genome maintenance genes are enriched in DUSs in Neisseria and in USSs in the phylogenetically distant Pasteurellaceae. This suggests that transformation mediates DNA repair and conservation rather than diversification of lineages. The over-representation of DUSs and USSs in genome maintenance genes may reflect selection for facilitated recovery of genome-preserving functions and co-evolution between these processes and specific transformation. DNA uptake increases upon UV mutagenesis in B. subtilis. However, competence for transformation is not found to be regulated by DNA damage in B. subtilis, in H. influenzae [66, 67], or in the constitutively competent Neisseria spp. In fact, N. gonorrhoeae, and quite possibly other Neisseria spp., is polyploid and may use another copy of the chromosome to repair DNA damage by homologous recombination. It is therefore unclear to what extent transformation alone contributes to the repair of DNA lesions.
Transformation allows the re-assortment of alleles in populations. In this sense, transformation can both reduce clonal interference (the competition between adaptive mutations arising in different clones) and efficiently purge deleterious mutations. We find that DUSs are missing both from ancient genes of the neisserial clade that were recently lost in one genome and from large gaps in the multiple alignments. This is in accordance with transformation allowing the recovery of inactivated or completely lost genes [10, 70]. A DUS might not change the probability that a gene is deleted, but once the gene is lost, the presence of DUSs in it increases the probability that it will be restored by natural transformation. Sex by transformation in Neisseria could thus have evolved to deal with the large number of deleterious polymorphisms produced by mechanisms of rapid sequence diversification, such as intrachromosomal recombination and mutator periods, which may serve, for example, the evolution of virulence. Repeats account for approximately 20% of the Neisseria genome, and some of them, such as Correia elements and insertion sequences, are highly dynamic. Interestingly, other pathogens that also generate variability through frequent intrachromosomal (homologous or illegitimate) recombination and/or high mutation rates, such as H. influenzae [72, 73] and Helicobacter pylori, are also naturally competent. Furthermore, several transformable bacteria that lack sequence-specific transformation systems, such as B. subtilis, Streptococcus pneumoniae, and H. pylori, have other genetic or ecological mechanisms ensuring that natural transformation is induced when the likelihood of taking up conspecific DNA is very high [75, 76, 77, 78]. Indeed, our previous analyses and those of others [10, 16] suggested that patterns of DUS and USS evolution are similar in Neisseria and Haemophilus.
In both cases one expects that variability generated by intrachromosomal recombination will lead to deleterious mutations that would quickly accumulate in lineages in the absence of recombination, a phenomenon known as Muller's ratchet. This would fit well with recombination having a role in the maintenance of the core genome.
In order to elucidate the role of DUS-specific transformation, it is vital to determine exactly how DUSs are created ex nihilo, what the mechanism of DUS specificity is, and how strong the influence of molecular drive is. We showed that DUSs arise frequently in genomes and that the presence of DUSs is associated with gene conversion events. However, these DUSs must originally have been created by some mechanism. Because, at the site of many new DUSs, the homologous sequences in the other genomes show only very weak similarity to the DUS, point mutation is an unlikely candidate for the origin of new DUSs in these populations. Neisseria both take up and export DNA; therefore, harboring DUSs in a genome increases the probability of DNA propagation (it increases the fitness of the genes in their neighborhood). In this case, selfish molecular drive would most frequently coincide with the interest of the cell, which is to filter out nonhomologous DNA and to select for uptake sequences associated with the core genome. In this sense, an understanding of how DUS specificity works will allow an appreciation of its possible evolutionary flexibility as well as its evolutionary history. Did the original system require a perfect DUS sequence, which would severely restrict the range of natural transformation? Alternatively, did DUS specificity co-evolve with the increase in the density of DUSs in genomes? In that case, did the positive feedback of molecular drive allow faster evolution toward stringent DUS specificity? Both scenarios are plausible on theoretical grounds but have radically different consequences for the theories aiming to explain the role of natural transformation. Our data suggest that DUS-mediated natural transformation may not play the role in virulence that is most commonly invoked.
Instead of providing novelty in genetic repertoires, natural transformation may be a mechanism to tackle the side effects of the extensive genetic hypervariability that is constantly generated in neisserial genomes. Therefore, an understanding of the evolutionary role of DUS-specific natural transformation may shed light on how these bacteria balance the need for variability in their interactions with hosts against the commitment to preserving the core genome.
Materials and methods
Genes and genome sequences
We analyzed the genome sequences of six Neisseria strains and four Haemophilus influenzae strains. Four of the neisserial genome sequences and all of the Pasteurellaceae genome sequences, together with their annotations, were obtained from GenBank Genomes: N. meningitidis Z2491, serogroup A (NC_003116); N. meningitidis MC58, serogroup B (NC_003112); N. meningitidis FAM18, serogroup C (NC_03221); N. gonorrhoeae FA1090 (NC_002946, unpublished); H. influenzae Rd KW20 (L42023.1); H. influenzae strain 86-028NP (CP000057.1); H. influenzae PittEE (CP000671.1); and H. influenzae PittGG (CP000672.1). The sequence data for N. lactamica ST-640 were produced by the Pathogen Sequencing Unit at the Sanger Institute (Cambridge, UK). The sequence and annotation data for N. meningitidis 8013, serogroup C, were provided by the unit 'Génomique des microorganismes pathogènes' of the Institut Pasteur (Paris, France).
Definition of sets of orthologous genes
A preliminary set of orthologs was defined by identifying unique pairwise reciprocal best hits with at least 40% protein sequence similarity and less than 30% difference in length. This list was then refined by combining the information on the distribution of similarity of these putative orthologs with the data on gene order conservation (as in the report by Rocha and coworkers). Because few rearrangements are observed at these short evolutionary distances, genes outside conserved blocks of synteny are likely to be xenologs or paralogs. Hence, we conservatively used the distribution of sequence similarity within reciprocal best hits, together with the classification of these genes as either syntenic or nonsyntenic, to set appropriate lower thresholds of protein sequence similarity between orthologs. We considered two genes to be orthologs if their proteins were at least 85% similar. The definitive list of orthologs for each group was defined as the intersection of the pairwise lists.
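The ortholog definition above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Hit` record, the function names, and the convention that "difference in length" is measured relative to the longer protein are all hypothetical, and the synteny-based refinement is omitted, so only the 40%/30% preliminary filters, the 85% final threshold, and the intersection of pairwise lists are shown.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hit:
    """One protein similarity hit (hypothetical record, e.g. parsed from an all-against-all search)."""
    query: str
    subject: str
    similarity: float  # percent similarity of the aligned proteins
    query_len: int     # protein lengths in amino acids
    subject_len: int


def length_difference(hit: Hit) -> float:
    """Length difference relative to the longer protein (assumed definition)."""
    return abs(hit.query_len - hit.subject_len) / max(hit.query_len, hit.subject_len)


def unique_best_hits(hits):
    """Map each query to its best hit; queries whose top score is tied are dropped."""
    by_query = {}
    for hit in hits:
        by_query.setdefault(hit.query, []).append(hit)
    best = {}
    for query, candidates in by_query.items():
        candidates.sort(key=lambda h: h.similarity, reverse=True)
        if len(candidates) > 1 and candidates[0].similarity == candidates[1].similarity:
            continue  # best hit is not unique
        best[query] = candidates[0]
    return best


def reciprocal_best_hits(hits_ab, hits_ba, min_similarity=40.0, max_length_diff=0.30):
    """Preliminary orthologs: unique reciprocal best hits passing the similarity and length filters."""
    best_ab, best_ba = unique_best_hits(hits_ab), unique_best_hits(hits_ba)
    pairs = {}
    for gene_a, hit in best_ab.items():
        back = best_ba.get(hit.subject)
        if back is None or back.subject != gene_a:
            continue  # not reciprocal
        if hit.similarity >= min_similarity and length_difference(hit) < max_length_diff:
            pairs[gene_a] = (hit.subject, hit.similarity)
    return pairs


def orthologs(pairs, min_similarity=85.0):
    """Final pairwise orthologs: keep pairs whose proteins are at least 85% similar."""
    return {a: b for a, (b, sim) in pairs.items() if sim >= min_similarity}


def core_orthologs(genomes, pairwise):
    """Intersect pairwise lists. pairwise[(g1, g2)] maps genes of g1 to their
    orthologs in g2, with g1 listed before g2 in `genomes` (at least two genomes)."""
    ref, *others = genomes
    groups = []
    for gene in pairwise[(ref, others[0])]:
        group = {ref: gene}
        for g in others:
            match = pairwise[(ref, g)].get(gene)
            if match is None:
                break  # missing in one genome: not part of the intersection
            group[g] = match
        else:
            # keep the group only if every non-reference pairwise list agrees
            if all(pairwise[(g1, g2)].get(group[g1]) == group[g2]
                   for g1, g2 in combinations(others, 2)):
                groups.append(group)
    return groups
```

Discarding tied best hits is one way to implement the "unique" requirement; other tie-breaking rules are possible.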
Correlation of predicted HGT regions and DUS content
The six-genome analysis was preceded by mapping DUS content onto the predicted HGT regions of the N. meningitidis Z2491 and MC58 genomes, as identified in HGT-DB. These are regions whose statistical parameters, such as G+C content, codon usage, and amino acid usage, deviate from the genome average.
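One simple way to summarize DUS content in predicted HGT regions is to compare DUS density inside and outside those regions. The sketch below assumes that the HGT-DB predictions have already been converted into half-open `[start, end)` genome coordinates and that each DUS is represented by its start coordinate; the function names are hypothetical and this is not HGT-DB's own method.

```python
def merge_intervals(intervals):
    """Merge overlapping half-open [start, end) intervals so no base is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def dus_density(dus_positions, hgt_intervals, genome_length):
    """Return (DUSs per kb inside predicted HGT regions, DUSs per kb elsewhere)."""
    regions = merge_intervals(hgt_intervals)
    inside_len = sum(e - s for s, e in regions)
    outside_len = genome_length - inside_len
    inside = sum(1 for p in dus_positions if any(s <= p < e for s, e in regions))
    outside = len(dus_positions) - inside
    per_kb_inside = 1000 * inside / inside_len if inside_len else 0.0
    per_kb_outside = 1000 * outside / outside_len if outside_len else 0.0
    return per_kb_inside, per_kb_outside
```

A lower density inside the predicted regions than elsewhere would correspond to the under-representation of DUSs in recently acquired DNA.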
Multiple genome alignment
The M-GCAT genome comparison and alignment tool was used to produce a multiple alignment of the six genome sequences. As output, M-GCAT returns the alignment partitioned into locally co-linear regions, or clusters, along with a concatenated version of the multiple alignment that joins all individually aligned co-linear regions. Aligning only locally co-linear regions with strong evidence of homology avoids forcing the alignment of potentially nonhomologous sequences. For our comparison, the M-GCAT parameters were set as follows: q = 100; d = 40000; c = 110; min Anchor length = 0.8 * (Log [S]); min MUM length = 8. The remaining parameters were left at their default values. Because different values for these parameters change the final output, we optimized them by maximizing the total number of matches found and the percentage of sequence covered by the comparison framework. We used MUSCLE to align the unaligned regions in the M-GCAT comparison framework and to produce the final gapped alignment. After this step, all remaining unaligned regions are lineage specific (they are not in the core genome) and were left unaligned for further inspection. Gblocks was used to identify the conserved blocks in the multiple alignments and to count them.
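The coverage criterion used when tuning the M-GCAT parameters can be illustrated by computing the percentage of one genome covered by the aligned blocks of a run. This is a sketch under assumptions: each run is represented by a hypothetical dictionary holding its parameters, its match count, and its aligned blocks as `[start, end)` coordinates on a single reference genome, and the text does not state how the two criteria were combined, so ranking by coverage and then by number of matches is an illustrative choice only.

```python
def covered_fraction(blocks, genome_length):
    """Percentage of a genome covered by the union of aligned [start, end) blocks."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(blocks):
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start  # close the previous merged block
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)  # overlapping block: extend it
    if current_end is not None:
        covered += current_end - current_start
    return 100.0 * covered / genome_length


def pick_best(runs, genome_length):
    """Hypothetical selection rule: highest coverage first, then most matches."""
    best = max(runs, key=lambda run: (covered_fraction(run["blocks"], genome_length),
                                      run["matches"]))
    return best["params"]
```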
We conducted three separate phylogenetic analyses using three different but partially overlapping datasets: the concatenation of the alignments of the ubiquitous genes; the concatenation of the M-GCAT multiple alignments; and the concatenation of all 1,000-nucleotide regions (±500 nucleotides on each side) surrounding each DUS present in all of the genomes of the M-GCAT multiple alignment. In all analyses, Tree-Puzzle was used to compute the matrix of maximum likelihood distances under the HKY+Γ model with exact parameter estimates, and the trees were then built using BIONJ.
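The third dataset can be assembled by locating the DUSs shared by all genomes in the multiple alignment and concatenating the alignment columns around them. The sketch below makes several assumptions that are not stated in the text: the alignment is a dictionary of equal-length gapped strings, the DUS is the commonly cited 10-mer core 5'-GCCGTCTGAA-3' searched in both orientations, a window covers 1,000 alignment columns starting 500 columns before the DUS start, and overlapping windows are merged so that no column is used twice. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt-", "TGCAtgca-")
DUS = "GCCGTCTGAA"  # assumption: canonical 10-mer DUS core, 5'->3'


def revcomp(seq: str) -> str:
    """Reverse complement of a (possibly gapped) DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def conserved_dus_columns(alignment: dict[str, str], motif: str = DUS) -> list[int]:
    """Alignment columns where every sequence carries the motif, ungapped, in the same orientation."""
    seqs = [s.upper() for s in alignment.values()]
    k = len(motif)
    targets = (motif, revcomp(motif))
    return [
        p
        for p in range(len(seqs[0]) - k + 1)
        if any(all(s[p:p + k] == t for s in seqs) for t in targets)
    ]


def concatenate_dus_windows(alignment: dict[str, str], positions, flank: int = 500):
    """Concatenate the columns within `flank` of each DUS start, merging overlapping windows."""
    length = len(next(iter(alignment.values())))
    columns = sorted({
        c
        for p in positions
        for c in range(max(0, p - flank), min(length, p + flank))
    })
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}
```

The resulting concatenated sequences could then be written out in an alignment format accepted by the distance software.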
Gene conversion analysis
To estimate the number and size of gene conversion events between all pairs of the six sequences, we applied the gene conversion detection tool GENECONV to the M-GCAT multiple alignments, with the following parameters: /mig0.005/g2/dm -Outerseq = off. GENECONV aims to find the most likely candidates for gene conversion events between pairs of sequences in a DNA alignment by searching for maximal pairs of aligned segments that are unusually similar at a local level. We also controlled for invariable or highly selected sites by excluding monomorphic sites from the analysis. Candidate events were ranked by P values corrected for multiple comparisons.
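Excluding monomorphic sites amounts to keeping only the alignment columns that show more than one nucleotide state. The sketch below assumes that gaps and ambiguous characters are ignored when deciding whether a column is polymorphic, and it keeps the list of retained columns so that a fragment expressed in polymorphic-site indices can be mapped back onto alignment columns to obtain its length. The function names are hypothetical and are not part of GENECONV.

```python
def polymorphic_columns(alignment):
    """Indices of alignment columns with at least two different nucleotides (gaps ignored)."""
    seqs = list(alignment.values())
    keep = []
    for i, column in enumerate(zip(*seqs)):
        states = {c.upper() for c in column if c.upper() in "ACGT"}
        if len(states) > 1:
            keep.append(i)
    return keep


def strip_monomorphic(alignment):
    """Return the alignment restricted to polymorphic columns, plus the kept column indices."""
    keep = polymorphic_columns(alignment)
    filtered = {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}
    return filtered, keep


def fragment_span(first_poly, last_poly, keep):
    """Map a fragment given by inclusive polymorphic-site indices back to alignment columns."""
    start, end = keep[first_poly], keep[last_poly]
    return start, end, end - start + 1  # start column, end column, length in columns
```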
Definition of DUS proximity
The correlation of functional genomic features with DUS presence is complicated by the over-representation of these elements in intergenic regions. Simply counting the DUSs present in genes neglects the fact that many genes lacking DUSs may have one just after the stop codon. Because we are interested in DUS proximity in relation to DUS-related recombination, we used the distribution of conversion fragment sizes between genomes to compute the probability that a nucleotide is affected by the presence of a neighboring DUS from the point of view of gene conversion. For a pair of genomes A and B, we computed the cumulative distribution CD of the sizes of the gene conversion fragments given by GENECONV. A nucleotide at a distance X from the closest DUS has a score 1 - CD(X), which represents the likelihood that the position will be affected by the presence of the closest DUS, in the sense of being included in a conversion fragment arising from transformation. Nucleotides far from DUSs have very low scores, whereas nucleotides close to DUSs have scores close to 1. We tested some variants of this method, notably averaging the scores of the position with respect to the closest upstream and downstream DUSs, and taking the maximum of the two. The maximum and the average approaches give qualitatively comparable results. In the text we focus on the average approach.
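The proximity score can be sketched as follows. CD is taken to be the empirical cumulative distribution of conversion fragment sizes for one genome pair, and each nucleotide is scored as 1 - CD(X) against its closest upstream and downstream DUS, with the two side scores then averaged or maximized. Several details are assumptions made for illustration: DUSs are represented by single sorted coordinates, the genome is treated as linear rather than circular, and a side with no DUS scores 0. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Callable, Sequence


def empirical_cdf(fragment_sizes: Sequence[int]) -> Callable[[float], float]:
    """CD(x): fraction of conversion fragments whose size is <= x."""
    ordered = sorted(fragment_sizes)
    n = len(ordered)

    def cd(x: float) -> float:
        return bisect_right(ordered, x) / n

    return cd


def side_scores(position, dus_positions, cd):
    """1 - CD(X) for the closest upstream and downstream DUS (0.0 if none on that side)."""
    up_i = bisect_right(dus_positions, position) - 1   # last DUS at or before position
    down_i = bisect_left(dus_positions, position)      # first DUS at or after position
    up = 1.0 - cd(position - dus_positions[up_i]) if up_i >= 0 else 0.0
    down = 1.0 - cd(dus_positions[down_i] - position) if down_i < len(dus_positions) else 0.0
    return up, down


def dus_proximity(position, dus_positions, cd, method="average"):
    """Combine the two side scores by averaging (default) or by taking the maximum."""
    up, down = side_scores(position, dus_positions, cd)
    if method == "average":
        return (up + down) / 2
    if method == "max":
        return max(up, down)
    raise ValueError(f"unknown method: {method!r}")
```

For example, with fragment sizes of 100, 200, 300 and 400 nucleotides and DUSs at positions 1000 and 2000, a nucleotide at position 1150 scores 0.75 against the upstream DUS and 0 against the downstream one, giving 0.375 with the average and 0.75 with the maximum.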
All searches for exact and degenerate DUSs, both in the genome sequences and among DUSs conserved in the multiple alignment, were performed using a custom Python script (available upon request). The script was also used to count the DUSs in coding sequences and to calculate levels of DUS degeneracy and percent identity in the regions surrounding DUS sites in the alignment.
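Since the script itself is not published, the following is only a minimal sketch of how exact and degenerate DUS searches, and the count of DUSs in coding sequences, might be implemented. It assumes the commonly cited 10-mer DUS core 5'-GCCGTCTGAA-3', measures degeneracy as the number of mismatches to that core, scans both strands, and represents coding sequences as hypothetical `[start, end)` intervals; none of these names come from the original script.

```python
DUS = "GCCGTCTGAA"  # assumption: 10-mer DUS core, 5'->3'
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_dus(genome: str, motif: str = DUS, max_mismatches: int = 0):
    """Return (start, strand, mismatches) for every DUS match on either strand.

    Degenerate DUSs are windows with at most `max_mismatches` differences from the core.
    A plain O(n * k) scan: simple but slow on full genomes.
    """
    genome = genome.upper()
    k = len(motif)
    targets = (("+", motif), ("-", reverse_complement(motif)))
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        for strand, target in targets:
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


def count_in_cds(hits, cds_intervals, k: int = len(DUS)):
    """Count hits lying entirely within any [start, end) coding interval."""
    return sum(
        1
        for start, _, _ in hits
        if any(s <= start and start + k <= e for s, e in cds_intervals)
    )
```

For the 10-mer above, the forward and reverse-complement cores differ at eight positions, so with up to three mismatches a window cannot be reported on both strands.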
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 provides details regarding genome alignment. Additional data file 2 shows the average nucleotide distance between DUSs. Additional data file 3 details the under-representation of DUSs in strain-specific insertions. Additional data file 4 details the classification of polymorphic sites. Additional data file 5 shows the distribution of the distance between contiguous composite DUSs in the N. meningitidis Z2491 genome. Additional data file 6 provides a visual representation of M-GCAT's multiple alignment of four H. influenzae genomes. Additional data file 7 shows that within the core genome of H. influenzae the USS-proximal regions accumulate more substitutions.
The authors thank the Sanger Center and the Institut Pasteur for providing sequence data before publication. We are grateful to SV Balasingham, E Feil, G Achaz, SA Frye, and J Allunans for critical reading of the manuscript and helpful discussions on the statistical analysis. OH Ambur and T Tønjum are supported by FUGE/CAMST and CoE funding from the Research Council of Norway and EMBIO, University of Oslo. T Treangen is supported by a Spanish Ministry MECD Research Grant TIN2004-03382, and AGAUR Training Grant FI-IQUC-2005 BE-2006. T Treangen thanks X Messeguer for support and encouragement.
- 27. Parkhill J, Achtman M, James KD, Bentley SD, Churcher C, Klee SR, Morelli G, Basham D, Brown D, Chillingworth T, Davies RM, Davis P, Devlin K, Feltwell T, Hamlin N, Holroyd S, Jagels K, Leather S, Moule S, Mungall K, Quail MA, Rajandream MA, Rutherford KM, Simmonds M, Skelton J, Whitehead S, Spratt BG, Barrell BG: Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491. Nature. 2000, 404: 502-506. 10.1038/35006655.
- 28. Tettelin H, Saunders NJ, Heidelberg J, Jeffries AC, Nelson KE, Eisen JA, Ketchum KA, Hood DW, Peden JF, Dodson RJ, Nelson WC, Gwinn ML, DeBoy R, Peterson JD, Hickey EK, Haft DH, Salzberg SL, White O, Fleischmann RD, Dougherty BA, Mason T, Ciecko A, Parksey DS, Blair E, Cittone H, Clark EB, Cotton MD, Utterback TR, Khouri H, Qin H, Vamathevan J, et al: Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science. 2000, 287: 1809-1815. 10.1126/science.287.5459.1809.
- 30. Bentley SD, Vernikos GS, Snyder LA, Churcher C, Arrowsmith C, Chillingworth T, Cronin A, Davis PH, Holroyd NE, Jagels K, Maddison M, Moule S, Rabbinowitsch E, Sharp S, Unwin L, Whitehead S, Quail MA, Achtman M, Barrell B, Saunders NJ, Parkhill J: Meningococcal genetic variation mechanisms viewed through comparative analysis of serogroup C strain FAM18. PLoS Genet. 2007, 3: e23. 10.1371/journal.pgen.0030023.
- 44. Bucci C, Lavitola A, Salvatore P, Del Giudice L, Massardo DR, Bruni CB, Alifano P: Hypermutation in pathogenic bacteria: frequent phase variation in meningococci is a phenotypic trait of a specialized mutator biotype. Mol Cell. 1999, 3: 435-445. 10.1016/S1097-2765(00)80471-2.
- 56. Hotopp JC, Grifantini R, Kumar N, Tzeng YL, Fouts D, Frigimelica E, Draghi M, Giuliani MM, Rappuoli R, Stephens DS, Grandi G, Tettelin H: Comparative genomics of Neisseria meningitidis: core genome, islands of horizontal transfer and pathogen-specific genes. Microbiology. 2006, 152: 3733-3749. 10.1099/mic.0.29261-0.
- 60. Klee SR, Nassif X, Kusecek B, Merker P, Beretti JL, Achtman M, Tinsley CR: Molecular and biological analysis of eight genetic islands that distinguish Neisseria meningitidis from the closely related pathogen Neisseria gonorrhoeae. Infect Immun. 2000, 68: 2082-2095. 10.1128/IAI.68.4.2082-2095.2000.
- 73. De Bolle X, Bayliss CD, Field D, van de Ven T, Saunders NJ, Hood DW, Moxon ER: The length of a tetranucleotide repeat tract in Haemophilus influenzae determines the phase variation rate of a gene with homology to type III DNA methyltransferases. Mol Microbiol. 2000, 35: 211-222. 10.1046/j.1365-2958.2000.01701.x.
- 78. Levine SM, Lin EA, Emara W, Kang J, DiBenedetto M, Ando T, Falush D, Blaser MJ: Plastic cells and populations: DNA substrate characteristics in Helicobacter pylori transformation define a flexible but conservative system for genomic variation. FASEB J. 2007, 21: 3458-3467. 10.1096/fj.07-8501com.
- 80. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Klerlavage AR, Bult CJ, Tomb JF, Dougherty RA, Merrick IM, Mckenney K, Sutton G, Fitzhugh W, Fields C, Gocayne JD, Scott J, Shirley R, Liu LI, Glodek A, Kelley JM, Weidman JF, Phillips CA, Spriggs T, Hedblom E, Cotton MD, Utterback TR, Hanna MC, Nguyen DT, Saudek DM, Brandon RC, Fine LD, Fritchman JL, Fuhrmann JL, Geoghagen NSM, Gnehm CL, Mcdonald LA, Small KV, Fraser CM, Smith HO, Venter JC: Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995, 269: 496-512. 10.1126/science.7542800.
- 81. Harrison A, Dyer DW, Gillaspy A, Ray WC, Mungur R, Carson MB, Zhong H, Gipson J, Gipson M, Johnson LS, Lewis L, Bakaletz LO, Munson RS: Genomic sequence of an otitis media isolate of nontypeable Haemophilus influenzae: comparative study with H. influenzae serotype d, strain KW20. J Bacteriol. 2005, 187: 4627-4636. 10.1128/JB.187.13.4627-4636.2005.
- 82. Hogg JS, Hu FZ, Janto B, Boissy R, Hayes J, Keefe R, Post JC, Ehrlich GD: Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains. Genome Biol. 2007, 8: R103. 10.1186/gb-2007-8-6-r103.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.