Identification of constraints influencing the bacterial genomes evolution in the PVC super-phylum
Horizontal transfer plays an important role in the evolution of bacterial genomes, yet it obeys several constraints, including the ecological opportunity to meet other organisms, the presence of transfer systems, and the fitness of the transferred genes. Bacteria from the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) super-phylum have a compartmentalized cell plan delimited by an intracytoplasmic membrane that might constitute an additional constraint with particular impact on bacterial evolution. In this investigation, we studied the evolution of 33 genomes from PVC species and focused on the rate and the nature of horizontally transferred sequences in relation to their habitat and their cell plan.
Using a comparative phylogenomic approach, we showed that habitat influences the evolution of the bacterial genome’s content and the flux of horizontal transfer of DNA (HT). Thus, bacteria from soil, bacteria from insects and ubiquitous bacteria presented the highest average of horizontal transfer compared to bacteria living in water, extracellular bacteria in vertebrates, bacteria from amoeba and intracellular bacteria in vertebrates (with a mean of 379 versus 110 events per species, and 7.6% versus 4.8% of each genome due to HT, respectively). The partners of these transfers were mainly bacterial organisms (94.9%); they allowed us to differentiate environmental bacteria, which exchanged more with Proteobacteria, and bacteria from vertebrates, which exchanged more with Firmicutes. The functional analysis of the horizontal transfers revealed a convergent evolution, with an over-representation of genes encoding membrane biogenesis and lipid metabolism, among compartmentalized bacteria in the different habitats.
The presence of an intracytoplasmic membrane in PVC species seems to affect the genome’s evolution through the selection of transferred DNA, according to their encoded functions.
Keywords: Horizontal transfer, Bacteria, Environments, Lifestyle, Genomes, Functions
The extensive amount of genomic data acquired over the last 20 years has provided insights into the evolutionary processes that drive bacterial evolution. The horizontal transfer of DNA (HT) appears to be a major driving force of innovation [1, 2], as it provides additional functions, allowing adaptation to specific conditions and environmental changes. The HT process in bacteria depends on several conditions: i. the possibility of exchanges, meaning the presence of different microorganisms in a single place; ii. the possibility for foreign sequences to enter recipient bacteria, mediated by conjugation, transformation or transduction; iii. the ability of these sequences to integrate into the recipient genome; iv. the expression and use of the transferred genes; v. their conservation, in relation to the benefits for the recipient bacteria. This process could be regulated by intrinsic and extrinsic constraints. Two extrinsic constraints influencing the possibility of exchanges are the environment, or “ecological niche”, and the lifestyle, which together constitute the habitat of bacteria [4, 5, 6]. Thus, the proportions and origins of HT were more similar among bacteria from the same habitat than among bacteria from a given phylum. Changing environmental conditions are also well-known constraints for HT regulation; UV irradiation, starvation and other stress conditions were shown to affect the mobility of transposons and insertion sequences [6, 8, 9, 10]. The habitat also seems to play an important role in the selection and conservation of transferred sequences encoding specific functions that are involved in host colonization and the development of pathogenesis. Indeed, many examples in the literature indicate that genes encoding metabolic functions [11, 12, 13], antibiotic resistance [14, 15] and virulence [16, 17, 18] represent commonly transferred sequences.
The intrinsic constraints that influence the entrance and integration of foreign DNA into a recipient genome include surface exclusion, which limits the entrance of specific sequences in some bacteria, the presence of CRISPR, which decreases the insertion of transferred sequences into recipient genomes [19, 20], and the presence of some endonucleases that can destroy foreign DNA [3, 21].
Many studies have been conducted to explore the impact of the different extrinsic and intrinsic constraints on horizontal transfers. However, these studies involved one or a few species, or bacteria presenting only one or two habitats or lifestyles [22, 23, 24, 25], or undergoing relatively few intrinsic constraints [26, 27]. Studying only a few characteristics may lead one to miss the cumulative or overlapping effects of the different constraints. Therefore, we used a phylogenomic approach to mine a large set of bacteria with different habitats in order to decipher the impact of different constraints on genome composition, especially regarding HT. The PVC super-phylum seems to be a good model to study, as it includes seven bacterial phyla (Planctomycetes, Verrucomicrobiae, Chlamydiae, Lentisphaera, Poribacteria, OP3, WWE2) [28, 29, 30, 31] with diverse habitats, three different lifestyles (intracellular allopatric, intracellular sympatric, extracellular sympatric) and numerous environments (water, soils, water and soils, metazoa, amoeba, ubiquitous…), thus varying the external constraints. Moreover, a specific cell plan is also present in all the Planctomycetes [32, 33, 34], in some Verrucomicrobiae, in one Lentisphaera and in one Poribacteria. The cytoplasm of these bacteria is separated into two compartments by an intracytoplasmic membrane (ICM): the pirellulosome inside (with DNA) and the paryphoplasm outside. This membrane is a lipid bilayer in contact with proteins [32, 33, 38] presenting structural similarities with proteins from eukaryotic membranes, like the clathrins [39, 40]. The function of this intracytoplasmic membrane is still unknown, but we hypothesize that this intrinsic constraint may have an impact on HT. In the present investigation, we analyzed 33 PVC bacteria together with 31 phylogenetically close species (Bacteroidetes, Chlorobi and Spirochaetes) that were considered as the control group, looking for evidence of horizontal transfer.
Statistical analyses of the potential partners and functions involved in HT allowed us to estimate the real impact of habitat and cell plan on genome evolution.
Bacterial set selection, definition of lifestyles, environments and cell plan
The genomes of 64 bacteria were retrieved from two different databases [41, 42]. These bacteria belong to different phyla (Additional file 1), including four phyla of the PVC super-phylum (Planctomycetes, Verrucomicrobiae, Lentisphaerae and Chlamydiae) and the phylogenetically closest phyla, Bacteroidetes, Chlorobi and Spirochaetes (determined using a reference tree). We reconstructed the species tree of PVC bacteria and Bacteroidetes-Chlorobi-Spirochaetes on the basis of 12 markers that are common to the 64 species (Additional file 2) using Mega5. To this end, the protein sequences of each marker were aligned with Muscle and non-conserved positions were removed manually. All alignments were concatenated, leading to an alignment of 5067 sites. We built a maximum likelihood tree (substitution model JTT) from this concatenated alignment to reconstruct the phylogeny of species. Bootstrap support values were obtained with 150 replicates (Additional file 3). The bacteria studied have different lifestyles (intracellular or extracellular, allopatric or sympatric [46, 47]) and live in different environments (amoeba, mammals, soils, water, insects). We use the term “environment” for the main place where the bacteria live. For example, bacteria detected in sea, freshwater or wastewater are all annotated as bacteria from ‘water’. If bacteria are present in two different environments, we indicate both of them (for example ‘water-soil’ bacteria), while bacteria living in more than 5 environments are considered ‘ubiquitous’. The lifestyle of bacteria is characterized by two factors: the intracellular or extracellular condition, and the ability to exchange with other microorganisms (limited in the allopatric lifestyle, possible in the sympatric lifestyle). Lifestyles and living environments were defined for each bacterium based on a literature search [48, 49, 50, 51, 52, 53].
Cell plans of the bacteria were determined from transmission electron microscopy images already available in the literature [33, 34, 35, 36] and from microscopic observations of the bacteria performed in our laboratory. Three states are defined for the cell plan: compartmentalized, non-compartmentalized and unknown. The selected set of species contains 4 bacteria from amoeba, 4 from insects (1 intracellular, 3 extracellular), 3 from soils, 4 living in soils and water, 3 ubiquitous, 26 from vertebrates (18 extracellular, 8 intracellular) and 20 from water. Among these 64 bacteria, 20 present a compartmentalized cell plan, 5 have an unknown cell plan, and 39 are not compartmentalized (Additional file 1).
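The concatenation of the aligned markers into a single alignment can be illustrated with a minimal Python sketch. It is not the procedure actually run with Mega5; the function name `concatenate_alignments` and the toy sequences are hypothetical and serve only to show how per-marker alignments are joined end to end for each species.

```python
def concatenate_alignments(alignments):
    """Join per-marker alignments end to end for each species.

    `alignments` is a list of dicts mapping species name -> aligned sequence.
    Every marker must contain the same species, and all sequences inside one
    marker alignment must have the same length.
    """
    species = set(alignments[0])
    for aln in alignments:
        if set(aln) != species:
            raise ValueError("every marker must be present in every species")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences in one alignment must share a length")
    return {sp: "".join(aln[sp] for aln in alignments) for sp in sorted(species)}


# Toy data (not from the study): two markers, three species.
marker_a = {"sp1": "MKT-A", "sp2": "MKTLA", "sp3": "MRTLA"}
marker_b = {"sp1": "GG-V", "sp2": "GGAV", "sp3": "GCAV"}
supermatrix = concatenate_alignments([marker_a, marker_b])
```

With 12 markers instead of two, the same operation would produce one long sequence per species, whose length is the sum of the marker alignment lengths.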
Genome analysis: common genes, specific genes, ORFans
OrthoMCL was used to obtain groups of orthologous proteins. Groups containing at least one representative member of each habitat were considered as the common genes, and those that contain only proteins of bacteria from the same habitat were considered as specific to the corresponding habitat. We calculated the rate and determined the function of proteins that are specific to a habitat in each species using two software programs, COGnitor and WGA, and the Interpro database [56, 57, 58, 59]. Genes that do not belong to any orthologous group are either acquired by “specific” HT or generated de novo. A Blast search against the NR database allowed the identification of ORFans, i.e. genes with no identifiable homolog (genes that do not have a Blast hit with an e-value < 10e-4 AND a query coverage >50%). We performed a clustering of all species according to their gene contents in order to detect whether some bacteria tend to share the same gene content in relation to their habitat.
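The rule used to separate common from habitat-specific genes can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name, the data layout and the label "shared" for groups that span several but not all habitats are assumptions introduced for the example.

```python
def classify_groups(groups, habitat_of, all_habitats):
    """Label each orthologous group by the habitats of its member species.

    groups:       dict group_id -> list of (species, protein_id) tuples
    habitat_of:   dict species -> habitat label
    all_habitats: iterable of every habitat label in the dataset
    """
    every_habitat = set(all_habitats)
    labels = {}
    for group_id, members in groups.items():
        habitats = {habitat_of[species] for species, _ in members}
        if habitats == every_habitat:
            labels[group_id] = "common"          # at least one member per habitat
        elif len(habitats) == 1:
            labels[group_id] = "specific:" + next(iter(habitats))
        else:
            labels[group_id] = "shared"          # hypothetical label for the rest
    return labels


# Toy data (hypothetical species and proteins).
habitat_of = {"spA": "water", "spB": "soil", "spC": "water"}
groups = {
    "g1": [("spA", "p1"), ("spB", "p2")],
    "g2": [("spA", "p3"), ("spC", "p4")],
}
labels = classify_groups(groups, habitat_of, ["water", "soil"])
```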
HT detection, functions and partner identification
Generally, two main difficulties hinder the analysis of the Horizontal Transfer of DNA sequences (HT) according to habitat: the distinction between ancestral and recent gene gains, and the difficulty of determining the ancestral habitat of the bacteria. In order to avoid these problems, we focused our study on recent transfers that occurred only in modern species of the super-phylum (in the leaves of the tree) and not in their ancestors (at the nodes of the tree). HT instances were identified using a comparative phylogenomic approach, combining phylogenetic profiling of proteins and phylogenetic analysis of gene trees in comparison with the species tree. Using the PhyloPattern pipeline, we identified the gene gain events in four steps: 1) based on the orthologous groups, a tree was reconstructed for each group. 2) The topologies of these trees were compared with that of the species tree in order to detect species missing in orthologous groups. 3) We obtained a pattern of presence/absence for each gene in the different species, allowing PhyloPattern to reconstruct the ancestral states of genes by implementing the Sankoff parsimony algorithm. Based on this reconstruction, the pattern-matching module in PhyloPattern allows us to infer, by parsimony, two types of genetic events that could have occurred during gene evolution: gains and losses. The gains could be possible HT, de novo genes or artifacts; therefore, among the gene gains detected by PhyloPattern, we had to identify those due to HT. We focused on specific gene gains and performed a Blast search to identify similar sequences in the NR database. If the first twenty hits of the Blast result belong to species outside the super-phylum of the query, and they are orthologous as confirmed by a reciprocal best hit, the query gene could be considered horizontally acquired. The pattern permitting the automatic identification of HT among gain events is presented in Additional file 4.
Sequences with an e-value > 10^-5, coverage < 60% or identity < 30% were not considered. The localization of these events in the genome allowed the identification of any horizontal transfers of DNA sequences. The directionality of the transfer could not always be identified, but as we were interested in the capacity of exchange of the bacteria, it did not matter whether the bacterium was the donor or the recipient. If several transferred genes were side by side in a genome and were exchanged with the same partners, we considered them to have been transferred in a single event.
We calculated the quantities and proportions of proteins (proteins transferred/total proteins in proteome) and sequences (nucleotides transferred/total nucleotides in genome) implicated in HT for each genome, as well as the size of the transferred sequences. We identified the function of transferred genes by using two software programs, COGnitor and WGA, and the Interpro database [56, 57, 58, 59]. These programs attribute one type of function to proteins, according to the COG to which the proteins belong. We tested for significant differences in HT distribution among the different studied groups of bacteria with respect to HT partners and functions.
The occurrence of HT and the frequency of specific genes were compared among the different groups of bacteria from different habitats. We tested whether the data (proportions of functions for specific genes and proportions of partners and functions for HT) follow a normal (Gaussian) distribution using the Shapiro-Wilk test, and we checked the homogeneity of the data with the Levene test. These tests were followed by a comparison among the different habitats, using the Kruskal-Wallis test or the ANOVA test, according to whether or not the data had a normal distribution. The Nemenyi or Tukey tests were performed to obtain a comparison of each pair of habitats. We also performed a Principal Components Analysis (PCA), focused on HT proportions in genomes, size, functions and partners of transfer, followed by a hierarchical clustering (HCPC), to identify clustering of bacteria according to their transfer partners or their functions. All analyses were performed with R software.
We then used comparative phylogenetic methods to test the impact of phylogenetic relationships between species on the acquisition of the studied characters. Analysis of variance was used in intergroup comparisons of categorical variables to determine whether the distribution of species on the basis of their habitat is statistically significant, compared to their classification according to phylogenetic distances (Additional file 3). To this end, the Nemenyi or Tukey tests were performed to obtain a comparison of each pair of groups based on the phylogenetic relationships. These groups were determined by the phylogenetic distance separating bacteria. We compared the results of these tests with the results of tests carried out for groups based on habitats, allowing us to determine whether classes defined by the phylogenetic distances present different and more reliable results than classes based on habitat (t-test). If results were similar, it could be difficult to determine whether the differences observed were related to the habitat or to the phylogenetic relationships. We also performed two correlation tests. The first test, Pagel’s correlation method, performed in Mesquite, is a test of the independent evolution of two binary characters (all features studied were tested after binarization of continuous values). This test compares the ratio of likelihoods of two models where the rates of change in each character are dependent on, or alternatively independent from, phylogenetic relationships. The second test is the Spearman coefficient, weighted by the phylogenetic distances, which studies the relationship between two variables. When this correlation test detects a significantly convergent character in bacteria from a single habitat, the character is considered largely unrelated to the phylogenetic background.
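To make the parsimony step more concrete, the following Python sketch shows the bottom-up pass of Sankoff parsimony for a single gene coded as absent (0) or present (1) at the leaves of a species tree. It is only an illustration under stated assumptions: the nested-tuple tree encoding, the function name `sankoff_costs` and the unit gain and loss costs are hypothetical, and the source does not describe how PhyloPattern implements this step internally.

```python
INF = float("inf")


def sankoff_costs(tree, leaf_states, gain_cost=1, loss_cost=1):
    """Return [cost if root is absent, cost if root is present].

    tree:        nested tuples of leaf names, e.g. ((("A", "B"), "C"), "D")
    leaf_states: dict leaf name -> 0 (gene absent) or 1 (gene present)
    """
    def change(parent, child):
        if parent == child:
            return 0
        return gain_cost if (parent, child) == (0, 1) else loss_cost

    def up(node):
        if isinstance(node, str):                      # leaf: observed state is free
            state = leaf_states[node]
            return [0 if state == 0 else INF, 0 if state == 1 else INF]
        child_costs = [up(child) for child in node]
        return [
            sum(min(change(p, c) + costs[c] for c in (0, 1)) for costs in child_costs)
            for p in (0, 1)
        ]

    return up(tree)


# Toy example: the gene is present in leaf A only.
toy_tree = ((("A", "B"), "C"), "D")
toy_states = {"A": 1, "B": 0, "C": 0, "D": 0}
root_costs = sankoff_costs(toy_tree, toy_states)
```

In this toy case the cheapest scenario has the gene absent at the root with a single gain on the branch leading to A, which is the kind of leaf-level gain the analysis then screens for HT. Recovering where the events occur would require an additional top-down pass, which is omitted here.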
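The hit thresholds and the grouping of neighbouring genes into single events can be sketched in Python as below. This is a simplified illustration rather than the pipeline used in the study: the field names, the function names and the use of one partner label per gene are assumptions, and adjacency is approximated by consecutive gene indices along the genome.

```python
def passes_thresholds(hit, max_evalue=1e-5, min_coverage=60.0, min_identity=30.0):
    """Keep a hit unless e-value > 1e-5, coverage < 60% or identity < 30%."""
    return (
        hit["evalue"] <= max_evalue
        and hit["coverage"] >= min_coverage
        and hit["identity"] >= min_identity
    )


def merge_into_events(transferred):
    """Group side-by-side transferred genes with the same partner into events.

    transferred: list of (gene_index, partner) tuples, where gene_index is the
    rank of the gene along the genome (hypothetical representation).
    """
    events = []
    for index, partner in sorted(transferred):
        last = events[-1] if events else None
        if last and last["partner"] == partner and index == last["genes"][-1] + 1:
            last["genes"].append(index)                 # extends the current event
        else:
            events.append({"partner": partner, "genes": [index]})
    return events


# Toy data (hypothetical gene ranks and partners).
toy_genes = [(10, "Proteobacteria"), (11, "Proteobacteria"),
             (12, "Firmicutes"), (20, "Firmicutes")]
toy_events = merge_into_events(toy_genes)
```

Here genes 10 and 11 form one event, while genes 12 and 20 remain separate events because they are either exchanged with a different partner or not adjacent.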
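The choice between parametric and non-parametric tests can be summarized by a small decision sketch. The analyses themselves were run in R; this Python fragment only mirrors the logic described above. The function name, the 0.05 significance threshold and the way the Levene result is reported are assumptions, since the source does not state them.

```python
def choose_tests(shapiro_pvalues, levene_pvalue, alpha=0.05):
    """Pick the omnibus and post hoc tests from the preliminary checks.

    shapiro_pvalues: Shapiro-Wilk p-values, one per habitat group
    levene_pvalue:   p-value of the Levene test across groups
    alpha:           significance threshold (assumed, not given in the source)
    """
    normal = all(p > alpha for p in shapiro_pvalues)
    return {
        "omnibus": "ANOVA" if normal else "Kruskal-Wallis",
        "post_hoc": "Tukey" if normal else "Nemenyi",
        "variances_homogeneous": levene_pvalue > alpha,   # reported, not used to decide
    }


# Toy p-values (hypothetical).
plan_normal = choose_tests([0.40, 0.22, 0.61], levene_pvalue=0.30)
plan_skewed = choose_tests([0.40, 0.01, 0.61], levene_pvalue=0.30)
```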
The genes that are common to all habitats represented 26.2% of the content of each genome on average, and varied from 20.1% for bacteria from insects to 49.5% for the intracellular bacteria from vertebrates. In order to determine the functional profile of the common genes, each protein was assigned to a Clusters of Orthologous Groups of proteins (COG) functional category. We could infer a putative function for the protein sequences of 74% of the common genes; of these, 43.8% encode cellular processes and signaling, 36.8% metabolic functions, and 19.5% information storage and processing (Fig. 1). Among these, four functions were significantly over-represented compared to the other functions: wall/membrane/envelope biogenesis, signal transduction mechanisms, transcription, and energy production and conversion (12.4, 11.6, 9.1 and 8.5%, respectively; Chi2 test: p-value = 9.4 × 10^-7) (Fig. 1).
Horizontally transferred functions and phenotype
When analyzing the functions of the transferred sequences, we found that the general function distribution in the HT for the different habitats was not similar to that of the whole genomes, which suggests that HT was not due to chance. Genes involved in cell processes and signaling (33 to 50%) seemed to be significantly more subject to HT, whereas genes dedicated to information storage (12 to 17%) were less subject to HT (t-test between whole genomes and transferred genes: p-value = 4.6 × 10^-2 and 8.3 × 10^-4) (Fig. 3). Moreover, there were significant differences among the habitats. Biological functions of transferred sequences were biased towards three categories according to bacterial habitat: the signal transduction mechanism function in ubiquitous bacteria and in bacteria from soils (20.2 and 17.9%, respectively; ANOVA test: p-value = 2.3 × 10^-4); the transport and metabolism of amino acids in bacteria from amoeba and of lipids in ubiquitous bacteria (16%, Kruskal-Wallis test and correlation test: p-value = 7.5 × 10^-2 and 2.1 × 10^-4; 10.5%, ANOVA test: p-value = 6.9 × 10^-3, respectively); and the defense mechanism in extracellular bacteria from vertebrates (4.7%, Kruskal-Wallis test: p-value = 3.5 × 10^-2).
The comparative analysis of 33 genomes from PVC species from four different phyla showed the influence of the living environment and compartmentalization on the genome composition of PVC bacteria. The common genes were genes encoding transcription, signal transduction mechanisms, energy production and membrane biogenesis. Conversely, shared and specific genes encode different functions in relation to the lifestyle of the corresponding species. Evidence for a non-random horizontal transfer of DNA sequences has been given using a phylogenomic approach. Genes implicated in cell wall/membrane/envelope biogenesis, and those involved in lipid metabolism, were found to be over-represented among the transferred genes of compartmentalized bacteria from different habitats, consistent with a convergent evolutionary selection.
Our findings replicate observations from previous studies which demonstrated the role played by shared genomes in environmental adaptation. Nevertheless, our approach, by examining as many as 8 different habitat conditions, offers a large advantage over other genomic studies, and increases the reliability of our results. The low proportion of specific genes detected in bacteria from insects, soils and the soil-water milieu is most likely due to the higher number of species studied and the more distant phylogenetic relationships among them [69, 70] compared to the other habitats. Intracellular bacteria from vertebrates showed a low proportion (1.9%) of horizontally transferred sequences compared to the other bacteria. This result is probably related to the physical isolation of intracellular bacteria, which limits opportunities for HT [71, 72]. This agrees with previous studies showing that the predominant evolutionary process in intracellular bacteria is genome reduction, leading to smaller genome sizes [73, 74]. Intracellular bacteria in amoeba, with 4.8% of HT, are the exception [75, 76], since amoeba can phagocytose several bacteria at once, providing a particular field for potential genetic exchange and a training ground for the emergence of parasitism.
Likewise, the results obtained for the analysis of transfer partners were in agreement with previous results concerning transfers between PVC bacteria and Proteobacteria, or Spirochaetes and Firmicutes [12, 13]. Indeed, HT occurred preferentially between bacteria from the same habitats, as had already been assumed. Firmicutes are one of the two major phyla present in the gut microbiome, and they are the main partner of our bacteria from vertebrates. In the same way, Acidobacteria are mainly detected in soil and they are over-represented as HT partners of bacteria from soils, compared to bacteria from other habitats. The tendency of bacteria from amoeba to exchange more with Eukaryotes, especially plants, is probably due to their ancestral habitats. Indeed, ancestral Chlamydiae are known to have lived in and exchanged genes with the Archaeplastida [79, 80]. Thus, we can support the hypothesis that part of the HT detected was acquired through the interaction between the ancestors of the Chlamydiae and the plants, followed by loss in the majority of bacteria. It is worth noting that, as in previous studies of HT detection, it is difficult to distinguish between ancient and recent HT events; yet HT partners are the witnesses of modern and ancestral habitats of the bacteria studied, and our HT analysis helps infer the ancestral habitat of these bacteria.
Beyond the complexity hypothesis, which claims that genes involved in transcription and translation are less prone to transfer than metabolic genes, our findings showed that horizontal transfers can affect any function. Thus, HT does not only concern genes encoding metabolic mechanisms and other functions that enhance pathogenicity, like genes for virulence and antimicrobial activity [1, 15, 76]; genes involved in transcription and translation, in cell surface and DNA binding, and genes essential for defense can likely be transferred as well [46, 81, 82, 83]. Positive selection might be contributing to the over-representation of some functions in the category of transferable genes [84, 85]. Indeed, horizontally acquired genes that have a useful function are maintained, as this follows a strategy of colonization and adaptation to the environment. Our findings confirm previous results showing that HT particularly affects the genes involved in lipid metabolism, signal transduction and membrane transport in PVC bacteria, and genes specific to the outer membrane (such as O-antigen polymerase and outer membrane efflux protein) in some Planctomycetes [43, 86, 87]. Since the intracytoplasmic membrane of compartmentalized bacteria is a lipid bilayer, we can assume that the over-representation of these two functions (membrane biogenesis and lipid metabolism) among the transferred genes could be related to the cell plan of the bacteria. These genes may be essential for the maintenance of the supplementary intracytoplasmic membrane. Given that the quantity of HT events was found to be similar between compartmentalized and non-compartmentalized bacteria, these results reveal the possible impact of the cell plan on the positive selection of transfers. This selection seems to depend on function, and induces the recurrent maintenance of some transferred genes involved in the formation of compartments in bacteria from different habitats.
It is noteworthy that genes implicated in lipid metabolism and membrane biosynthesis were not over-represented in the non-transferred part of the genome of compartmentalized bacteria, compared to the other bacteria; therefore, the selection seems to concern only the transferred genes.
One limitation of our comparative genomic approach is that the number of genomes studied leads to a small sample size in each environmental category, which hinders statistical testing for certain categories. Moreover, our dataset comprised seven phyla, with only a few representatives of soil bacteria, four bacteria living in amoeba, and three extracellular bacteria from insects and three ubiquitous bacteria, while some environmental categories contain bacteria from just one phylum. Although the sample size was minimal, the results obtained were statistically usable and showed significant differences among phylogenetically close bacteria in relation to their habitat. Given the increasing number of sequenced genomes, it will be interesting to characterize HT events in compartmentalized bacteria from diverse phyla, in order to elucidate the role of physical barriers in horizontal transfers.
The genomic study of bacteria allowed us to better understand the influence of the different constraints acting on genome evolution in bacteria, especially the impact of the habitat and the special cell plan in the PVC super-phylum. The habitat influences the flux of horizontal transfer and determines the partners for genetic exchanges. The presence of an intracytoplasmic membrane in some PVC bacteria does not seem to limit HT but rather induces a selection of transferred genes according to their functions.
We thank Olivier Chabrol for assistance in computer programming during the elaboration of the HT detection strategy and Manuela Royer Carenzi for her assistance in the statistical analyses. We also thank the Xegen company for their assistance in HT detection using the PhyloPattern software. We thank TradOnline for English reviewing.
This work was supported by the Assistance Publique - Hopitaux de Marseille (Marseille Public University Hospital System). VM was supported by a Chairs of Excellence program from the Centre National de la Recherche Scientifique (CNRS). The funders had no role in study design, data collection and interpretation or the decision to submit the work.
Availability of data and materials
The dataset supporting the conclusions of this article is included within the article and its additional files.
PS carried out the design of the study, the strategy elaboration and the collection of data, performed the statistical analysis of results, and drafted the manuscript. PP participated in strategy elaboration, data interpretation and revised the manuscript. DR conceived the study, participated in its design and coordination and revised the manuscript. VM participated in the coordination of the study, strategy elaboration and the interpretation of data, and also drafted the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
- 6. Aminov RI. Horizontal gene exchange in environmental microbiota. Front Microbiol. 2011;2(158):10–3389.
- 11. Hacker J, Carniel E. Ecological fitness, genomic islands and bacterial pathogenicity: A Darwinian view of the evolution of microbes. EMBO Rep. 2011;21:376–81.
- 14. Craigie R, Gellert M, Lambowitz AM. Mobile DNA. In: Craig NL, editor. American society for microbiology. 1989.
- 21. González-Candelas F, Francino MP. Barriers to Horizontal Gene Transfer: Fuzzy and Evolvable Boundaries. In: Pilar F, editor. Horizontal Gene Transfer in Microorganisms. Caister: Academic Press. 2012. p. 47.
- 41. NCBI - proteins. https://www.ncbi.nlm.nih.gov/protein. Accessed January and March 2013.
- 42. NCBI - genomes. http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/genomes/. Accessed January and March 2013.
- 48. JGI - genomes. http://genome.jgi.doe.gov/. Accessed between August and December 2014.
- 49. GOLD database. https://gold.jgi-psf.org/index. Accessed between August and December 2014.
- 50. List of Prokaryotic names with Standing in Nomenclature - Bacterio.net. http://www.bacterio.net/index.html. Accessed between August and December 2014.
- 55. Fischer S, Brunk BP, Chen F, Gao X, Harb OS, Iodice JB, et al. Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Curr Protoc Bioinformatics. 2011;35(Suppl 6.12):1–19.
- 62. Levene H. Robust tests for equality of variance. In: Olkin I, Ghurye SG, Hoeffding W, Madow WG, Mann HB, editors. Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Stanford: Stanford University Press; 1960. p. 278–292.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.