Comparative genomics of the class 4 histone deacetylase family indicates a complex evolutionary history
Histone deacetylases are enzymes that modify core histones and play key roles in transcriptional regulation, chromatin assembly, DNA repair, and recombination in eukaryotes. Three types of related histone deacetylases (classes 1, 2, and 4) are widely found in eukaryotes, and structurally related proteins have also been found in some prokaryotes. Here we focus on the evolutionary history of the class 4 histone deacetylase family.
Through sequence similarity searches against sequenced genomes and expressed sequence tag data, we identified members of the class 4 histone deacetylase family in 45 eukaryotic and 37 eubacterial species representative of very distant evolutionary lineages. Multiple phylogenetic analyses indicate that the phylogeny of these proteins is, in many respects, at odds with the phylogeny of the species in which they are found. In addition, the eukaryotic members of the class 4 histone deacetylase family clearly display an anomalous phyletic distribution.
The unexpected phylogenetic relationships within the class 4 histone deacetylase family and the anomalous phyletic distribution of these proteins within eukaryotes might be explained by two mechanisms: ancient gene duplication followed by differential gene losses and/or horizontal gene transfer. We discuss both possibilities in this report, and suggest that the evolutionary history of the class 4 histone deacetylase family may have been shaped by horizontal gene transfers.
Keywords: Horizontal Gene Transfer, Phaeodactylum tricornutum, Ancient Gene Duplication, Differential Gene Loss, Ancient Horizontal Gene Transfer
In eukaryotes, DNA is packaged into chromatin structures, the basic unit of which is the nucleosome. Each nucleosome consists of about 146 bp of DNA tightly wrapped around a histone-protein octamer containing two copies each of H2A, H2B, H3, and H4. The packaging of DNA restricts its accessibility to proteins such as transcription factors, and therefore the transcriptional activation of many genes requires chromatin modifications such as reversible acetylation of the core histones. The steady-state level of acetylation is controlled by the antagonistic activities of two types of enzymes: histone acetyltransferases and histone deacetylases (HDACs). HDACs thus play key roles in transcriptional regulation and also in other cell processes that are influenced by the acetylation state of core histones, such as chromatin assembly, DNA repair, and recombination [3, 4].
HDACs have additional activities that are not directed at histones: many HDACs are partly located in the cytoplasm, and some have been shown to act on non-histone substrates, such as the cytoskeletal protein tubulin and the transcription factors p53 and YY1 [5, 6, 7]. Acetylation/deacetylation might thus be a widespread type of post-translational modification, acting in a manner similar to phosphorylation/dephosphorylation in the regulation of protein activity. In addition, HDACs have recently attracted considerable attention because chemical inhibitors of HDACs induce growth arrest, differentiation, and/or apoptosis of cancer cells both in vitro and in vivo, and may thus represent a new class of anti-tumor agents.
Recent phylogenetic studies classify the non-sirtuin HDACs into three families: the well-known class 1 (which includes the human HDACs 1, 2, 3, and 8), class 2 (including the human HDACs 4, 5, 6, 7, 9, and 10), and an additional class defined by the recently identified human HDAC 11. This third class has been named class 4 to distinguish it from the unrelated NAD-dependent class 3, i.e. the sirtuin deacetylases related to the yeast Sir2 protein. Orthologues of the eukaryotic HDACs are found in prokaryotes [9, 11], and phylogenetic analyses indicate that most of them can confidently be assigned to one or another of the three classes distinguished among eukaryotic HDACs. These prokaryotic proteins act biochemically on non-histone substrates and are usually labelled as 'acetoin utilization proteins' or 'acetylpolyamine amidohydrolases' with reference, respectively, to their involvement in the utilization of the carbon source acetoin or in the deacetylation of polyamines such as spermine. It is known, however, that acetylpolyamine amidohydrolases share some important functional features with eukaryotic histone deacetylases, as both: (i) recognize an acetylated aminoalkyl group; (ii) catalyse the removal of the acetyl group by cleaving an amide bond; and (iii) increase the positive charge of the substrate.
In this study, we have identified, through similarity searches against sequenced genomes and EST data, a very large sampling of putative eukaryotic and prokaryotic proteins belonging to the class 4 HDAC family. In the remainder of this paper we call these 'class 4 HDACs' on the sole basis of their orthology to the characterized class 4 HDACs of metazoans, and irrespective of their actual functional specificities, which have not been characterized. By means of multiple phylogenetic analyses, we show that the class 4 HDACs display unexpected phylogenetic relationships, at odds with the phylogeny of the corresponding species. Some eukaryotic proteins appear more closely related to eubacterial proteins than to those of related eukaryotic species. We discuss the possibility that this anomalous phyletic distribution might be the consequence of multiple ancient horizontal gene transfers between prokaryotes and eukaryotes, or alternatively, the result of gene duplication and a high rate of differential gene loss.
Derivation of a comprehensive set of class 4 HDACs
Phylogenetic analyses of the class 4 HDACs
We performed a multiple alignment of the retrieved class 4 HDACs of 82 different species and used this alignment to construct phylogenetic trees. We then applied several different phylogenetic methods (as described in the legend of Figure 1 and under Methods) to reconstruct evolutionary relationships among the class 4 HDACs. We used both statistical support (bootstrap values, quartet puzzling support values, and posterior marginal probabilities) and congruence between the different phylogenetic methods as indicators of the reliability of the different internal branches of the tree. Figure 1 summarizes these results. The trees obtained by the different phylogenetic methods can be found in Additional Files 3, 4, and 5.
We found two large well-supported monophyletic groups (Figure 1; black circles). One group, which we named the 'eukaryotic group', includes only eukaryotic proteins of animals (metazoa), land plants and a green alga (viridiplantae), and ciliates (alveolata), i.e. of representatives of three of the main eukaryotic lineages (opisthokonta, plantae, and chromalveolata, respectively). The other group, called the 'mixed group', includes proteins of representatives of various lineages, both eubacterial and eukaryotic: animals (metazoa, opisthokonta), green algae (viridiplantae, plantae), a red alga (rhodophyta, plantae), diatoms (stramenopiles, chromalveolata), and a coccolithophore alga (haptophyceae, chromalveolata). This mixed group includes a well-supported monophyletic group (Figure 1; grey circle) comprising eukaryotic sequences and sequences from cyanobacteria and proteobacteria.
The phylogeny of the class 4 HDACs appears, in many respects, at odds with the phylogeny of the species in which these proteins are found. In the mixed group, we identified, for example, a monophyletic group of nine animal proteins showing closer resemblance to eubacterial proteins than to those of other animals (Figure 1; red circle). This group includes sequences belonging to representatives of several animal lineages: a cnidarian (Nematostella vectensis), two arthropods (Callinectes sapidus, a crustacean, and Locusta migratoria, an insect), an annelid (Platynereis dumerilii), an echinoderm (Strongylocentrotus purpuratus), and four vertebrates (the teleost fishes Takifugu rubripes, Oryzias latipes, Gasterosteus aculeatus, and Pimephales promelas). Strikingly, the class 4 HDAC found in these teleosts is only distantly related to that found in another teleost fish, the zebrafish Danio rerio, and is more closely related to eubacterial proteins (Figure 1). Similarly, one class 4 HDAC found in Locusta migratoria is closer to those found in eubacteria than to those of other insects (Drosophila melanogaster, Anopheles gambiae, Apis mellifera, and Tribolium castaneum) and to the second class 4 HDAC of Locusta migratoria. In the mixed group, we also found class 4 HDACs in two green algae, Chlamydomonas reinhardtii and Ostreococcus tauri (Figure 1), that appear more similar to eubacterial class 4 HDACs than to those of other viridiplantae, such as Arabidopsis thaliana and Oryza sativa, or to the second class 4 HDAC of Chlamydomonas reinhardtii. We further noted the existence of a monophyletic group including proteins of very distant eukaryotic species (Figure 1, yellow circle): the diatoms Thalassiosira pseudonana and Phaeodactylum tricornutum (chromalveolata), the red alga Cyanidioschyzon merolae, and the green alga Ostreococcus tauri (plantae). The green alga sequence is thus more closely related to those of the diatoms, which are evolutionarily quite distant, than to those of any other viridiplantae. Finally, we found a monophyletic group comprising the second HDAC sequence found in the genome of the cnidarian Nematostella vectensis and the sequences of two distantly related eubacteria, Cytophaga hutchinsonii (a member of the Bacteroidetes) and Psychrobacter cryohalolentis (a γ-proteobacterium) (Figure 1, orange circle).
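The notion of a monophyletic group used above can be made concrete with a small sketch. The code below is an illustration only, not the procedure used in this study: it assumes a tree written as nested Python tuples, and the topology and the label `eubacterium_A` are hypothetical. It checks whether a given set of sequences forms a clade on its own.

```python
def leaves(node):
    """Return the set of leaf names found under a nested-tuple tree node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Yield the leaf set of every internal node, including the root."""
    if isinstance(node, str):
        return
    yield leaves(node)
    for child in node:
        yield from clades(child)


def is_monophyletic(tree, taxa):
    """True if some internal node contains exactly the given taxa."""
    target = frozenset(taxa)
    return any(clade == target for clade in clades(tree))


# Hypothetical toy topology in which two teleost sequences group with a
# eubacterial sequence rather than with the zebrafish sequence.
toy_tree = ((("Takifugu", "Oryzias"), "eubacterium_A"), ("Danio", "Drosophila"))

print(is_monophyletic(toy_tree, {"Takifugu", "Oryzias"}))           # True
print(is_monophyletic(toy_tree, {"Takifugu", "Oryzias", "Danio"}))  # False
```

In this toy tree the teleost sequences do not form a clade together, which is the kind of incongruence with the species tree that the text describes. The sketch treats the tree as rooted; for trees reconstructed without an outgroup, a group would more properly be read as one side of a bipartition.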
There is thus a clear incongruence between the HDAC protein tree and the phylogenetic tree of the corresponding species. We then looked more closely at the distribution of class 4 HDACs in eukaryotes (Figure 2). Class 4 HDACs are found in three of the main lineages of eukaryotes (Chromalveolata, Plantae, and Opisthokonta). Inside each of these groups, some species possess proteins belonging to the eukaryotic group and others display proteins of the mixed group, with a few species possessing both types (Figure 2). The eukaryotic class 4 HDACs thus clearly display an anomalous phyletic distribution, given our current view of the phylogenetic tree of eukaryotes.
Two main mechanisms might account for the unexpected phylogenetic relationships among class 4 HDACs and the anomalous phyletic distribution of the eukaryotic ones: (i) ancient gene duplication followed by differential gene loss or (ii) horizontal gene transfer (HGT).
One main problem with the model of ancient gene duplication followed by differential gene loss is that both types of class 4 HDAC genes must have coexisted in the ancestors of lineages (e.g. metazoans and viridiplantae) where some descendants have one type of HDAC and other descendants have the other type. We would expect many of these organisms to still possess both genes, but as a rule, this is not so (Figure 2). We found both gene types in only three eukaryotic species, as opposed to 42 species possessing only one gene type. As more genomes are sequenced, more will probably be found to contain both genes, but the presence of a single gene type in most genomes studied to date does not support the notion that the two categories of class 4 HDACs represent two paralogous groups that originated early in eukaryotic evolution. In addition, the model of ancient gene duplication followed by differential gene loss fails to fully explain some of our observations, such as the strongly supported separation of two diatom HDACs, those of Thalassiosira pseudonana and Phaeodactylum tricornutum, within the mixed group (Figure 1). Much more complicated scenarios are therefore required, making this model less plausible and not especially parsimonious.
The other main possibility is HGT, the transmission of genetic material from one species to another (Figure 3B). HGT is a widespread and important phenomenon in prokaryotes. It is one of the driving forces of genome evolution in both archaea and eubacteria [13, 14, 15, 16, 17, 18]. Over the past few years, it has become increasingly clear that HGT has had an impact on eukaryote evolution also, at least in the case of unicellular and/or parasitic eukaryotes [19, 20, 21, 22, 23, 24], yet the occurrence and the importance of HGT in organisms such as land plants and animals is less obvious and very controversial. Although claims have been made for HGT in multicellular organisms, only very few cases have been clearly demonstrated, and these mainly concern eukaryote-eukaryote and/or host-parasite gene transfer [15, 25, 26, 27, 28, 29, 30, 31]. The main criteria used in the aforementioned publications to detect HGT are unexpected phyletic distribution, differential presence or absence in closely related species, and incongruent phylogenetic trees [15, 16, 23, 32]. Our data meet all these criteria (see Figures 1 and 2), and are therefore very suggestive of the occurrence of HGTs having shaped the evolutionary history of the class 4 HDACs.
Although our data do not allow a firm determination of the direction of these putative HGTs (the identity of donors and recipients remains unknown), we favour the hypothesis that transfer occurred from prokaryotes to eukaryotes, and that the eukaryotic-group members are the 'original' eukaryotic HDACs and the mixed-group members are the 'transferred' HDACs. In support of this view, class 4 HDACs are found in many diverse eubacterial species representative of most major eubacterial lineages (Figure 1). To imagine that a class 4 HDAC was present in early eubacterial evolution (and subsequently transferred a few times to eukaryotes) is a more parsimonious mechanism than to postulate that the different prokaryotic class 4 HDACs were acquired from eukaryotes by numerous independent HGTs.
An important feature of the putative prokaryote-eukaryote HGTs is that most of them are probably ancient, as indicated by the species ranges covered by the monophyletic groups distinguished among the mixed-group eukaryotic HDACs (Figure 1). For example, the existence of the aforementioned monophyletic group comprising all nine metazoan sequences indicates that the transferred gene was already present in the last common ancestor of these animals, i.e. in that of most or all animals. This means that the recipient of the putative HGT was not a present-day complex metazoan but an ancient, probably much simpler (maybe unicellular) ancestor. This is important, as gene transfers from prokaryotes to eukaryotes with sequestered germ lines, such as most present-day animals, appear to be very rare; almost all other putative HGTs we have detected concern unicellular eukaryotes. A possible exception concerns the second HDAC sequence found in the genome of the cnidarian Nematostella vectensis, which forms a monophyletic group with the sequences of Cytophaga hutchinsonii and Psychrobacter cryohalolentis (Figure 1, orange circle). Although we cannot rule out contamination of the genomic data from which these sequences were obtained, this might be indicative of a much more recent HGT involving a complex multicellular organism.
Besides these putative eubacterium-eukaryote transfers, there is also the possibility of at least one eukaryote-eukaryote HGT. This is suggested by the existence of a monophyletic group including very distant eukaryotic species (Figure 1, yellow circle): the diatoms Thalassiosira pseudonana and Phaeodactylum tricornutum (chromalveolata), the red alga Cyanidioschyzon merolae, and the green alga Ostreococcus tauri (plantae). The green alga sequence is more closely related to those of the evolutionarily very distant diatoms than to those of any other viridiplantae. We suggest that this association may be the result of eukaryote-eukaryote HGTs between these phytoplanktonic species.
Lastly, we note that in most lineages only a single HDAC is found (Figures 1 and 2), yet two different proteins are found in the diatoms Thalassiosira pseudonana and Phaeodactylum tricornutum, the cnidarian Nematostella vectensis, and the green alga Ostreococcus tauri. In all these cases, both proteins belong to the mixed group and are not closely related, suggesting independent HGTs. The existence of eukaryotes with only mixed-group HDACs (and thus lacking a eukaryotic-group member) suggests that gene transfer was sometimes followed by functional replacement of the 'original' eukaryotic gene by the transferred one. The only eukaryotes to possess both a mixed-group and a eukaryotic-group protein are the green alga Chlamydomonas reinhardtii and two animals (Strongylocentrotus purpuratus and Locusta migratoria) (Figures 1 and 2). Similar multiple replacements have been reported for the eukaryotic translation elongation factor 1α. Whether these replacements are due solely to chance or have selective advantages is an open question that awaits functional and biochemical characterization of the proteins and still broader sampling of eukaryotic HDAC genes.
The results presented here shed new light on the evolutionary history of class 4 HDACs. These proteins display unexpected phylogenetic relationships, at odds with the phylogeny of the corresponding species, suggestive of ancient horizontal gene transfers between prokaryotes and eukaryotes. This suggests that the evolution of important eukaryotic multigene families, such as the histone deacetylase gene family, may have been shaped by horizontal gene transfers.
Class 4 HDAC sequences were retrieved through BLAST searches on protein and genome data, mainly from the NCBI, the DOE JGI, the Sanger Institute, the Baylor College of Medicine, the Genoscope, and the TIGR databases. To ascertain that we identified class 4 HDACs, we first used the 'reciprocal best BLAST hit' criterion. We retained for each species only the best BLAST hits, using known class 4 HDACs (of animals) as queries. We then performed the reciprocal BLAST using the obtained sequences as queries against the NCBI NR database and verified that the class 4 HDACs initially used in the first BLAST search are the best BLAST hits in the corresponding species. All class 4 HDACs identified are listed in Additional File 2. As most of the identified sequences come from EST data and unfinished genomes, we were concerned about the possibility that some of them might represent contamination of the genomic data. We list our arguments against this possibility in Additional File 6. In order to detect potential bacterial contamination, we also performed an analysis of the codon usage of the HDAC coding sequences compared to the corresponding genomes. This analysis, which is shown in Additional Files 6 and 7, does not show any evidence for contamination.
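The 'reciprocal best BLAST hit' criterion described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the scripts used in this study: BLAST results are assumed to have been parsed already into lists of (identifier, bit score) pairs, and all sequence identifiers and scores below are hypothetical.

```python
def best_hit(hits):
    """Return the identifier with the highest bit score, or None if no hits."""
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[1])[0]


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Keep query/candidate pairs that are each other's best hit.

    forward_hits: {query_id: [(candidate_id, bitscore), ...]}
        known class 4 HDAC searched against one target species.
    reverse_hits: {candidate_id: [(hit_id, bitscore), ...]}
        candidate searched back against the species of the query.
    """
    confirmed = {}
    for query, hits in forward_hits.items():
        candidate = best_hit(hits)
        if candidate is None:
            continue
        if best_hit(reverse_hits.get(candidate, [])) == query:
            confirmed[query] = candidate
    return confirmed


# Hypothetical identifiers and scores, for illustration only.
forward = {"human_HDAC11": [("cand1", 210.0), ("cand2", 95.0)]}
reverse = {
    "cand1": [("human_HDAC11", 205.0), ("human_HDAC8", 60.0)],
    "cand2": [("human_HDAC8", 120.0)],
}
print(reciprocal_best_hits(forward, reverse))  # {'human_HDAC11': 'cand1'}
```

A candidate whose best reverse hit is an HDAC of another class is discarded, which is the point of the reciprocal step: it guards against retaining a class 1 or class 2 protein merely because it was the closest match available in a given species.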
Multiple alignments were performed with Clustal W and subsequently manually improved. We performed two types of alignments, class 4 HDACs with HDACs of other classes and class 4 HDACs alone. The first type of alignment was used to verify the monophyly of the class 4 HDACs and thus to ascertain that we had identified bona fide class 4 HDACs (see Additional File 1). The second type of alignment was used to determine phylogenetic relationships among class 4 HDACs. In establishing the phylogeny of class 4 HDACs, we avoided using the first type of alignment (with other HDACs serving as outgroups) to prevent potential phylogenetic reconstruction artefacts due to the presence of distant outliers (class 4 HDACs diverge considerably from other HDACs, not shown). We used both a multiple alignment containing the whole protein sequences and a multiple alignment containing only the regions with unequivocal alignment. Both alignments gave the same tree topologies. The alignment of the whole proteins (used to produce the trees shown in this paper) can be found in Additional File 8.
Unweighted maximum-parsimony (MP) and neighbour-joining (NJ) reconstructions were performed with the PAUP 4.0 program. MP analyses were performed with the following settings: heuristic search over 500 bootstrap replicates, MAXTREES set at 2000, and other parameters set at default values. Maximum likelihood (ML) analyses were performed with PHYML and TreePuzzle. PHYML analyses were performed using two different amino-acid substitution models, the Jones-Taylor-Thornton (JTT) model and the Whelan and Goldman (WAG) model, the frequencies of amino acids being estimated from the data set, and rate heterogeneity across sites being modelled by two rate categories (one constant and eight γ rates). Statistical support for the different internal branches was assessed by bootstrap resampling (100 bootstrap replicates), as implemented in PHYML. Bootstrap consensus trees were constructed with the PAUP 4.0 program. TreePuzzle analyses were performed by means of the quartet puzzling tree search procedure, with 25,000 puzzling steps. We used the WAG model of substitution, with the frequencies of amino acids estimated from the data set, and allowed rate heterogeneity across sites to be modelled by two rate categories (one constant and eight γ rates). Bayesian inference was performed using the Markov chain Monte Carlo method as implemented in the MRBAYES (version 3) package [48, 49]. We used the WAG substitution frequency matrix with among-sites rate variation modelled by means of a discrete γ distribution with four equally probable categories. Two independent Markov chains were run, each containing 1,000,000 Monte Carlo steps, after a burn-in of 400,000 steps. One out of every 100 trees was saved. For each run, we computed the majority consensus of the obtained trees by means of the PAUP 4.0 program. The same consensus tree was obtained for both runs. Marginal probabilities at each node were taken as a measure of statistical support. The discrepancy between the estimated probabilities obtained in the two runs was 5% on average and never exceeded 11%. The results obtained from the two runs are thus consistent, so that we finally combined them by gathering the trees of both samples.
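One way to obtain figures such as the average and maximum discrepancy between two runs is sketched below. This is an assumption about how such a comparison could be made, not a description of the exact computation performed here: each sampled tree is represented simply as a set of clades (each clade a frozenset of taxon names), the taxa A to D are hypothetical, and the burn-in is expressed as a number of saved trees to discard.

```python
from collections import Counter


def clade_frequencies(sampled_trees, burn_in=0):
    """Fraction of post-burn-in trees containing each clade.

    sampled_trees: list of trees, each given as a set of frozensets of taxa.
    burn_in: number of saved trees to discard from the start of the run.
    """
    kept = sampled_trees[burn_in:]
    if not kept:
        raise ValueError("no trees left after burn-in")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def run_discrepancy(freq_a, freq_b):
    """Mean and maximum absolute difference in clade frequency between runs."""
    all_clades = set(freq_a) | set(freq_b)
    diffs = [abs(freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) for c in all_clades]
    return sum(diffs) / len(diffs), max(diffs)


# Hypothetical samples of ten trees per run over four taxa.
AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")
run1 = [{AB, CD}] * 9 + [{AC}]
run2 = [{AB, CD}] * 8 + [{AB}] * 2

mean_diff, max_diff = run_discrepancy(clade_frequencies(run1),
                                      clade_frequencies(run2))
print(f"mean {mean_diff:.2f}, max {max_diff:.2f}")  # mean 0.10, max 0.10
```

If one tree in every 100 steps is saved and saving starts at the first step, a burn-in of 400,000 steps corresponds to `burn_in=4000` saved trees in this representation.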
We are very grateful to Hervé Philippe for his essential comments and suggestions during the early course of this work and to Jacques van Helden for his participation in the analysis of the codon usage of the HDAC genes. We thank Genoscope for providing the Platynereis HDAC sequence. V.L. thanks Robert Herzog and Marc Colet for support. This work was supported by the CNRS and the Ministère de la Recherche through its ACI 'Jeunes chercheurs et jeunes chercheuses' (MV) and by the Belgian Science Policy (VL).
- 20. Koonin EV, Makarova KS, Rogozin IB, Davidovic L, Letellier MC, Pellegrini L: The rhomboids: a nearly ubiquitous family of intramembrane serine proteases that probably evolved by multiple ancient horizontal gene transfers. Genome Biol. 2003, 4: R19. doi:10.1186/gb-2003-4-3-r19.
- 35. The National Center for Biotechnology Information. [http://www.ncbi.nlm.nih.gov/]
- 36. The DOE Joint Genome Institute. [http://www.jgi.doe.gov/]
- 37. The Wellcome Trust Sanger Institute. [http://www.sanger.ac.uk/]
- 38. The Human Genome Sequencing Center. [http://www.hgsc.bcm.tmc.edu/]
- 39. Genoscope: Centre National de Séquençage. [http://www.genoscope.cns.fr/]
- 40. The Institute for Genomic Research. [http://www.tigr.org/]
- 43. Swofford DL: PAUP: Phylogenetic Analysis Using Parsimony (and Other Methods), Version 4. 1998, Sunderland, MA: Sinauer.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.