Phylogeny-wide conservation and change in developmental expression, cell-type specificity and functional domains of the transcriptional regulators of social amoebas
Dictyostelid social amoebas self-organize into fruiting bodies, consisting of spores and up to four supporting cell types in the phenotypically most complex taxon group 4. High quality genomes and stage- and cell-type specific transcriptomes are available for representative species of each of the four taxon groups. To understand how evolution of gene regulation in Dictyostelia contributed to evolution of phenotypic complexity, we analysed conservation and change in abundance, functional domain architecture and developmental regulation of their transcription factors (TFs).
We detected 440 sequence-specific TFs across 33 families, of which 68% were upregulated in multicellular development and about half were conserved throughout Dictyostelia. Prespore cells expressed twice as many TFs as prestalk cells, but stalk cells expressed more TFs than spores, suggesting that the gene expression events that define spores occur earlier than those that define stalk cells. Changes in TF developmental expression, but not in TF abundance or functional domains, occurred more frequently between group 4 and groups 1–3 than between the more distant branches formed by groups 1 + 2 and 3 + 4.
Phenotypic innovation is correlated with changes in TF regulation, rather than with the acquisition of functional domains or TFs. The function of only 34 TFs is known. Of 12 TFs essential for cell differentiation, 9 are expressed in the cell type for which they are required. The information acquired here on the conserved cell-type specificity of 120 additional TFs can effectively guide further functional analysis, while the observed evolutionary change in TF developmental expression may highlight how genotypic change caused phenotypic innovation.
Keywords: Dictyostelia; Evolution of transcriptional regulation; Evolution of phenotype; Comparative genomics; Comparative transcriptomics; Amoebozoa
BLAST: Basic Local Alignment Search Tool
SMART: Simple Modular Architecture Research Tool
Multicellularity enables organisms to specialize their cells for different functions and to organize the specialized cells into a wide array of tissues and organs. Cell-type specialization results from selective gene transcription, which is largely achieved by the binding of sequence-specific transcription factors upstream of the transcription start site in the 5′ intergenic regions of protein-coding genes. The regulation of the activity of these factors by intercellular communication and environmental cues is one of the major mechanisms that allow fertilized eggs to develop into functioning adults. The duplication and diversification of transcription factor genes and their expression is considered to have been a major mechanism for the acquisition of ever-increasing cell-type specialization and organismal complexity in the course of evolution.
Dictyostelid social amoebas represent an early type of multicellularity where cells feed as individuals, but come together when starved to form multicellular aggregates. The aggregates transform into migrating slugs and fruiting bodies, which, depending on the species, contain spores and up to four more cell types. This life cycle evolved from that of the solitary amoebas, which encyst individually when starved. Encystment still occurs in some Dictyostelia when conditions for aggregation are unfavourable.
We aim to understand how the gene regulatory mechanisms that caused cell-type specialization evolved in early multicellular organisms, using the genetically tractable Dictyostelia to investigate this problem. Molecular phylogenies subdivide Dictyostelia into four major and some minor groups [4, 5], with most novel cell types appearing in group 4 [6, 7], which contains the model organism Dictyostelium discoideum. Following completion of the D. discoideum genome sequence, we obtained genome sequences for a representative species in each of the three other taxon groups, which were almost fully assembled by primer walking [9, 10]. We and others obtained transcriptome data across taxon groups, both of purified cell types and during developmental progression into fruiting bodies and cysts, in earlier work [10, 11, 12] and in this study. The high-quality genomes and transcriptomes allow us to retrace changes in the abundance, expression profiles, cell-type specificity and functional domain architecture of Dictyostelid transcription factors (TFs) throughout the course of their evolution.
We here present conservation and change in 440 sequence-specific and 42 general TFs of Dictyostelia, highlighting associations between particular TF families and specific developmental roles, taxon group-specific gene amplification and loss, and evolutionary changes in the cell-type specificity and developmental regulation of TFs.
Identification and conservation of transcription factor families
The genomes of D. discoideum (Ddis) and D. purpureum (Dpur) in group 4, D. lacteum (Dlac) in group 3, P. pallidum (Ppal) in group 2 and D. fasciculatum (Dfas) in group 1 were screened for the presence of members of the 97 known eukaryotic families of sequence-specific transcription factors. Groups 1, 2, 3 and 4 have recently been reclassified as families with the names Cavenderiaceae, Acytosteliaceae, Raperosteliaceae and Dictyosteliaceae, while Dlac, Ppal and Dfas have been renamed Tieghemostelium lacteum, Heterostelium album and Cavenderia fasciculata. However, this classification was based on a single-gene small subunit ribosomal DNA phylogeny, which has since been superseded by more robust multi-gene phylogenies that only partially support the new classification [5, 15]. We therefore continue to use the older nomenclature here.
Table: Sequence-specific transcription factors detected in Dictyostelia
Table: Eukaryote sequence-specific transcription factor families not in Dictyostelia
Overall, 35% of sequence-specific TFs and 86% of general TFs (gTFs) were conserved across all five genomes (Fig. 4a). The Dpur genome is most often missing an ortholog, but this is likely an artefact of it being the only draft genome, which is only partially assembled. The large family of GATA TFs shows the most extensive genome-specific gain of individual members. Across sequence-specific TFs, gene amplification occurs about equally frequently in Ddis, Dpur and Ppal, but is lower in Dfas and much reduced in Dlac (Fig. 4b), which correlates with, and may partially cause, the small genome size of Dlac (23 Mbp versus ~31–35 Mbp for the others [9, 10]).
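The conservation and amplification tallies above can be illustrated with a small sketch. It assumes, hypothetically, that orthology has already been resolved into a table mapping each ortholog set to per-genome gene counts. The set names, counts and the functions `fraction_conserved` and `extra_paralogs` are illustrative only, and amplification is approximated here as the number of genes beyond one per genome per ortholog set, which may differ from the exact measure behind Fig. 4b.

```python
GENOMES = ("Ddis", "Dpur", "Dlac", "Ppal", "Dfas")


def fraction_conserved(ortholog_sets, genomes=GENOMES):
    """Fraction of ortholog sets with at least one member in every genome."""
    if not ortholog_sets:
        return 0.0
    conserved = sum(
        1 for counts in ortholog_sets.values()
        if all(counts.get(g, 0) > 0 for g in genomes)
    )
    return conserved / len(ortholog_sets)


def extra_paralogs(ortholog_sets, genomes=GENOMES):
    """Per genome, genes in excess of one per ortholog set (a proxy for amplification)."""
    extra = {g: 0 for g in genomes}
    for counts in ortholog_sets.values():
        for g in genomes:
            n = counts.get(g, 0)
            if n > 1:
                extra[g] += n - 1
    return extra


# Hypothetical toy data, not taken from the study.
example = {
    "tfA": {"Ddis": 1, "Dpur": 1, "Dlac": 1, "Ppal": 1, "Dfas": 1},
    "tfB": {"Ddis": 3, "Dpur": 1, "Ppal": 1},
    "tfC": {"Ddis": 1, "Dpur": 1, "Dlac": 1, "Ppal": 2, "Dfas": 1},
    "tfD": {"Ppal": 4},
}

print(fraction_conserved(example))  # 0.5
print(extra_paralogs(example))      # {'Ddis': 2, 'Dpur': 0, 'Dlac': 0, 'Ppal': 4, 'Dfas': 0}
```

Because absence is scored naively here, an incompletely assembled genome such as Dpur would lower the conserved fraction, which is the kind of artefact noted above.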
Conservation of functional domains and developmental expression
Functional domain architecture is conserved in the great majority of orthologs (Fig. 5a), except for the AT-hook and C2H2 TFs, where the small domains (12 amino acids for AT-hook, 23 amino acids for C2H2) are often not recognized in some orthologs. Compared to a set of 385 developmentally essential genes, the domain architecture of TFs is mostly simple, containing little more than the signature DNA-binding domain, which leaves less opportunity for domain change. More than half of all orthologous sets of TFs show differences in the developmental expression profiles of their member genes. Because a change in gene expression may cause individual TFs to take on novel roles, we were particularly interested in the phylogenetic distribution of such changes. Figure 5b shows that across TF families, developmental expression was most frequently divergent in only one species. In those cases where it was divergent in two or three species, the difference most frequently occurred between group 4 and the other groups, and less frequently between the more distantly related branches I and II, or scattered across the phylogeny. This is particularly evident in the compiled sets of all sequence-specific TFs, the combined families with three or fewer members and the general TFs (1st, 2nd and last bars of Fig. 5b), and for the E2F_DP and MIZ TFs. By contrast, for bZIPs divergent gene regulation occurred only scattered across the phylogeny.
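The categories used for Fig. 5b (group 4 versus groups 1–3, branch I versus branch II, or scattered) amount to a simple set comparison. This sketch assumes that, for each ortholog set, the species whose profile diverges have already been identified (how divergence itself is scored is not specified here), and that branch I comprises groups 1 + 2 (Dfas, Ppal) and branch II groups 3 + 4 (Dlac, Ddis, Dpur). The function name `classify_divergence` is hypothetical.

```python
ALL_SPECIES = frozenset({"Ddis", "Dpur", "Dlac", "Ppal", "Dfas"})
GROUP_4 = frozenset({"Ddis", "Dpur"})
BRANCH_I = frozenset({"Dfas", "Ppal"})   # groups 1 + 2
BRANCH_II = ALL_SPECIES - BRANCH_I       # groups 3 + 4


def classify_divergence(divergent):
    """Classify the set of species whose developmental profile diverges."""
    d = frozenset(divergent)
    if not d <= ALL_SPECIES:
        raise ValueError(f"unknown species: {sorted(d - ALL_SPECIES)}")
    if not 1 <= len(d) <= 3:
        raise ValueError("expected divergence in one to three species")
    if len(d) == 1:
        return "single species"
    # A 2 vs 3 split is the same split whichever side is called divergent.
    if d in (GROUP_4, ALL_SPECIES - GROUP_4):
        return "group 4 vs groups 1-3"
    if d in (BRANCH_I, BRANCH_II):
        return "branch I vs branch II"
    return "scattered"


for pattern in ({"Dlac"}, {"Ddis", "Dpur"}, {"Ppal", "Dfas"}, {"Ddis", "Ppal"}):
    print(sorted(pattern), "->", classify_divergence(pattern))
```

Treating a split symmetrically reflects that, with five species, "divergent in Ddis and Dpur" and "divergent in Dlac, Ppal and Dfas" describe the same group 4 versus groups 1–3 difference.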
Divergence in functional domain architecture also affects single species most, but is otherwise mostly scattered across the phylogeny (Fig. 5a), and the same is true for conservation of the TF genes themselves (Fig. 4c). This difference between conservation of gene function and conservation of gene expression was also observed for the set of 385 developmentally essential genes, where changes in gene expression were more group 4-specific and changes in functional domains more scattered across the phylogeny. Analysis of 25 phenotypic traits over 99 Dictyostelid species showed that the most dramatic changes in phenotype occurred in the last common ancestor of group 4 [6, 7]. The current and earlier analyses of genotypic change indicate that these phenotypic innovations were more likely caused by changes in the regulation of existing genes than by the appearance of novel genes or novel functional domains. The observed limited importance of change in functional domains does not, however, exclude the possibility that more subtle mutations that alter gene function strongly affected phenotypic evolution.
When comparing developmental expression profiles across TF families (Fig. 5c), it is striking that, except for the general transcription factors, which are mostly constitutively expressed, over 70% of the sequence-specific transcription factors are upregulated after the transition from growth to development, with the small families of Cud and MIZ TFs being expressed exclusively in development. Early upregulation around the aggregate stage and a peak of expression in mid-development are the dominant expression profiles. Apart from the JmjC TFs, no sequence-specific TF family is predominantly expressed in the vegetative stage.
Cell-type specificity of transcription factors
To investigate whether families of transcription factors are associated with specific cell fates, we also calculated, for each family with more than three members, the percentage of members that was specifically expressed in each of the six scored cell types and, for Ppal, during encystation. Across all sequence-specific TFs, 38% were specifically expressed in prespore cells and 18% in prestalk cells of group 4 slugs, and this difference was even more extreme for the general TFs, with 45% and 5% expressed in prespore and prestalk cells, respectively (Fig. 5d). Only the JmjC and GATA families contained more members with prestalk than prespore expression, while no MADS or STAT TFs were specifically expressed in prespore cells and no E2F_DP, CBF or GBF TFs in prestalk cells.
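The per-family percentages can be computed as in the sketch below. It assumes that each TF has already been assigned to at most one cell type in which it is specifically expressed (the criterion for that call is not given in this section), and it skips families with three or fewer members, as in the text. The function name and the family and cell-type labels in the example are hypothetical.

```python
from collections import Counter, defaultdict


def percent_by_cell_type(tf_calls, min_members=4):
    """Percentage of each family's members specifically expressed per cell type.

    tf_calls: iterable of (family, cell_type) pairs, where cell_type is None
    for TFs without a cell-type-specific call.
    """
    members = defaultdict(list)
    for family, cell_type in tf_calls:
        members[family].append(cell_type)
    result = {}
    for family, calls in members.items():
        if len(calls) < min_members:  # skip families with three or fewer members
            continue
        counts = Counter(c for c in calls if c is not None)
        result[family] = {ct: 100.0 * n / len(calls) for ct, n in counts.items()}
    return result


# Hypothetical calls, not data from the study.
calls = [
    ("GATA", "prestalk"), ("GATA", "prestalk"), ("GATA", "prespore"),
    ("GATA", None), ("GATA", None),
    ("MADS", "prestalk"), ("MADS", None), ("MADS", None),
]
print(percent_by_cell_type(calls))  # {'GATA': {'prestalk': 40.0, 'prespore': 20.0}}
```

The denominator is the whole family, including members without a cell-type call, so the percentages for one family need not add up to 100.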
In the fruiting body stage, this cell-type bias was almost reversed for the sequence-specific TFs, of which 14% were expressed in spores and 17% in stalk cells (Fig. 5e). Another 5% of TFs were expressed in cup cells, a population that is derived from prestalk cells [12, 18, 19]. This suggests that most genes that define the spore phenotype are already expressed in the slug stage, but that those that define the stalk and cup phenotypes are only expressed late in fruiting body formation. At this stage, TF families also showed stronger cell-type preferences, with bZIP and AT-hook TFs favouring expression in spores, and the GATAs, Hox TFs and members of the small families of Gal4, MADS and Cud TFs favouring expression in stalk cells. CBFs, GBFs and MIZ TFs favoured expression in cup cells. For the MADS TFs, their stalk and cup preference is consistent with their prestalk preference, but for the GBFs, the cup preference is the reverse of their prespore preference.
As was also evident from the developmental profiles (Fig. 5c), many more sequence-specific TFs are specifically expressed during development into fruiting bodies than in the vegetative stage, but this is not the case for the general TFs, which, as expected, are more constitutively expressed. Finally, in Ppal, where starving amoebas can individually encyst in addition to undergoing multicellular development, over 30% of members of all families are upregulated during encystation.
Predicted roles for TFs from cell-type specificity and developmental profiles
Cell-type specific transcription factors
Lastly, we explored the extent to which cell-type specificity predicts TF function. Of the 254 TFs detected in Ddis, functional information from gene knock-out and knock-down studies is available for only 34. Deletion of 12 TFs causes specific defects in, or lack of, terminally differentiated cell types, and 9 of these TFs are only expressed in the cell type that is lost upon knock-out (Additional file 5: Table S4). Deletion of 9 TFs causes alterations in the proportion of prespore and prestalk cells. Of this set, only 2 TFs are specific to the diminished cell type and 1 TF is specific to the increased cell type; the remaining 6 TFs are not cell-type enriched. This suggests that the cell-type specificity of TFs predicts their role in final cell fate well, but that cell-type proportioning is subject to more subtle cross-regulation. This is also to be expected, because a TF that instigates a presumptive cell fate has to be present before that fate is assigned, and therefore need not be restricted to the resulting cell type.
Across five genomes that represent the four major groups of Dictyostelia, around 440 different sequence-specific TFs across 33 TF families were detected. Due to genome- and species-specific gene amplification, this is about twice the number of TFs present in individual genomes. For instance, we detected 254 TFs in Ddis (as opposed to 106 in the initial genome annotation), of which a core set of 181 TFs is conserved across at least three other genomes.
The large family of GATA TFs is subject to extensive single-gene amplification, and the number of conserved genes in this family is therefore low. By contrast, members of the almost equally large family of Myb TFs are mostly conserved. Nine members of the Pipsqueak family are unique to one genome (Ppal) and are all strongly upregulated in encystation. Gene amplification occurred about equally across four genomes, but was much lower in the Dlac genome, which is also about a quarter to a third smaller than the other four (23 versus ~31–35 Mbp).
Changes in developmental expression profiles of conserved TFs occurred more frequently between group 4 and groups 1–3, than between the more distantly related branches I and II. This correlates with phenotypic change, which is also most pronounced between group 4 and the other three groups [6, 7]. Since group 4 has neither more novel TFs nor more different functional domains in its TFs, this suggests that altered expression of existing TFs plays an important role in phenotypic innovation.
There are marked differences between TF families in developmental expression, with e.g. 78% of bZIPs being developmentally up-regulated and 77% of JmjC TFs being constitutively expressed or developmentally down-regulated. Not surprisingly, most (65%) of the general TFs are constitutively expressed or down-regulated after growth, but across all sequence-specific TFs, 68% are developmentally up-regulated. This suggests that most of the Dictyostelid sequence-specific transcriptional machinery serves the developmental programme, with relatively few TFs left to adapt cells to environmental challenges in the growth stage.
The prespore cells in slugs express more than twice as many TFs as the prestalk cells, with particularly many AT-hook, CBF, E2F-DP, GBF and general TFs being prespore-specific. However, this changes in the fruiting body stage, when the stalk cells express somewhat more TFs than the spores, with some smaller families like the CudA-like, Gal4-like, GbfA-like and MADS TFs being expressed solely in cells of the stalk and cup. Strikingly, TFs that are essential for spore formation, such as cudA, spaA and stkA [20, 21, 22], are expressed in prespore but not spore cells, as if their task is finished upon sporulation. This pattern is similar across all prespore-specific TFs, of which only 12% persist into the spores. For the prestalk-specific TFs, 34% remain expressed in the stalk and cup. This temporal disparity in cell-type-specific gene expression likely reflects the different ontogenies of the mature cell types. The prespore cells start prefabrication of the spore wall in Golgi-derived vesicles after aggregation. The vesicles fuse with the plasma membrane during spore maturation, thus rapidly completing the cell wall. In contrast, stalk cells start cell wall synthesis gradually from the tip at the onset of fruiting body formation, while most cup genes are only expressed once the fruiting body is fully formed.
About 34 of the 254 TF genes of Ddis have been deleted, resulting in specific loss of, or severe defects in, mature cell types for 12 TFs. In 9 of these 12 cases, the TF is normally expressed in the affected cell type, and all 12 TFs are conserved throughout Dictyostelia. This implies that bioinformatics-based evidence on cell-type specificity and gene conservation is likely to be a useful tool for guiding discovery of the function of many of the remaining 220 TF genes.
Dictyostelia jointly contain 440 different sequence-specific TFs, which are subdivided across 33 families, of which four are thus far unique to Amoebozoa.
Only 32% of sequence-specific TFs are expressed constitutively or during growth, while the rest are developmentally up-regulated, indicating that most of the transcriptional machinery serves the multicellular phase of the life cycle.
Changes in developmental expression of TFs, but not in TF functional domains or TF gene gain or loss, are correlated with major changes in phenotype across Dictyostelia, suggesting that altered expression of TFs is a major driver of phenotypic change.
The study presents detailed information on cell-type specificity of TFs, which correlates with an essential role in cell differentiation for 9 out of 12 TFs with known functions. This makes the current analysis an effective tool for gene function discovery.
Sequence retrieval and phylogeny reconstruction
TF protein sequences were first retrieved from the Ddis, Dlac, Ppal and Dfas genomes using the Interpro (https://www.ebi.ac.uk/interpro/) domain identifiers of all known TF families as query in the “advanced search” option of the social amoeba comparative genome browser SACGB (http://sacgb.fli-leibniz.de/cgi/index.pl). For Dpur, a similar query was performed in the PubMed “protein” option (https://www.ncbi.nlm.nih.gov/pubmed) with the combined query “Dictyostelium purpureum and [Interpro domain identifier]”. Next, a BLAST library was prepared in CLC-workbench v8.0 (https://www.qiagenbioinformatics.com) from the combined Ddis, Dpur, Dlac, Ppal and Dfas proteomes, downloaded from Dictybase (http://dictybase.org/) and SACGB, and this library was queried with the protein sequences of representative functional domains of each TF family.
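The domain-identifier query amounts to selecting every protein that carries at least one signature domain of a TF family. The sketch below assumes, hypothetically, that per-protein domain annotations are already available as a dictionary; the identifiers `IPR-GATA`, `IPR-MYB` and so on are placeholders for real InterPro accessions, and the protein ids and the function name are invented for illustration. It does not reproduce the SACGB or NCBI search interfaces.

```python
def select_tf_candidates(annotations, family_domains):
    """Group proteins by TF family using their domain identifiers.

    annotations: protein id -> iterable of InterPro-style domain identifiers.
    family_domains: TF family -> identifiers that define that family.
    """
    hits = {}
    for family, identifiers in family_domains.items():
        wanted = set(identifiers)
        members = sorted(
            pid for pid, domains in annotations.items() if wanted & set(domains)
        )
        if members:
            hits[family] = members
    return hits


# Placeholder identifiers and protein ids; real queries would use InterPro accessions.
annotations = {
    "Ddis_p1": {"IPR-GATA"},
    "Dlac_p7": {"IPR-MYB", "IPR-OTHER"},
    "Ppal_p3": {"IPR-OTHER"},
    "Dfas_p2": {"IPR-GATA", "IPR-MYB"},
}
family_domains = {"GATA": ["IPR-GATA"], "Myb": ["IPR-MYB"], "MADS": ["IPR-MADS"]}
print(select_tf_candidates(annotations, family_domains))
```

A protein with signature domains of two families (here `Dfas_p2`) is reported under both, which is one reason a subsequent check of full domain architectures, as done with SMART below, is useful.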
The domain architectures of all detected proteins were analysed using SMART, with the visual display of each architecture saved as an .svg file. The domain coordinates were used to isolate the sequences corresponding to the TF functional domains. These sequences were subsequently aligned using Clustal Omega with 5 combined iterations. When functional domain sequences were short, a stretch of 20 amino acids flanking the domain on either side was included in the alignment. Phylogenies were constructed using RAxML in Topali v2.5 or MrBayes v3.2.6, with the latter run for 10⁶ generations, using a mixed amino acid model with rate variation between sites estimated by a gamma distribution. When otherwise conserved genes appeared to be absent from species, their proteomes or genomes were queried once more by BLASTp or tBLASTn, respectively, using the orthologous sequence as bait. Phylogenetic trees were then reconstructed, including the novel sequences. Trees were rooted at midpoint using FigTree v1.3.1 and saved as .svg files. The tree .svg file was combined with the domain architecture .svg files for each protein in Adobe Illustrator CS5.
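Cutting a domain out of a protein by its coordinates, with optional flanks, can be sketched as follows. This assumes coordinates are 1-based and inclusive, as SMART reports start and end positions; the length cutoff below which a domain counts as "short" is not stated in the text, so `short_cutoff=30` is a hypothetical value, and both function names are illustrative.

```python
def extract_domain(seq, start, end, flank=0):
    """Return residues start..end (1-based, inclusive) plus `flank` residues
    on each side, clipped at the protein termini."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("domain coordinates fall outside the sequence")
    lo = max(0, start - 1 - flank)
    hi = min(len(seq), end + flank)
    return seq[lo:hi]


def domain_for_alignment(seq, start, end, short_cutoff=30, flank=20):
    """Add `flank` residues per side when the domain is shorter than a
    hypothetical length cutoff (the study does not state one)."""
    length = end - start + 1
    return extract_domain(seq, start, end, flank if length < short_cutoff else 0)


protein = "M" * 10 + "D" * 40 + "C" * 10              # toy 60-residue sequence
print(len(domain_for_alignment(protein, 11, 50)))  # 40: long domain, no flanks
print(len(domain_for_alignment(protein, 11, 22)))  # 42: 12 residues + flanks, clipped at the N-terminus
```

Clipping at the termini matters for domains near the start or end of a protein, where a full 20-residue flank does not exist.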
RNA sequencing and analysis
To obtain total RNA for Dlac stalk, spore and vegetative cells, amoebas were co-cultured with Klebsiella aerogenes on lactose-peptone agar. For vegetative cells, cells were harvested before the bacteria started to clear. For stalk and spore cells, cells were harvested, freed from bacteria and incubated for 24 h on non-nutrient agar until fruiting bodies had formed. Spores were separated from stalks and RNA was isolated from the three cell types as described previously. The quality of the RNAs isolated in three independent experiments was assessed with TapeStation (Agilent) and found to be good (RIN > 7.5), and cDNA libraries were prepared using the TruSeq Stranded mRNA Library Prep Kit (Illumina) with the Low Sample Protocol. 75-bp paired-end reads were sequenced on an Illumina NextSeq 500 at the Tayside Centre for Genomic Analysis in two independent runs. The quality of the RNA-Seq reads was inspected with FastQC [26]. The RNA-Seq reads were then mapped to the previously assembled transcriptome of D. lacteum using RSEM with the bowtie2 aligner and with the read start position distribution (RSPD) estimation option. The read counts were normalized to Transcripts Per Million (TPM) with RSEM.
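For reference, TPM divides each read count by transcript length and then rescales these per-base rates so that they sum to one million per sample. The sketch below uses plain transcript lengths, whereas RSEM computes TPM from its own expected counts and effective lengths, so RSEM output will differ somewhat; the gene names and numbers are made up.

```python
def tpm(counts, lengths):
    """Transcripts Per Million from read counts and transcript lengths."""
    rates = {g: counts[g] / lengths[g] for g in counts}  # reads per base
    total = sum(rates.values())
    return {g: rate / total * 1e6 for g, rate in rates.items()}


# Made-up counts and lengths (bp) for three transcripts.
counts = {"geneA": 100, "geneB": 200, "geneC": 300}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1500}
values = tpm(counts, lengths)
print({g: round(v) for g, v in values.items()})
# {'geneA': 250000, 'geneB': 250000, 'geneC': 500000}
```

Because TPM values always sum to one million within a sample, they express each transcript's share of the sample's transcripts rather than an absolute abundance.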
To monitor gene expression during Ppal encystation, Ppal PN500 was co-cultured with K. aerogenes on LP agar. Cells were freed from bacteria and incubated at 2.5 × 10⁶ cells/ml in 250 mM sorbitol in 20 mM K-phosphate to induce encystation. Total RNA was extracted with an RNeasy Midi Kit (Qiagen) directly after harvest (t = 0 h) and after 8, 16 and 24 h of incubation at 22 °C, at which point 80% of cells had encysted. Library construction, sequencing, sequence quality control and mapping of transcripts to the Ppal genome were performed by Eurofins Genomics (https://www.eurofinsgenomics.eu/). Paired-end Illumina sequencing was performed on the HiSeq 2000 platform using the TruSeq (TM) SBS v5 sequencing kit. A total of 177,292,620 reads comprising 8.8 Gb were obtained. The reads were mapped to the Ppal genome using BWA 0.5.8c software (http://bio-bwa.sourceforge.net). The read counts were then normalized to reads per kilobase per million mapped reads (RPKM).
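RPKM scales a gene's read count by its length in kilobases and by the library size in millions of mapped reads, which combine into the factor 10⁹ below. This is a minimal sketch with made-up numbers; the function name is illustrative, and the total passed in should be all mapped reads of the sample, not only those assigned to the genes listed.

```python
def rpkm(counts, lengths_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return {
        g: counts[g] * 1e9 / (lengths_bp[g] * total_mapped_reads)
        for g in counts
    }


# Made-up example: 500 reads on a 2 kb gene in a library of 1 million mapped reads.
print(rpkm({"geneX": 500}, {"geneX": 2000}, 1_000_000))  # {'geneX': 250.0}
```

Unlike TPM, RPKM values do not sum to a fixed total per sample, so absolute values in the two units are not directly interchangeable.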
For comparative analysis of developmental expression and cell-type specificity of TF genes across the Dictyostelid phylogeny, normalized read counts from published and purpose-sequenced gene expression studies were combined into a single spreadsheet (Additional file 2: Table S1). The data include i. replicate developmental profiles for Ddis and Dpur obtained by Illumina sequencing, combined with RNAseq data of purified prestalk and prespore cells of migrating slugs; ii. averaged read counts of three RNAseq experiments comparing purified spore, stalk and cup cells from mature Ddis fruiting bodies and vegetative cells; iii. averaged read counts of three RNAseq experiments comparing purified spore and stalk cells from Dlac fruiting bodies and vegetative cells; iv. a single developmental profile for Dlac and replicate developmental profiles for Ppal and Dfas, combined for Ppal with RNAseq data of purified stalk and spore cells and 24 and 48 h time points of encystation; v. a separate 24 h time course of Ppal encystation. The developmental profiles are aligned between species by developmental stage rather than developmental time, because species do not develop at the same rate. For each set of orthologous genes, or group of amplified genes, the normalized read counts for each of the features listed above were transferred to Excel files and recalculated as a fraction of the maximum read count for developmental profiles and as a fraction of the sum of counts for cell-type specificity data. The conditional formatting option in Excel was used to generate heat maps, which were matched up with the phylogenetic trees in Adobe Illustrator.
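The two rescalings applied before drawing heat maps can be written out as follows. This is a Python stand-in for the Excel calculations, not the workbook itself; the function names and the read counts are hypothetical, and the zero guards (returning 0.0 for a gene with no reads) are an assumption about how empty rows would be handled.

```python
def fraction_of_max(profile):
    """Scale a developmental time course so that its peak equals 1."""
    peak = max(profile)
    return [v / peak if peak > 0 else 0.0 for v in profile]


def fraction_of_sum(by_cell_type):
    """Scale cell-type read counts so that they sum to 1."""
    total = sum(by_cell_type.values())
    return {ct: (v / total if total > 0 else 0.0) for ct, v in by_cell_type.items()}


# Hypothetical normalized read counts for one gene.
print(fraction_of_max([5, 20, 40, 10]))  # [0.125, 0.5, 1.0, 0.25]
print(fraction_of_sum({"spore": 30, "stalk": 60, "cup": 10}))
```

Scaling to the maximum preserves the shape of a time course, while scaling to the sum expresses the share of a gene's expression in each cell type; in both cases units such as TPM and RPKM drop out within a gene.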
GF performed most of the bioinformatics and wrote parts of the manuscript. PS designed the study and completed the manuscript. ZC, HL, CS and YY analysed some TF families. CS isolated RNAs during an encystation time course and KK isolated cell-type specific RNAs and performed RNAseq data analysis. All authors have read and approved the manuscript.
YY and PS were supported by Wellcome Trust grant 100293/Z/12/Z, GF, HL and CS were supported by ERC grant 742288, ZC was supported by Leverhulme Trust grant RPG-2016-220 and KK was supported by an EMBO Long-term fellowship and Marie Curie Action ALTF 295–2015 and by JSPS Overseas Research Fellowship H28–1002. The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data or in writing the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 6. Romeralo M, Skiba A, Gonzalez-Voyer A, Schilde C, Lawal H, Kedziora S, Cavender JC, Glockner G, Urushihara H, Schaap P. Analysis of phenotypic evolution in Dictyostelia highlights developmental plasticity as a likely consequence of colonial multicellularity. Proc Biol Sci. 2013;280:20130976.
- 18. Sternfeld J, David CN. Fate and regulation of anterior-like cells in Dictyostelium slugs. Dev Biol. 1982;93:111–8.
- 26. FastQC, a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.