Genome-wide analysis of gene regulation mechanisms during Drosophila spermatogenesis
During Drosophila spermatogenesis, the testis-specific meiotic arrest complex (tMAC) and testis-specific TBP-associated factors (tTAFs) contribute to the activation of hundreds of genes required for meiosis and spermiogenesis. Intriguingly, tMAC is paralogous to the broadly expressed Myb-MuvB (MMB)/dREAM complex, and the Mip40 protein is shared by both complexes. tMAC acts as a gene activator in spermatocytes, whereas MMB/dREAM has been shown to repress gene activity in many cell types.
Our study addresses the intricate interplay between tMAC, tTAFs, and MMB/dREAM during spermatogenesis. We used cell type-specific DamID to build DNA-binding profiles of Cookie monster (tMAC), Cannonball (tTAF), and Mip40 (MMB/dREAM and tMAC) in male germline cells. By incorporating whole-transcriptome analysis, we characterized the regulatory effects of these proteins and identified their target genes. This analysis revealed that the tTAF complex is involved in activating the meiosis arrest genes achi, vis, and topi, implying that tTAFs may indirectly contribute to the regulation of Achi, Vis, and Topi targets. To understand the relationship between tMAC and MMB/dREAM, we performed Mip40 DamID in tTAF- and tMAC-deficient mutants that display a meiosis arrest phenotype. Mip40 DamID profiles were highly dynamic across the stages of spermatogenesis and depended strongly on tMAC in spermatocytes. Integrative analysis of our data indicated that MMB/dREAM represses genes that are not expressed in spermatogenesis, whereas tMAC recruits Mip40 for subsequent gene activation in spermatocytes.
The interdependencies we discovered allow us to formulate a revised model of tMAC and tTAF action in Drosophila spermatogenesis that shows how tissue-specific genes are regulated.
Keywords: Drosophila; Spermatogenesis; Gene regulation; DamID
tMAC: testis-specific meiosis arrest complex
tTAF: testis-specific TBP-associated factor
dREAM: DP, RB-like, E2F4, and MuvB complex
DamID: DNA-adenine methyltransferase identification
Differentiation of spermatogonia into spermatocytes depends on the bag of marbles (bam) gene (Fig. 1a), and mass activation of genes in spermatocytes requires two classes of spermatocyte-specific transcription factors encoded by the meiosis arrest group of genes. Mutations in these genes cause arrest in the G2 phase that precedes meiosis I (Fig. 1a) and lead to the accumulation of mature primary spermatocytes.
Meiosis arrest genes encode the components of two distinct protein complexes. The meiosis arrest complex (tMAC) includes Aly (Always early), Comr (Cookie monster), Topi (Matotopetli), and Tomb (Tombola), along with Mip40 (Myb-interacting protein 40) and CAF1-55 (Chromatin assembly factor 1, p55 subunit). These proteins form a testis-specific assembly that shares several homologous subunits with the MMB/dREAM complex (Fig. 1b). Additional proteins can associate with tMAC, and their various combinations suggest that there may be several tMAC-related complexes [5, 6, 7].
Can (Cannonball), Sa (Spermatocyte arrest), Rye (Ryan express), Mia (Meiosis arrest), and Nht (No hitter) are testis-specific homologues of TBP-associated factors (tTAFs) that probably form a testis-specific paralogue of TFIID (Fig. 1c) [8, 9]. Mutations in tMAC components were previously reported to cause a dramatic decrease in the expression of about 1000 genes, whereas tTAF mutants fail to activate about 350 genes, most of which also depend on tMAC.
Previous studies suggested that Polycomb complexes play a central role in repressing spermatocyte-specific genes in undifferentiated precursors [2, 10]. This model, however, was recently challenged by a genome-wide study that failed to detect Polycomb association with the promoters of testis-specific genes in spermatogonia. An alternative mechanism for repressing spermatocyte-specific genes in spermatogonia may involve MMB/dREAM, as this complex has been shown to function as a repressor [12, 13, 14]. In this regard, the similarity between tMAC and MMB/dREAM raises the interesting possibility that these complexes interact to regulate the spermatocyte-specific gene program. Adding to this picture, a mechanism involving Kmg and dMi-2 that prevents the expression of somatic genes in the Drosophila male germline was recently discovered.
Here, we investigated the binding of tMAC and tTAF components to chromosomes and studied their effects on transcription. Specifically, we performed germline cell-specific genome-wide profiling of Cookie monster (Comr), representing tMAC; Mip40, a subunit shared by tMAC and MMB/dREAM; and Cannonball (Can), representing tTAFs. Our study revealed mutual dependencies between these factors that shed new light on the regulation of tissue-specific genes.
Germline-specific genome-wide DamID analysis of Comr, Can, and Mip40 identifies their cognate target genes
Although mutations in tMAC and tTAF subunits cause down-regulation of hundreds of genes, very little is known about their direct targets. Only three targets of the Spermatocyte arrest protein (Sa, tTAF) have been reported in the literature: dj (don juan), fzo (fuzzy onion), and Mst87F (Male-specific RNA 87F) [2, 10].
We used tissue-specific DamID-seq to establish genome-wide profiles of Cookie monster (Comr, tMAC subunit), Cannonball (Can, tTAF subunit), and Mip40 (shared between the tMAC and MMB/dREAM complexes) specifically in the D. melanogaster male germline [16, 17, 18]. Comr and Can are essential components of tMAC and tTAFs, respectively [8, 19, 20]. In our previous paper, we demonstrated a direct activating role of Comr in spermatocytes and performed an initial characterization of the interplay between Comr and Can. The present study extends that analysis with higher resolution and sensitivity and uncovers new aspects of regulatory events in Drosophila spermatogenesis.
Our peak-calling pipeline identified 2140 significantly bound peaks for Can, 5422 for Comr, and 12,981 for Mip40. Peaks were called using the following criteria: FDR < 0.05, P < 10⁻³, and log2(Dam-X/Dam) > 1, where X stands for the mapped protein (see Methods). These criteria selected the most prominent peaks for each of Can, Comr, and Mip40 (Fig. 2a), which is necessary for reliably identifying genes under the direct control of each protein. Differences in peak numbers were taken into account in the downstream statistical tests when calculating expected values. Genome-scale analysis of peak positions indicated that the three proteins colocalize much more often than expected by chance, as assessed with a binomial test (Additional file 1: Fig. 1). The Euler diagram of peak intersections (Additional file 2: Fig. 2) shows that, despite the significant overlap between the sets of binding sites, a considerable number of stand-alone Can, Comr, and Mip40 peaks were observed.
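The three peak-calling criteria above amount to a conjunctive filter applied to each candidate peak. The sketch below is a minimal illustration, assuming each candidate already carries normalized Dam-X and Dam-only signals, a P value, and an FDR; the `Peak` record, its field names, the pseudocount, and the example values are hypothetical and do not reproduce the authors' pipeline (see Methods).

```python
import math
from dataclasses import dataclass


@dataclass
class Peak:
    """Candidate enrichment peak (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    dam_x: float    # normalized Dam-fusion (Dam-X) signal in the peak
    dam: float      # normalized Dam-only signal in the same region
    p_value: float
    fdr: float


def log2_ratio(peak: Peak, pseudocount: float = 1.0) -> float:
    """log2(Dam-X / Dam); the pseudocount is an assumption that avoids division by zero."""
    return math.log2((peak.dam_x + pseudocount) / (peak.dam + pseudocount))


def passes_thresholds(peak: Peak, max_fdr: float = 0.05, max_p: float = 1e-3,
                      min_log2: float = 1.0) -> bool:
    """True if the peak satisfies FDR < 0.05, P < 1e-3 and log2(Dam-X/Dam) > 1."""
    return peak.fdr < max_fdr and peak.p_value < max_p and log2_ratio(peak) > min_log2


peaks = [
    Peak("2L", 1000, 1500, dam_x=63.0, dam=15.0, p_value=1e-5, fdr=0.01),   # log2(64/16) = 2
    Peak("2L", 9000, 9400, dam_x=31.0, dam=23.0, p_value=1e-4, fdr=0.02),   # log2(32/24) < 1
    Peak("3R", 5000, 5600, dam_x=127.0, dam=7.0, p_value=0.01, fdr=0.20),   # fails P and FDR
]
kept = [p for p in peaks if passes_thresholds(p)]
print([(p.chrom, p.start) for p in kept])  # [('2L', 1000)]
```

Requiring all three conditions at once is what makes the retained set conservative: a peak with a strong ratio but a weak P value (or vice versa) is discarded.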
Mip40 is a component of tMAC, and its absence could affect the distribution of other subunits of this complex, including Comr. On the other hand, the putative DNA-binding domain of Comr could mediate its binding to chromosomes independently of other tMAC components. To distinguish between these possibilities, we performed Comr DamID in testes of mip40 mutants. In the mip40 background, virtually all Comr peaks disappeared: only 56 Comr peaks with P < 10⁻³ were detected, compared with 5422 highly specific peaks in wild-type testes (see above). Comr profiles in mip40 mutant and wild-type testes are exemplified in Additional file 3: Fig. 3.
Next, we searched for putative DNA motifs within the peaks of each of the three proteins. Notably, Can peaks were highly enriched with the consensus sequences of Achi and Vis proteins, which also contribute to gene regulation in testes (Additional file 4: Fig. 4). This suggests that tTAFs may share targets with a complex containing Achi/Vis. Given the non-random coincidence of Comr and Can peaks (Additional file 1: Fig. 1), one would expect a similar motif enrichment in Comr binding sites. However, no clear consensus motifs were detected in Comr DamID peaks. Possibly, the Achi/Vis motif found in Can binding sites is masked by the considerable number of Comr peaks that do not overlap with Can (Additional file 2: Fig. 2). Alternatively, different subsets of Can peaks overlap with Comr and with Achi/Vis sites. The search in Mip40 peaks also did not yield a characteristic motif, which could be explained by the involvement of Mip40 in at least two different complexes, tMAC and MMB/dREAM.
Next, we examined the location of Can, Comr, and Mip40 peaks relative to 1389 transcripts (Additional file 5: Table 1) that are specifically up-regulated in testes and down-regulated in other tissues (compared with the whole fly according to the FlyAtlas database, see Methods). We calculated the relative occurrence of Can, Comr, and Mip40 peaks in promoters (400 bp upstream of transcription start sites, TSSs), 5’UTRs, exons, introns, CDSs, and 3’UTRs of these genes and found that all three factors bind predominantly at promoter-proximal regions (Fig. 2b). Statistical significance was estimated with a binomial test (significance threshold P < 10⁻³), with expected probabilities calculated from the genome coverage of each category. A more detailed analysis of genes with Can, Comr, or Mip40 peaks within 1 kb of their TSS demonstrated that Can preferentially binds narrowly at the TSS, whereas Comr and Mip40 showed asymmetrical binding with a clear shift into regions upstream of the TSS (Additional file 6: Fig. 5). Remarkably, a portion of the highly significant peaks localized at a considerable distance from genes: 681 Can peaks, 2345 Comr peaks, and 5149 Mip40 peaks were located in intergenic regions at least 1 kb from the nearest TSS. This could indicate long-distance regulatory effects, although this hypothesis requires direct experimental testing.
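The enrichment test described above compares the number of peaks falling in a category with the number expected from that category's share of the genome. As a minimal sketch, assuming each peak is counted once per category and that the expected probability is simply category length divided by genome length, a one-sided exact binomial test could look like this; the function names and example numbers are hypothetical.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """One-sided P(X >= k) for X ~ Binomial(n, p), summed in log space.

    Working with log-probabilities keeps the sum stable for thousands of
    peaks, where terms such as p**k would underflow.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def category_enrichment(peaks_in_category: int, total_peaks: int,
                        category_bp: int, genome_bp: int):
    """Fold enrichment and binomial P value for peaks in one genomic category.

    The expected probability is the fraction of the genome covered by the category.
    """
    expected_p = category_bp / genome_bp
    fold = (peaks_in_category / total_peaks) / expected_p
    return fold, binom_sf(peaks_in_category, total_peaks, expected_p)


# Illustrative numbers only: 300 of 2140 peaks in a category covering 2% of the genome.
fold, p_value = category_enrichment(300, 2140, 2_000_000, 100_000_000)
print(f"fold = {fold:.2f}, P = {p_value:.2e}")
```

Using the category's genome coverage as the null probability means that long categories (introns, intergenic space) are expected to attract many peaks by chance, so only disproportionate occupancy of short regions such as promoters yields small P values.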
To investigate the contribution of Can, Comr, and Mip40 to gene regulation, we used RNA-seq to compare gene expression in wild-type testes with that in can, comr, and mip40 mutant testes. We also generated RNA-seq data for bam mutant testes, which are known to be blocked at the spermatogonial stage, as a reference point. We then explored how Can, Comr, and Mip40 are distributed around the TSSs of genes whose expression changes in the mutants. To this end, we calculated the distance from each TSS (including alternative TSSs occurring in some genes) to the closest significant enrichment peak of each protein. We compiled sets of genes that displayed a greater than fourfold difference in expression and harbored Comr, Can, or Mip40 peaks within 10 kb of their TSSs, and plotted the distribution of protein enrichment peaks in 1 kb bins around the TSSs of these genes (Fig. 2c). Forty-nine percent of genes down-regulated at least fourfold in can mutants had a Can peak within 1 kb of the TSS. In contrast, only 12% of genes up-regulated in can mutants had Can peaks within 1 kb of their TSSs. This difference is statistically significant (Chi-square test, P = 1.8 × 10⁻¹⁰). Interestingly, no such difference was observed for the next 1 kb bin (Fig. 2c). Together with the analysis in Fig. 2b, this simple test suggests that the activating function of Can is restricted to the immediate vicinity of the TSSs of its target genes.
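The distance-binning step above can be sketched as follows, assuming TSS coordinates and peak centers on a single chromosome and ignoring strand, so upstream and downstream distances are pooled (unlike a signed profile around the TSS). The helper names and coordinates are hypothetical; in practice each chromosome would be processed separately.

```python
import bisect
from collections import Counter
from typing import List, Optional


def nearest_peak_distance(tss: int, peak_centers: List[int]) -> Optional[int]:
    """Distance (bp) from a TSS to the closest peak center; peak_centers must be sorted."""
    if not peak_centers:
        return None
    i = bisect.bisect_left(peak_centers, tss)
    candidates = []
    if i < len(peak_centers):
        candidates.append(peak_centers[i] - tss)      # first peak at or after the TSS
    if i > 0:
        candidates.append(tss - peak_centers[i - 1])  # last peak before the TSS
    return min(candidates)


def bin_distances(tss_list: List[int], peak_centers: List[int],
                  bin_size: int = 1000, max_dist: int = 10_000) -> Counter:
    """Count TSSs per distance bin (bin 0 = nearest peak closer than 1 kb), up to max_dist."""
    counts: Counter = Counter()
    for tss in tss_list:
        d = nearest_peak_distance(tss, peak_centers)
        if d is not None and d < max_dist:
            counts[d // bin_size] += 1
    return counts


peaks = sorted([5_000, 20_000, 41_500])       # hypothetical peak centers
tss_list = [5_300, 19_000, 30_000, 60_000]    # hypothetical TSS coordinates
counts = bin_distances(tss_list, peaks)
print(dict(counts))  # {0: 1, 1: 1}
```

The fraction of genes with a peak within 1 kb, as quoted in the text, corresponds to `counts[0]` divided by the number of genes in the set.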
The same analysis applied to the Comr datasets revealed a similar, albeit less pronounced, trend (Chi-square test, P = 5.7 × 10⁻⁵, Fig. 2c). Somewhat surprisingly, Mip40 peaks clustered around the TSSs of genes that were either up- or down-regulated in mip40 mutant testes (Fig. 2c). The fact that many Mip40-bound genes become activated in mip40 mutants suggests that Mip40 participates in both repressive (MMB/dREAM) and activating (tMAC) complexes.
These data allowed us to define the direct targets of the studied proteins. To increase specificity, we applied a stricter expression threshold: a gene was considered a direct target if it was down-regulated at least eightfold in the mutant and had a protein enrichment peak within 1 kb of its TSS. In comr mutant testes, 1043 genes displayed a greater than eightfold decrease in expression; of these, only 232 had pronounced Comr peaks within 1 kb of the TSS (Additional file 7: Table 2). Of the 630 genes down-regulated at least eightfold in can mutants, only 151 had significant Can binding near the TSS. For Mip40, we found 436 direct targets (Additional file 7: Table 2). The remaining genes affected by the mutations were provisionally designated indirect targets for further analysis. We cannot exclude that some direct targets with smaller expression changes or DamID values fell into the indirect set. However, genome-wide analysis shows that the chosen FDR-based threshold yields higher specificity of target definition (Additional file 8: Fig. 6).
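The Chi-square comparisons in the two preceding paragraphs (for example, 49% versus 12% of down- and up-regulated genes carrying a Can peak within 1 kb) are tests on a 2×2 contingency table. The sketch below assumes a Pearson test without continuity correction, which the text does not specify; the counts are hypothetical, chosen only to match those proportions, so the resulting P value does not reproduce the reported one.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson Chi-square test (no continuity correction) on the 2x2 table

                     peak within 1 kb   no peak within 1 kb
        down-regulated        a                  b
        up-regulated          c                  d

    Returns (statistic, P value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("all row and column totals must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 49 of 100 down-regulated vs 12 of 100 up-regulated genes with a peak.
stat, p_value = chi2_2x2(49, 51, 12, 88)
print(f"chi2 = {stat:.1f}, P = {p_value:.1e}")
```

The closed form `n(ad - bc)^2 / (row and column totals)` is algebraically identical to summing `(observed - expected)^2 / expected` over the four cells, and the `erfc` identity avoids any dependency beyond the standard library.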
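The direct-target rule above combines an expression filter with a proximity filter. A minimal sketch, assuming per-gene log2 fold changes (mutant over wild type) and precomputed TSS-to-nearest-peak distances are available; the gene identifiers and values are hypothetical, and "within 1 kb" is interpreted here as a distance below 1000 bp.

```python
import math
from typing import Dict, Optional, Set, Tuple


def classify_targets(log2_fc: Dict[str, float],
                     tss_peak_distance: Dict[str, Optional[int]],
                     min_fold: float = 8.0,
                     max_dist: int = 1000) -> Tuple[Set[str], Set[str]]:
    """Split genes down-regulated at least min_fold in a mutant into direct and indirect targets.

    log2_fc: log2(mutant / wild type) per gene; negative values mean lower in the mutant.
    tss_peak_distance: distance (bp) from the gene's TSS to the nearest peak, or None.
    """
    cutoff = -math.log2(min_fold)  # -3.0 for an eightfold decrease
    down = {g for g, fc in log2_fc.items() if fc <= cutoff}
    direct = {g for g in down
              if tss_peak_distance.get(g) is not None and tss_peak_distance[g] < max_dist}
    return direct, down - direct


# Hypothetical genes and values.
log2_fc = {"geneA": -4.2, "geneB": -3.5, "geneC": -1.0, "geneD": 2.5}
distances = {"geneA": 250, "geneB": 4_000, "geneC": 100, "geneD": None}
direct, indirect = classify_targets(log2_fc, distances)
print(sorted(direct), sorted(indirect))  # ['geneA'] ['geneB']
```

Note that `geneC` has a nearby peak but is excluded because its expression change falls below the eightfold cutoff, which mirrors the caveat in the text that some true direct targets may be missed by strict thresholds.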
Two of the three known tTAF targets, don juan (dj) and Mst87F, displayed pronounced Can peaks in their promoter regions (Additional file 9: Fig. 7). Thus, our data are in line with reports [2, 10] that dj and Mst87F are directly controlled by tTAFs. Notably, however, our data imply that Comr controls these genes indirectly.
The sets of direct and indirect targets differed markedly in many respects (Fig. 2d, Additional file 7: Table 2). Fifty-six percent of direct Can targets also had a Comr binding peak next to the TSS, sevenfold more than expected by chance (Chi-square test, P = 3.2 × 10⁻¹⁰⁷). For indirect targets, this frequency was only 1.44-fold above the expected value (Chi-square test, P = 0.004). Similarly, 78% of direct Can targets had Mip40 peaks near the TSS, 2.64-fold more often than expected (Chi-square test, P = 9.4 × 10⁻¹⁷). In contrast, indirect Can targets associated with Mip40 at a nearly background frequency (Chi-square test, P = 0.11; Fig. 2d). The same overall trend was observed for Comr and Mip40 targets (Fig. 2d). This implies that more genes could be classified as direct targets of Can, Comr, and Mip40 if milder selection criteria were applied; however, we proceeded with the gene sets described above because they represent the most prominent targets of the factors under investigation.
To summarize, Comr, Can, and Mip40 appear to directly control hundreds of genes that become activated in spermatocytes. The lists of direct targets partially overlap (Additional file 10: Fig. 8). Likely indirect targets associated with Comr, Can, and Mip40 only at near-background frequencies, suggesting that their activation is controlled by alternative mechanisms.
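The fold enrichments above relate the observed fraction of targets carrying a co-factor peak to a background fraction. A minimal sketch of that comparison, assuming the background is the fraction of all genes with such a peak near the TSS and that the test is a 1-df goodness-of-fit Chi-square; the text does not state the exact test setup, and the numbers below are hypothetical.

```python
import math
from typing import Tuple


def enrichment_vs_background(k: int, n: int,
                             background_fraction: float) -> Tuple[float, float, float]:
    """Compare the fraction k/n of target genes carrying a co-factor peak near the TSS
    with a background fraction, using a 1-df goodness-of-fit Chi-square test.

    Returns (fold enrichment, Chi-square statistic, P value).
    """
    if not 0.0 < background_fraction < 1.0:
        raise ValueError("background_fraction must lie strictly between 0 and 1")
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    expected_with = n * background_fraction
    expected_without = n - expected_with
    stat = ((k - expected_with) ** 2 / expected_with
            + ((n - k) - expected_without) ** 2 / expected_without)
    p_value = math.erfc(math.sqrt(stat / 2))  # 1 degree of freedom
    fold = (k / n) / background_fraction
    return fold, stat, p_value


# Hypothetical: 60 of 100 direct targets carry a co-factor peak; 10% of all genes do.
fold, stat, p_value = enrichment_vs_background(60, 100, 0.10)
print(f"fold = {fold:.1f}, chi2 = {stat:.1f}, P = {p_value:.1e}")
```

Because the P value depends on the number of genes tested as well as on the fold enrichment, a smaller fold change in a larger gene set can be more significant than a larger fold change in a smaller set.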
Mutual regulation of meiosis arrest genes
To comprehensively analyze the mechanisms of gene activation in a complex system such as D. melanogaster spermatogenesis, possible cross-regulation of the genes encoding tMAC and tTAF subunits must be taken into account. Our current data (Additional file 11: Table 3) agree with a previous report that Comr does not affect the activity of meiosis arrest genes, which led to the conclusion that tMAC is unlikely to regulate tTAF components.
To examine possible cross-regulation, we analyzed RNA-seq data from can and comr mutant testes. Note that testes of meiosis arrest mutants accumulate spermatocytes that fail to enter later stages of spermatogenesis. Consequently, some spermatocyte-specific genes may erroneously appear overexpressed in mutant testes when compared with wild-type controls. However, expression of spermatocyte-specific genes can be adequately compared between can and comr mutant testes, as these are composed of very similar cell types. Expression of topi, achi, and vis was significantly reduced in can mutant testes compared with the comr mutant background (multiple-testing-corrected P < 0.003 in each case; Fig. 3b), as estimated using the Cuffdiff package (see “Methods”). To verify this observation independently, we turned to previously published microarray data. These data confirmed that comr (tMAC) mutants had at least tenfold higher expression of topi than can mutants. Notably, even in the absence of Can function, topi is only partially silenced compared with bam mutant spermatogonia (Fig. 3b). Thus, full expression of topi requires tTAF activity, and loss of Can significantly, yet incompletely, suppresses topi expression. Our data demonstrate that expression of three meiosis arrest genes depends on Can, suggesting that tTAFs participate in their regulation and may thereby affect the expression of their targets.
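The corrected P values mentioned above were obtained with Cuffdiff. As a generic illustration of multiple-testing correction, the sketch below implements the Benjamini-Hochberg procedure, a standard way to convert raw per-gene P values into FDR-adjusted values; it is not a reimplementation of Cuffdiff's statistics, and the example P values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene P values
print([round(q, 3) for q in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

Each raw P value is scaled by `m / rank`, and the running minimum from the largest P value downward guarantees that a gene with a smaller raw P value never receives a larger adjusted value.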
Activity of genes encoding TBP-like proteins in spermatogenesis
During gene activation, TAFs interact with TBP (TATA-binding protein) to form the TFIID complex. tTAFs have been hypothesized to form an analogous complex in which the TBP-like component remains to be identified [3, 9]. There are five genes encoding TBP and TBP-like proteins in D. melanogaster: Tbp, Trf, Trf2, CG9879, and CG15398. We analyzed the expression of these genes in can, comr, and mip40 mutant testes.
The CG9879 binding profile indicates that this protein tends to associate with the 5’UTRs and promoter regions of genes specifically activated in testis (binomial test, P < 0.001; Fig. 4b, Additional file 13: Fig. 10). Using DREME, we found that CG9879-bound regions frequently contained AT-rich motifs resembling the TATA box (Additional file 14: Fig. 11). In general, CG9879 tends to colocalize with both Comr and Can (Additional file 1: Fig. 1). Furthermore, 43% of direct Can targets had a CG9879 peak within 1 kb of the TSS (8.4-fold above expected, Chi-square test, P = 5.2 × 10⁻¹⁴). Direct targets of Comr and Mip40 had CG9879 peaks near their TSSs in 25% (fivefold enrichment, Chi-square test, P = 2.0 × 10⁻⁴⁴) and 18% (3.6-fold enrichment, Chi-square test, P = 1.9 × 10⁻⁹) of cases, respectively (Fig. 4c). Notably, genes that we considered to be indirectly regulated by Comr, Can, and Mip40 were not enriched with CG9879 peaks (Fig. 4c, Additional file 7: Table 2).
To understand how CG9879 affects gene expression in fly testes, we knocked out CG9879 using CRISPR/Cas9 (see “Methods”). Surprisingly, the mutants showed no apparent morphological defects, and males remained fully fertile. Furthermore, only 28 genes had significantly reduced expression in testes of CG9879 mutants (Additional file 15: Table 4), and none of them was bound by CG9879. Given that CG9879 specifically binds direct tTAF and tMAC targets, this lack of phenotypic and expression changes most likely reflects functional redundancy with other TBPs (Trf, Trf2) that may fully substitute for CG9879.
tMAC is required for Mip40 recruitment to the promoters of testis-specific genes
One intriguing feature of the spermatocyte-specific gene activation program is the participation of Mip40 (Fig. 1b). Mip40 was identified as a subunit of the MMB/dREAM complex, which is present in various cell types [12, 13, 14, 25, 26, 27]. Mip40 is also an essential component of tMAC.
Given the extensive similarity between the components of tMAC and MMB/dREAM (Fig. 1b), it is possible that in spermatogonia the MMB/dREAM complex binds spermatocyte-specific genes, thereby keeping them silent. Upon spermatocyte differentiation, tMAC components could replace their homologues in MMB/dREAM and turn it into a transcriptional activator. Alternatively, tMAC could recruit MMB/dREAM components to spermatocyte-specific genes, which would result in tMAC-dependent recruitment of Mip40 following spermatocyte differentiation.
To analyze Mip40 binding dynamics in relation to gene regulation, we focused on transcripts with Mip40 peaks within ± 300 bp of their TSSs in each genotype. Overall, aly mutants had half as many Mip40-occupied TSSs (1773 genes) as bam mutants (3499 genes) (Fig. 5b). In contrast, the number of Mip40-positive TSSs increased in can mutant and wild-type testes (2950 and 3819 genes, respectively). To reveal the main trends in Mip40 profile dynamics, we clustered these genes according to their association with Mip40 during spermatogenesis, which yielded six major gene groups (Fig. 5b). These six groups are reproducible across different significance thresholds for Mip40 peak calling (Additional file 17: Fig. 13).
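One simple way to summarize how each gene's TSS associates with Mip40 across the four genotypes (bam, aly, can, and wild type) is a presence/absence pattern. The sketch below groups genes by that pattern; it is a simplified stand-in for the clustering described above (the actual procedure is given in Methods and may differ), and the gene names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Genotypes in the order used in the text.
GENOTYPES = ("bam", "aly", "can", "wild-type")


def binding_pattern(gene: str, bound: Dict[str, Set[str]]) -> Tuple[int, ...]:
    """1/0 flags for a Mip40 peak within +/-300 bp of the gene's TSS in each genotype."""
    return tuple(int(gene in bound[g]) for g in GENOTYPES)


def group_by_pattern(bound: Dict[str, Set[str]]) -> Counter:
    """Count genes sharing each presence/absence pattern across genotypes."""
    genes = set().union(*bound.values())
    return Counter(binding_pattern(g, bound) for g in genes)


# Hypothetical Mip40-bound gene sets per genotype.
bound = {
    "bam": {"g1", "g2", "g3"},
    "aly": {"g1"},
    "can": {"g1", "g4"},
    "wild-type": {"g1", "g2", "g4"},
}
for pattern, n in sorted(group_by_pattern(bound).items(), reverse=True):
    print(pattern, n)
```

In this toy example, `g1` is bound in every genotype, `g3` only in spermatogonia, and `g4` only once tMAC and tTAFs are both present, which are the kinds of trajectories the six groups capture.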
Since Mip40 is shared by MMB/dREAM and tMAC, its DamID profile likely represents a superposition of two profiles. To distinguish between tMAC and MMB/dREAM localization, we generated an additional DamID profile of Mip130, an MMB/dREAM-specific subunit that is homologous to Aly but does not participate in tMAC (Fig. 1b), and compared it with the profiles of Mip40, Comr, and Can. Mip130 colocalized with Mip40 at numerous genomic locations (Fig. 5c). These sites show virtually no overlap with Comr sites and likely represent MMB/dREAM localization (Additional file 18: Fig. 14). Conversely, Mip40 sites that coincide with Comr typically lack Mip130 and thus reflect tMAC positions (Fig. 5c). Accordingly, Mip130 was differentially represented among the six gene groups that reflect the main trends of Mip40 redistribution (Fig. 5d), allowing us to distinguish Mip40 acting within tMAC from Mip40 acting within MMB/dREAM. Groups I, II, V, and VI were highly enriched with Mip130 relative to its genome-wide distribution (Chi-square test, P < 10⁻¹⁵⁴, Fig. 5d); the combined Mip40 and Mip130 signal in these groups indicates MMB/dREAM binding. Groups III and IV showed no predominant Mip130 presence, suggesting that the Mip40 signal in these groups is due to tMAC (Fig. 5d).
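The logic used to separate the two Mip40 pools can be sketched as an interval-overlap rule. This minimal version assumes half-open peak intervals on a single chromosome, gives Mip130 overlap priority over Comr overlap, and uses a simple linear scan (an interval tree or sorted sweep would be preferable genome-wide); the labels and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps_any(iv: Interval, others: List[Interval]) -> bool:
    """True if iv shares at least one base with any interval in others."""
    start, end = iv
    return any(s < end and start < e for s, e in others)


def classify_mip40_peaks(mip40: List[Interval], mip130: List[Interval],
                         comr: List[Interval]) -> Dict[Interval, str]:
    """Label Mip40 peaks by co-occupancy with Mip130 and Comr."""
    labels: Dict[Interval, str] = {}
    for iv in mip40:
        if overlaps_any(iv, mip130):
            labels[iv] = "MMB/dREAM-like"   # Mip40 together with Mip130
        elif overlaps_any(iv, comr):
            labels[iv] = "tMAC-like"        # Mip40 with Comr but without Mip130
        else:
            labels[iv] = "unassigned"
    return labels


# Hypothetical peak coordinates.
mip40 = [(100, 300), (1000, 1200), (5000, 5100)]
mip130 = [(250, 400)]
comr = [(1150, 1300), (200, 260)]
for iv, label in classify_mip40_peaks(mip40, mip130, comr).items():
    print(iv, label)
```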
We used 2252 shared Mip40 and Mip130 peaks to characterize sequence motifs at MMB/dREAM sites (Additional file 19: Fig. 15). The top motif identified in this search was highly similar to the motif of BEAF-32, a protein known to interact with CP190 at insulator sites. In turn, CP190 has been found to interact with the MMB/dREAM complex. Thus, the presence of the BEAF-32 motif in Mip40 and Mip130 binding sites could reflect an involvement of MMB/dREAM in regulating promoter-enhancer communication in the germline.
To check whether the observed dynamics of the Mip40 profile are specific to genes involved in spermatogenesis, we turned to the set of 1389 testis-specific genes (see above). As a control, we compiled a list of 707 ovary-specific transcripts (Additional file 5: Table 1) selected from the FlyAtlas database with the same criteria (Methods). In the groups of genes that display Mip40 binding at the spermatogonial stage (groups I, II, V, and VI), testis-specific genes were underrepresented, whereas the fraction of ovary-specific genes was above the expected value (Fig. 5e). Among genes whose TSSs acquire Mip40 binding in spermatocytes and later (groups III and IV), testis-specific genes were highly overrepresented (Fig. 5e). Thus, upon spermatocyte differentiation, Mip40 relocates to the promoters of testis-specific genes in a tMAC-dependent manner.
The mip40 mutation results in at least fourfold down-regulation of 1580 transcripts relative to wild-type controls, but also in fourfold up-regulation of 208 transcripts (Fig. 2c). These effects are probably caused by the participation of Mip40 in two types of complexes, a repressor (like MMB/dREAM) and an activator (tMAC). To investigate the repressive effects of Mip40, we examined how it associates with these two transcript sets during the first stages of spermatogenesis: in spermatogonia of bam mutants and in spermatocytes of aly and can mutants. To this end, we analyzed Mip40 binding within 10 kb of the TSSs of transcripts from the two sets. In spermatogonia, 60% of transcripts up-regulated in mip40 testes had a Mip40 peak within 1 kb of the TSS (Fig. 5f). These transcripts are normally repressed, and Mip40 is already associated with them in spermatogonia. In contrast, only 30% of the fourfold down-regulated genes had Mip40 near their TSS, which corresponds to the random expectation and is significantly less than the fraction of up-regulated genes (Chi-square test, P = 5.5 × 10⁻¹³, Fig. 5f). A similar, though less pronounced, pattern was observed in spermatocytes of aly and can mutants (Chi-square test, P = 4.8 × 10⁻⁵ and P = 1.2 × 10⁻⁶, respectively, Fig. 5f). These data indicate that in spermatogonia and in early spermatocytes of aly and can mutants, Mip40 directly binds a large proportion of genes that should be down-regulated in spermatogenesis.
To test this further, we generated a fly strain carrying both mip40 and bam mutations, which allowed us to estimate the effect of Mip40 on gene expression selectively in spermatogonia. Expression analysis of mip40; bam double mutant testes revealed that genes up-regulated in this genotype relative to bam mutants tend to bind Mip40 in spermatogonia, meaning that the presence of Mip40 at their promoters correlates with their repression (Additional file 20: Fig. 16). Notably, later in development, neither Can nor Comr showed significant association with the same gene sets, indicating that tMAC and tTAFs play no role in their regulation (Additional file 20: Fig. 16).
Figure 5b indicates that tMAC and tTAFs affect Mip40 binding in distinct gene groups. To assess how this relates to gene regulation in the six major groups shown in Fig. 5b, we turned to our differential expression datasets for bam, comr, can, and mip40 mutant testes. In each mutant background, we retained transcripts with at least fourfold up- or down-regulation relative to the wild-type control. Next, we calculated the ratio of repressed to activated transcripts in groups I–VI (Fig. 5g).
Group I transcripts (bound by Mip40 at all stages and in all genetic backgrounds) were strongly enriched for transcripts up-regulated in mip40 mutants relative to the expected value, suggesting that Mip40 acts as a repressor of these genes (Fig. 5g). This effect was specific to mip40 mutants and was not observed in can and comr mutants, indicating that tMAC and tTAFs do not significantly affect the regulation of group I transcripts. Transcripts from groups II, V, and VI (whose TSSs show Mip40 binding in spermatogonia) were likewise enriched for genes up-regulated in mip40 mutants. However, unlike group I genes, these genes were also up-regulated in can and comr mutant backgrounds (Fig. 5g). Nonetheless, only a handful of TSSs from these groups are directly bound by Comr or Can (data not shown), so tMAC and tTAFs are inferred to affect the expression of these genes indirectly. One possibility is that genes from groups II, V, and VI that are repressed by Mip40 in spermatogonia become activated upon spermatocyte differentiation independently of tMAC and tTAFs. In this case, their up-regulation in can and comr mutants would be explained by spermatocyte accumulation.
In contrast, transcripts whose TSSs first recruit Mip40 in spermatocytes (groups III and IV) tend to show reduced expression in mip40 mutants, which argues for an activating role of Mip40 at group III and IV genes (Fig. 5g). Notably, these genes also tend to be Comr targets: 64% of transcripts co-bound by Mip40 and Comr in wild-type testes belong to groups III and IV. In other words, such transcripts appear to be directly activated by Mip40 and Comr in the context of tMAC (Fig. 5h).
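The per-group comparison above can be sketched as follows. The text does not define the ratio precisely, so this sketch assumes that "repressed" means up-regulated at least fourfold in the mutant (normally repressed by the factor) and "activated" means down-regulated at least fourfold, and it normalizes each group's ratio by the ratio over all grouped transcripts; transcript IDs and values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict


def group_ratios(group_of: Dict[str, str], log2_fc: Dict[str, float],
                 min_fold: float = 4.0) -> Dict[str, float]:
    """Per-group ratio of up- to down-regulated transcripts, relative to all groups combined.

    'Up' (log2 fold change >= log2(min_fold)) marks transcripts normally repressed by the
    factor; 'down' (<= -log2(min_fold)) marks transcripts normally activated by it.
    A value above 1 means the group is enriched for repressed transcripts.
    """
    cutoff = math.log2(min_fold)
    up: Dict[str, int] = defaultdict(int)
    down: Dict[str, int] = defaultdict(int)
    for tx, grp in group_of.items():
        fc = log2_fc.get(tx)
        if fc is None:
            continue
        if fc >= cutoff:
            up[grp] += 1
        elif fc <= -cutoff:
            down[grp] += 1
    total_up, total_down = sum(up.values()), sum(down.values())
    if total_up == 0 or total_down == 0:
        raise ValueError("need both up- and down-regulated transcripts overall")
    overall = total_up / total_down
    ratios = {}
    for grp in sorted(set(group_of.values())):
        if down[grp] == 0:
            ratios[grp] = math.inf if up[grp] else math.nan
        else:
            ratios[grp] = (up[grp] / down[grp]) / overall
    return ratios


# Hypothetical transcripts, group assignments, and mutant-vs-wild-type log2 fold changes.
group_of = {"t1": "I", "t2": "I", "t3": "I", "t4": "III", "t5": "III", "t6": "III"}
log2_fc = {"t1": 3.0, "t2": 2.5, "t3": -2.2, "t4": -3.1, "t5": -2.0, "t6": 2.1}
print(group_ratios(group_of, log2_fc))  # {'I': 2.0, 'III': 0.5}
```

Normalizing by the overall ratio makes groups comparable across mutants that differ in how many genes they up- or down-regulate in total.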
Thus, our data indicate that in spermatogonia Mip40 plays a repressive role. Following spermatocyte differentiation, relocalization of Mip40 occurs, and tMAC but not tTAFs components are required for this relocalization. Establishing the final Mip40 distribution pattern is only possible when both complexes are available. The Mip40 redistribution to the promoters of testis-specific genes is indispensable for their proper activation.
The present work aimed to extend our knowledge of the mechanisms of massive gene activation controlled by the tMAC and tTAF complexes in Drosophila spermatocytes. We performed comprehensive genome-wide analyses that uncovered new trends in this process.
Critical evaluation of the DamID data
Before discussing the intricate biological effects observed, it is important to address whether the DamID system accurately represents the dynamics of transcription factor binding in fly testes. In our DamID experiments, removal of the transcription terminator stuffer that otherwise blocks transcription of the Dam-fusion protein is mediated by Cre recombinase, which is produced early in germline stem cells [16, 17]. Hence, Dam-mediated DNA methylation may occur at any subsequent developmental stage (in spermatogonia, spermatocytes, and spermatids), all of which can contribute to the final binding profile. Accordingly, differences in cell-type composition between genotypes may be a confounding factor. On the other hand, in wild-type testes, as well as in meiotic arrest mutants, spermatogonial cells make up a very small fraction of all cells in the testis and should have little, if any, influence on the profiles obtained.
Our data help to address this concern. For example, Mip40 was mapped in bam mutant testes at the TSSs of nearly 3500 genes (Fig. 5b). If spermatogonial cells contributed substantially to the Mip40 binding profiles in aly, can, and wild-type backgrounds, the Mip40 peaks observed in spermatogonia should also be present in these samples, albeit likely with reduced magnitude. This was not the case: in aly mutants, roughly half of the peaks disappear from promoter regions, whereas the other half remain unchanged (Fig. 5b). Moreover, in can mutants and wild-type testes, many additional Mip40 peaks appear, and these map to genomic loci that are Mip40-negative in spermatogonial cells (Fig. 5a, b). This acquisition of novel Mip40 sites is consistent with continued DamID activity in spermatocytes. Thus, the approach used in our study is suitable for chromatin profiling in spermatogenesis, and the data obtained faithfully reproduce protein binding dynamics in the dominant cell populations of each genotype tested.
Activation of spermatocyte-specifically expressed genes
The process of gene activation now appears to be somewhat different from earlier models. First, only fraction of spermatocyte-specific genes undergoes direct tTAFs- or tMAC-mediated activation. Second, regulatory cascades downstream of tMAC and tTAFs may involve other transcription factors, including those that are not particularly testis-specific. For instance, there are many transcription factors, such as invected, apontic, fushi tarazu, gooseberry-neuro, whose expression pattern is detected in, but not restricted to, testes . Thus, the role of tMAC and tTAFs may be to launch the testis-specific gene program that unfolds via other regulators and transcription factors that ultimately results in appropriate gene activation.
It is interesting to note that tTAFs actually control expression of several meiosis arrest genes, topi, achi, and vis. Achi and Vis proteins are absent from the canonical tMAC complex, yet they were found in the context of a distinct complex encompassing Aly and Comr [4, 5]. In can mutant background, topi, achi, and vis undergo only partial down-regulation, and so this may explain why can mutation has a weaker phenotype compared to that of topi/achi/vis knock-outs, although this may also be interpreted the other way around, namely that reduced expression of these genes is partially responsible for the can phenotype.
tTAFs very likely form a transcription factor complex paralogous to TFIID [3, 9, 10]; however, the TBP protein at the core of the tTAF complex has not been identified. The attractive hypothesis that the spermatocyte-specific TBP-like protein CG9879 plays the central role in tTAF function was rejected in our study. Indeed, deletion of the CG9879 gene led to very subtle changes in gene expression and did not appreciably affect spermatogenesis. Nevertheless, CG9879 tends to co-localize with the tTAF subunit Cannonball, implying that CG9879 participates in tTAF function and that its absence may be compensated for by other TBP-like proteins expressed in spermatocytes. Such redundancy may help maintain the stability of this important genetic system.
Dual role of Mip40
Since the description of tMAC, one of the most intriguing features of this complex has been the homology of its subunits to those of the MMB/dREAM complex. The tMAC and MMB/dREAM complexes are not merely paralogous; they also share two subunits, Mip40 and CAF1-55. Notably, tMAC is clearly involved in gene activation [3, 17, 30], whereas MMB/dREAM predominantly has repressive activity [12, 13, 14], although several examples of its activating effects have also been reported [25, 26, 27].
Our data indicate that at these early differentiation stages, Mip40 does not tend to associate with the TSSs of genes that will later be activated in spermatocytes. This observation clearly conflicts with the idea that MMB/dREAM orchestrates the repression of spermatocyte-specific genes in undifferentiated cells. Moreover, the genes bound by Mip40 in spermatogonia are those whose expression is detected predominantly in ovaries, and Mip40 binding in the context of the MMB/dREAM complex has inhibitory activity. Whether this mechanism is related to the recently discovered pathway that maintains the silencing of somatically expressed genes remains to be determined.
Following spermatocyte differentiation, the Mip40 binding pattern changes substantially, and novel, clearly tMAC-dependent Mip40 peaks appear. These data indicate that MMB/dREAM does not contribute to the inactivation of spermatocyte-specific genes. Instead, in spermatocytes, tMAC recruits Mip40 to novel binding sites, and this redistribution takes place outside the context of the MMB/dREAM complex.
In wild-type testes, the redistribution of Mip40 is much more pronounced than in can mutants. This points to the possible involvement of tTAFs. Alternatively, in the early spermatocytes of can mutants we may be observing the very first steps of Mip40 redistribution, whereas wild-type testes also contain more differentiated cell types that may contribute to the final binding pattern. One way to discriminate between these possibilities would be to perform DamID in thoc5 mutants, since this mutation does not interfere with tMAC or tTAF activity yet causes meiotic arrest.
Based on our major findings, we propose an amended picture of transcription-related events during Drosophila spermatogenesis. The mechanism that keeps the vast majority of spermatocyte-specific genes inactive is presently unknown; a decisive role for either the Pc or the MMB/dREAM complex now seems unlikely. tMAC and tTAFs associate with their cognate gene targets and induce their activation. Surprisingly, high-confidence direct targets of tMAC and tTAFs make up a relatively modest fraction of all testis-specific genes. Activation of indirectly controlled targets likely proceeds with the help of other transcription factors. The involvement of tTAFs in regulating three meiosis arrest genes should be taken into account as an additional regulatory mechanism. Finally, Mip40 undergoes a major redistribution in spermatocytes; this process is tMAC-dependent and relocates Mip40 to the promoters of spermatocyte-specific genes, leading to their activation.
All genetic constructs for DamID experiments were based on the hsp70 > loxP-Stop-loxP > Dam (JN993988) vector, which carries a loxP-flanked stop cassette placed between the hsp70 minimal promoter and the Dam coding sequence; Dam is fused in frame to the N-terminus of the protein of interest. The Dam-Comr (KC845569) construct has been reported previously. The Dam-Can (KY939771), Dam-Mip40 (KY939772), Dam-Mip130 (MG557560), and Dam-CG9879 (KY930504) constructs were generated in this work.
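The logic of the Cre-activated construct can be illustrated with a toy Python model in which the construct is a list of named elements. This is a hypothetical sketch, not a sequence-level simulation: the element names are placeholders, and it assumes the two loxP sites are in direct orientation, as required for excision of the stop cassette.

```python
from __future__ import annotations

# Toy element-level model; "X" stands for the protein of interest.
CONSTRUCT = ["hsp70", "loxP", "Stop", "loxP", "Dam", "X"]


def cre_excise(elements: list[str]) -> list[str]:
    """Recombine the first two directly repeated loxP sites.

    The segment between them (here, the Stop cassette) is excised and a
    single loxP site remains.
    """
    sites = [i for i, e in enumerate(elements) if e == "loxP"]
    if len(sites) < 2:
        return list(elements)  # no substrate for excision
    first, second = sites[0], sites[1]
    return elements[:first] + elements[second:]


def expresses_dam_fusion(elements: list[str]) -> bool:
    """Dam-X can be expressed only if no Stop cassette precedes Dam."""
    return "Stop" not in elements[: elements.index("Dam")]


germline = cre_excise(CONSTRUCT)  # cells expressing nanos-cre
soma = list(CONSTRUCT)            # cells without Cre
print(germline, expresses_dam_fusion(germline))  # ['hsp70', 'loxP', 'Dam', 'X'] True
print(expresses_dam_fusion(soma))                # False
```

In this model, germline cells expressing nanos-cre carry the excised construct and can express Dam-X, whereas somatic cells retain the stop cassette.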
Fly stocks and crosses
To obtain the fly stocks needed for DamID experiments, the attP40 genomic landing site on chromosome 2 was used (Dam-Comr, Dam-Mip40, Dam-Can, Dam-CG9879, and Dam-alone). To activate the DamID system specifically in the male germline, nanos-cre (attP40) males [16, 17] were crossed to females bearing a DamID construct. In the progeny of these crosses, the stop cassette is removed only in germline cells and not in the somatic cells of the testis. Dam-alone flies were used as a control for DamID experiments.
To perform DamID in animals with compromised spermatogonia-to-spermatocyte differentiation (bamΔ86), tMAC activity (aly5), or tTAF activity (can1), fly stocks carrying these mutations balanced over TM6 and homozygous for the Dam-Mip40 (attP40), Dam-alone (attP40), or nanos-cre (attP40) constructs were established by standard genetic crosses. When DamID; mut/TM6 females were crossed to nanos-cre; mut/TM6 males, their sons lacking the TM6-linked dominant markers, and therefore homozygous for the mutation, were selected. These males displayed the expected phenotypes: accumulation of spermatogonia (bam) or spermatocyte meiotic arrest (aly and can). Comr profiling in the mip40EY16520 background was performed using the same experimental design.
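Because both parents are homozygous for their attP40 constructs, every offspring carries the DamID construct over nanos-cre on chromosome 2, and only chromosome 3 segregates. The Python sketch below computes the expected fraction of homozygous mutant progeny; it assumes simple Mendelian segregation, full viability of mut/mut animals, and, as is typical for balancer chromosomes, lethality of TM6/TM6 zygotes. The function name and genotype labels are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_chr3(mother: tuple[str, str], father: tuple[str, str],
                   lethal: set[tuple[str, str]]) -> dict[tuple[str, ...], Fraction]:
    """Expected chromosome-3 genotype frequencies among surviving progeny.

    Each parent transmits one of its two homologs with equal probability;
    genotypes listed in `lethal` (as sorted tuples) are removed before
    renormalizing.
    """
    counts = Counter(tuple(sorted(pair)) for pair in product(mother, father))
    viable = {g: n for g, n in counts.items() if g not in lethal}
    total = sum(viable.values())
    return {g: Fraction(n, total) for g, n in viable.items()}


# Hypothetical labels: "mut" = bam, aly, or can allele; "TM6" = balancer.
freqs = offspring_chr3(("mut", "TM6"), ("mut", "TM6"), lethal={("TM6", "TM6")})
print(freqs[("mut", "mut")])  # 1/3
```

Under these assumptions, about one third of the surviving sons (those lacking the TM6 markers) are the homozygous mutants used for profiling.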
For DamID experiments, testes were collected from 3-day-old males. For each biological replicate, 50 pairs of testes were used; each experiment was performed in two biological replicates. Genomic DNA was isolated from the collected material using a standard phenol-chloroform extraction method, and 0.5–1 μg of DNA was used in each DamID experiment. Overall, DamID was performed according to previously published protocols [33, 34] with modifications. Specifically, the last amplification step was done using regular Taq polymerase. Following amplification, the PCR products were treated with DpnII to remove adapter sequences. Library preparation then followed the TruSeq protocol (Illumina), omitting the additional fragmentation step. Importantly, this preserves the sequence information at the termini of the amplified DNA fragments, which must begin with GATC. This information is used for downstream data filtering and removal of non-specific reads, as previously described. Further analysis, including profile generation and peak calling, was performed exactly as previously described. In all cases, the FDR was required to be at most 0.05. FDR was estimated at different significance levels to assess the impact of experimental noise, measured by comparing biological replicates. Additional file 21: Fig. 17 exemplifies the outcome of this procedure on Can DamID-seq data. Additional file 22: Fig. 18 illustrates the benefits of this approach compared with the traditional presentation of DamID data as log2(Dam-X/Dam) on the same dataset.
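Two steps mentioned above can be sketched in Python: discarding reads that do not start with the GATC expected at DpnII-cut termini, and the traditional per-bin log2(Dam-X/Dam) score. This is an illustrative sketch under assumptions: the exact filtering criteria, profile generation, and FDR procedure follow the previously published pipeline and are not reproduced here, and the function names and the pseudocount of 1 are hypothetical choices.

```python
from __future__ import annotations

import math


def filter_gatc_reads(reads: list[str]) -> list[str]:
    """Keep only reads whose 5' end starts with GATC (DpnII-cut terminus)."""
    return [r for r in reads if r.upper().startswith("GATC")]


def log2_dam_ratio(dam_x: float, dam: float, pseudocount: float = 1.0) -> float:
    """Traditional DamID score log2(Dam-X/Dam) for one bin of normalized counts.

    The pseudocount avoids division by zero in bins without Dam-alone reads.
    """
    return math.log2((dam_x + pseudocount) / (dam + pseudocount))


reads = ["GATCTTAGCA", "ACGTGATCAA", "gatcccgtta"]
print(filter_gatc_reads(reads))  # ['GATCTTAGCA', 'gatcccgtta']
print(log2_dam_ratio(7.0, 1.0))  # 2.0
```

Reads with an internal GATC but a different 5′ end, such as the second read above, are treated as non-specific and dropped.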
Gene expression analysis
For gene expression analysis, we used 50 adult testes from 3-day-old wild-type males (y1, w67 strain) or from males homozygous for bamΔ86, aly5, can1, mip40EY16520, or the CG9879 deletion (obtained in this study). Each experiment was run in duplicate. Total RNA was isolated from testes using TRIzol reagent (Invitrogen) according to the manufacturer’s instructions. One microgram of total RNA was then processed for library preparation using the TruSeq RNA kit. The libraries were sequenced on the Illumina MiSeq system (paired-end, 75-bp reads). Data were analyzed using Galaxy tools: reads were aligned to the D. melanogaster BDGP R5/dm3 genome assembly (https://genome-euro.ucsc.edu/) using TopHat (-r 200 --mate-std-dev 50). Differential transcript expression between samples was tested with Cuffdiff using geometric normalization, pooled dispersion estimation, and FDR = 0.05.
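Cuffdiff's geometric normalization is documented as following the median-of-ratios scheme of DESeq (Anders and Huber, 2010): each library is scaled by the median, across genes, of its count divided by that gene's geometric mean count over all libraries. The Python sketch below illustrates the idea on toy counts; it is not Cuffdiff's implementation, and the function name and data are hypothetical.

```python
from __future__ import annotations

import math
from statistics import median


def geometric_size_factors(counts: dict[str, list[float]]) -> list[float]:
    """Median-of-ratios size factors, one per library.

    `counts` maps gene -> fragment counts in each library. Genes with a zero
    count in any library are skipped because their geometric mean is zero.
    """
    n_libs = len(next(iter(counts.values())))
    log_ratios: list[list[float]] = [[] for _ in range(n_libs)]
    for values in counts.values():
        if any(v <= 0 for v in values):
            continue
        log_geo_mean = sum(math.log(v) for v in values) / n_libs
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Toy counts for two libraries; library 2 is sequenced twice as deeply.
toy = {"geneA": [10, 20], "geneB": [30, 60], "geneC": [5, 10], "geneD": [0, 4]}
print([round(s, 3) for s in geometric_size_factors(toy)])  # [0.707, 1.414]
```

Dividing each library's counts by its size factor puts libraries of different depth on a common scale; using the median makes the factors robust to a minority of strongly differentially expressed genes.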
Testis-specific and ovary-specific transcripts
To determine the lists of testis- and ovary-specific transcripts, we used the FlyAtlas database. A transcript was classified as testis-specific (or ovary-specific) if it was up-regulated in testis (or ovary), that is, log2(Testis/FlyMean) > 0 (or log2(Ovary/FlyMean) > 0), and if, in every other adult fly tissue, it was either down-regulated (log2(Tissue/FlyMean) < 0) or not expressed at all. This approach yielded 1389 testis-specific and 707 ovary-specific transcripts.
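The classification rule can be written as a short Python predicate. This is a sketch under stated assumptions: the input is a hypothetical mapping from tissue name to log2(Tissue/FlyMean) for one transcript, with None marking null expression, and the function name, tissue names, and values are illustrative rather than FlyAtlas data.

```python
from __future__ import annotations


def is_tissue_specific(log2_vs_mean: dict[str, float | None], target: str) -> bool:
    """Apply the tissue-specificity rule to one transcript.

    `log2_vs_mean` maps tissue -> log2(Tissue/FlyMean); None marks a tissue
    with null expression. The transcript is specific to `target` if it is
    up-regulated there (> 0) and, in every other tissue, it is either
    down-regulated (< 0) or not expressed.
    """
    target_value = log2_vs_mean.get(target)
    if target_value is None or target_value <= 0:
        return False
    for tissue, value in log2_vs_mean.items():
        if tissue != target and value is not None and value >= 0:
            return False
    return True


# Illustrative values, not FlyAtlas data.
transcript_a = {"Testis": 3.2, "Ovary": -1.5, "Brain": None, "Gut": -0.4}
transcript_b = {"Testis": 2.0, "Ovary": 0.3, "Brain": -2.0}
print(is_tissue_specific(transcript_a, "Testis"))  # True
print(is_tissue_specific(transcript_b, "Testis"))  # False
```

Both inequalities are strict, mirroring the criteria above: a tissue with log2(Tissue/FlyMean) exactly 0 disqualifies the transcript.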
CRISPR/Cas9 genome editing
To generate a complete deletion of the CG9879 coding sequence, we used the transgenic line MI04214 from the MiMIC transposon insertion collection (Bloomington Drosophila Stock Center). This stock carries an insertion of a MiMIC transposon bearing a marker gene (y+) approximately 600 bp from the 5′ end of CG9879. MI04214 flies were crossed to flies bearing the Cas9 nuclease gene (#51326, Bloomington Drosophila Stock Center) (Additional file 23: Fig. 19). Oligonucleotides targeting the genomic region that contains the MiMIC insertion and the CG9879 CDS were designed with the CRISPR Optimal Target Finder and CRISPRdirect tools [37, 38]. Each oligonucleotide pair (L1: 5′-cttcgacgatggtgacaggtgtct-3′, L2: 5′-aaacagacacctgtcaccatcgtc-3′; R1: 5′-cttcgtgccagtggttggcccgag-3′, R2: 5′-aaacctcgggccaaccactggcac-3′) was annealed and inserted into the pU6-BbsI-chiRNA vector (Addgene, #45946). Plasmids encoding chiRNAs targeting the genomic region of interest were co-injected into preblastoderm embryos obtained from the crosses mentioned above. Upon eclosion, the resulting flies were crossed to y, w flies, and their progeny were screened for loss of the dominant yellow marker, which is expected upon successful deletion of the MiMIC insertion and the CG9879 coding region. In this way, we obtained flies bearing the required deletion (Additional file 23: Fig. 19). Complete deletion of the CG9879 coding region was verified by PCR and Sanger sequencing (Additional file 23: Fig. 19).
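The oligonucleotide pairs can be checked computationally: in each pair, the sequence following the 5′ cttc of the top oligo should be the reverse complement of the sequence following the 5′ aaac of the bottom oligo, so that annealing yields a duplex with 4-nt 5′ overhangs for ligation into the BbsI-digested vector. The Python sketch below performs this check on the sequences listed above; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def check_bbsi_pair(top: str, bottom: str) -> bool:
    """True if top = 'cttc' + N and bottom = 'aaac' + revcomp(N).

    After annealing, the duplex carries 4-nt 5' overhangs (cttc / aaac)
    compatible with the BbsI-digested cloning vector.
    """
    top, bottom = top.lower(), bottom.lower()
    if not (top.startswith("cttc") and bottom.startswith("aaac")):
        return False
    return revcomp(top[4:]) == bottom[4:]


pairs = {
    "L": ("cttcgacgatggtgacaggtgtct", "aaacagacacctgtcaccatcgtc"),
    "R": ("cttcgtgccagtggttggcccgag", "aaacctcgggccaaccactggcac"),
}
for name, (top, bottom) in pairs.items():
    print(name, check_bbsi_pair(top, bottom), top[4:])
```

Both pairs pass the check, and the printed annealed regions correspond to the two target sites flanking the deleted segment.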
PPL performed most of the experiments, analyzed the results, and wrote the paper; DAM performed the bioinformatics analysis of the whole-genome experiments; SER contributed to the part concerning CG9879 mapping and deletion; PAA performed Mip130 DamID; OVP performed fly work and genetic crosses; HWC performed RNA in situ hybridizations of whole testes and wrote the paper; DEK performed genetic crosses and sample collections; SNB planned experiments, contributed to the data analysis, and wrote the paper. All authors read and approved the final manuscript.
The authors are grateful to Andrey Gortchakov for critical reading of the manuscript. We thank the IMCB SB RAS collection of Drosophila lines (Project No. 0310-2016-0001) for providing wild-type and mutant stocks that were used in this work. The authors gratefully acknowledge the resources provided by the “Molecular and Cellular Biology” core facility of the IMCB SB RAS.
The authors declare that they have no competing interests.
Availability of data and materials
All data are available from Gene Expression Omnibus (GSE97182).
Consent for publication
Ethics approval and consent to participate
This work was supported by a grant from the Russian Science Foundation (14-14-00641), the Russian Fundamental Scientific Research Project (0310-2018-0009), and Russian Foundation for Basic Research grants (17-00-00181, 14-04-32102, 13-04-01731, 13-04-40087, and 12-04-01007). SNB was supported by a grant from the Ministry of Education and Science of the Russian Federation #14.Y26.31.0024.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 22. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28(5):511–5.
- 27. Georlette D, Ahn S, MacAlpine DM, Cheung E, Lewis PW, Beall EL, Bell SP, Speed T, Manak JR, Botchan MR. Genomic profiling and expression studies reveal both positive and negative activities for the Drosophila Myb MuvB/dREAM complex in proliferating cells. Genes Dev. 2007;21(22):2880–96.
- 28. Vogelmann J, Le Gall A, Dejardin S, Allemand F, Gamot A, Labesse G, Cuvier O, Negre N, Cohen-Gonsaud M, Margeat E, et al. Chromatin insulator factors involved in long-range DNA interactions and their role in the folding of the Drosophila genome. PLoS Genet. 2014;10(8):e1004544.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.