Transient reduction of DNA methylation at the onset of meiosis in male mice
Meiosis is a specialized germ cell cycle that generates haploid gametes. In the initial stage of meiosis, meiotic prophase I (MPI), homologous chromosomes pair and recombine. Extensive changes in chromatin in MPI raise an important question concerning the contribution of epigenetic mechanisms such as DNA methylation to meiosis. Interestingly, previous studies concluded that in male mice, genome-wide DNA methylation patterns are set in place prior to meiosis and remain constant subsequently. However, no prior studies examined DNA methylation during MPI in a systematic manner, necessitating further investigation.
In this study, we used genome-wide bisulfite sequencing to determine DNA methylation of adult mouse spermatocytes at all MPI substages, spermatogonia and haploid sperm. This analysis uncovered transient reduction of DNA methylation (TRDM) of spermatocyte genomes. The genome-wide scope of TRDM, its onset in the meiotic S phase and the presence of hemimethylated DNA in MPI are all consistent with DNA replication-dependent DNA demethylation. Following DNA replication, spermatocytes regain DNA methylation gradually but unevenly, suggesting that key MPI events occur in the context of a hemimethylated genome. TRDM also uncovers the prior deficit of DNA methylation of LINE-1 retrotransposons in spermatogonia, resulting in their full demethylation during TRDM and likely contributing to the observed mRNA and protein expression of some LINE-1 elements in early MPI.
Our results suggest that contrary to the prevailing view, chromosomes exhibit dynamic changes in DNA methylation in MPI. We propose that TRDM facilitates meiotic prophase processes and gamete quality control.
Keywords: DNA methylation, Hemimethylation, Spermatogenesis, LINE-1, Retrotransposon, Mouse, Meiosis, Meiotic prophase
Meiosis is a specialized cell division program that produces haploid gametes. To achieve haploidy, a diploid germ cell replicates its DNA once and divides twice. Following the final round of DNA replication (meiotic S phase), chromosomes pair and recombine in meiotic prophase I (MPI). Meiosis is a highly protracted cell cycle due to meiotic S phase being much longer than mitotic S phase in the same organism [2, 3] and MPI itself lasting about 2 weeks, during which time the chromosomes undergo dramatic changes in organization. Based on these changes, MPI is subdivided into leptotene (L), zygotene (Z), pachytene (P) and diplotene (D) substages, which are immediately preceded by meiotic S phase, also known as the preleptotene (PL) stage (Additional file 1: Fig. S1). These descriptive names of MPI substages serve as common reference points in the studies of meiosis across species since they associate with specific molecular processes such as double-stranded break (DSB) formation in L, the onset of homolog synapsis in Z, completion of synapsis and meiotic recombination in P and the onset of homolog decondensation and desynapsis in D. Upon the completion of MPI, homologous chromosomes segregate in the first meiotic division (Meiosis I), while sister chromatids separate in Meiosis II.
Extensive changes in meiotic chromosome configurations in MPI raise an important question concerning the contribution of epigenetic mechanisms to meiosis. Indeed, prior studies have implicated histone modifications in chromosome condensation, synapsis and meiotic recombination in plants, fungi and animals [5, 6, 7, 8, 9]. Disruption of DNA methylation also interferes with MPI processes in a wide range of species capable of this modification including mammals [10, 11, 12, 13, 14, 15]. Here, we focused on genome-wide DNA methylation of male meiotic germ cells of mice. Studies over the past decade revealed important roles of DNA methylation, repressive histone modifications and small Piwi-interacting RNAs (piRNAs) in LINE-1 (L1) control in male germ cells [13, 16, 17, 18]. Intriguingly, despite these defensive mechanisms, early meiotic male germ cells exhibit L1 ORF1p expression, albeit at levels significantly lower than those observed in mutants deficient in transposon defense [18, 19, 20, 21, 22]. To explain this observation, we have previously posited a transient change in DNA methylation at the onset of meiosis. Critically, unlike the detailed knowledge of genome-wide DNA methylation in mouse postnatal spermatogonia [24, 25] and the evidence that bulk DNA methylation precedes meiosis, the precise dynamics of DNA methylation during MPI remain unknown. To a large extent, this gap in understanding of the epigenetic makeup of meiotic chromosomes was due to the inaccessibility of cell populations from all MPI substages. To overcome this limitation, we first optimized the method for the purification of adult mouse male germ cells from all substages of MPI [27, 28, 29]. In this study, using this method, we obtained high-quality germ cell samples that allowed us to discover genome-wide transient reduction of DNA methylation (TRDM) during MPI, a previously unrecognized epigenetic feature of meiotic chromosomes in male mice.
Genome-wide DNA methylation levels in meiotic prophase I
To characterize the dynamics of DNA methylation across MPI, we used an optimized flow cytometry cell sorting method to obtain two biological replicates of spermatogonial (Spg), PL, L, Z, P, D spermatocytes and epididymal spermatozoa (Spz) [27, 28, 29] (Additional file 2: Fig. S2). The purity of MPI cell fractions was verified by staining for meiosis-specific (SYCP3, γH2AX) and spermatogonia-enriched (DMRT1, DMRT6) markers as described previously [28, 29]. Critically, all germ cell fractions were devoid of somatic cells (Methods) and gene expression profiling of a wide panel of soma-enriched and germ cell-specific genes by RNA-seq confirmed the purity and stage specificity of our samples (Additional file 3: Fig. S3). Using these samples, we performed whole-genome bisulfite DNA sequencing (WGBS) for genome-wide analysis of DNA methylation at single CpG resolution (Additional file 4: Table S1). Over 90% of reads aligned to the mouse genome and exhibited high efficiency of bisulfite conversion (Additional file 4: Table S1 and Additional file 5: Table S2). Each biological replicate accounted for 87–94% of genomic CpGs with 3 to 6 × average CpG coverage per individual sample after read de-duplication and processing (Additional file 6: Table S3). Pairs of biological replicates exhibited high inter-individual Pearson correlation indicating excellent reproducibility of our data (Additional file 7: Table S4). Since cytosine methylation levels at non-CpG sites were negligible (0.3–0.4%), we excluded them from later analyses.
To examine the chromosome-wide distribution of DNA methylation in individual MPI substages, we summarized DNA methylation levels over a distance of 100-kb-wide non-overlapping windows spanning the length of each chromosome. We found that global hypomethylation in PL is chromosome-wide (Fig. 1b). This was true for all autosomes examined in both biological replicates (Additional file 9: Fig. S4, Additional file 10: Fig. S5). Interestingly, although the X chromosome also exhibits TRDM, it tended to be less methylated in all MPI substages (Additional file 9: Fig. S4, Additional file 10: Fig. S5, Additional file 11: Fig. S6). X chromosome DNA methylation levels in Spg-to-PL and PL-to-L transitions are distinctly less correlated than in the autosomes, further suggesting differences in the dynamics of its demethylation and remethylation (Additional file 11: Fig. S6). Nonetheless, these results showed that TRDM holds true for all chromosomes and that remethylation in MPI appears as a gradual chromosome-wide process.
To determine whether DNA hypomethylation in PL is specific to a particular genomic feature, we examined DNA methylation dynamics of exons, introns, intergenic and repetitive regions, as well as functionally specialized sequences such as promoters and CpG islands (CGIs) (Fig. 1c, Additional file 12: Fig. S7A, Additional file 13: Table S6). This analysis showed that all genomic features were highly methylated in Spg and then demethylated in PL (most prominently at introns, repeats and intergenic regions), except for CGIs, whose methylation levels are already very low. Likewise, the analysis revealed comparable DNA methylation dynamics of repetitive DNA with major classes of TEs, namely the LINEs, SINEs, LTRs and DNA transposons (Additional file 12: Fig. S7B). Finally, we asked whether differentially methylated regions (DMRs) of imprinted genes also become hypomethylated in PL. The analysis of a subset of imprinted DMRs showed that DNA methylation levels of paternal imprinted DMRs follow the same dynamic observed for other genomic features, while maternal DMRs remained unmethylated as expected (Additional file 12: Fig. S7C). Cumulatively, these results show that TRDM is indeed a genome-wide event that encompasses all chromosomes and all genomic features.
Dynamics of DNA methylation in the course of MPI
Although global levels of DNA methylation in Z were higher relative to the preceding substages, Z is still hypomethylated relative to P (Fig. 2a). Accordingly, our DMR analysis showed that during the Z-to-P transition there is an increase in methylation at ~ 57% of analyzed CpGs, from 81% in Z to 88% in P (Additional file 15: Table S8A, Fig. 2b). Therefore, while the bulk of remethylation occurs by Z, remethylation that reaches premeiotic or almost Spz-like levels occurs between Z and P. Indeed, the original Spg-to-PL DMRs explain most (~ 75%) of all DMRs observed between Z and P (Methods). We find that gradual remethylation concerns all genomic features examined (exons, introns, coding sequences and repeats) (Additional file 15: Table S8B). In Z, up to 60% of these features are still hypomethylated compared to P, although the mean DNA methylation difference is relatively small (Fig. 2d, Additional file 15: Table S8A). In P, less than two percent of these features are found in hypomethylated DMRs relative to D, due to remethylation.
Interestingly, while P and D share very similar DNA methylation profiles overall (Fig. 1a), we observed the emergence of hypomethylated P-to-D DMRs that involve 8% of all examined CpGs in common and are of a relatively small mean genomic size (~ 9 kb, as compared to ~ 35 kb in PL) (Fig. 2a, b, d, Additional file 15: Table S8A). Considering that Spg-to-PL DMRs account for only 50% of all P-to-D DMRs (Additional file 15: Table S8A), it is likely that the hypomethylation observed in late MPI is unrelated to TRDM.
Evidence of DNA replication-dependent DNA demethylation in TRDM
Given the above dynamics of DNA methylation in the PL-to-L transition, we considered a role for DNA replication and replication timing domains in this phenomenon. Replication domains are large-scale genomic territories that replicate at particular times during S phase [32, 33]. Global early or late replication timing profiles appear relatively preserved between different cell lines and cell types tested, although there are tissue-specific differences [33, 34]. Remarkably, an overlay of the chromosome-wide DNA methylation pattern from our data with replication timing domains of a mouse B cell lymphoma CH12 cell line  revealed a strong overlap between the two (Fig. 3a). Specifically, in PL, we observe an overlap between large hypermethylated regions and late-replicating domains. Correspondingly, an overlap is observed in PL between large-scale hypomethylated regions and early-replicating domains. Interestingly, a switch between DNA hypo- and hypermethylation in PL is marked by an opposite switch in DNA methylation pattern in L (Fig. 3a). This switch in DNA methylation pattern in PL-to-L transition matched the transition from early to late replication timing domains (Fig. 3a). The overlap between DNA methylation pattern and replication timing pattern in PL was true of both biological replicates (e.g., Fig. 3a, Additional file 16: Fig. S8, Additional file 17: Fig. S9). To test the strength of the association of a switch in DNA methylation levels with late replication genome-wide, we determined their Pearson correlation coefficient in the course of MPI. This analysis showed an abrupt switch in the directionality of correlation from PL to L, supporting that late-replicating domains switch from high to low DNA methylation levels between the two MPI stages (Additional file 18: Fig. S10A).
To further explore a role of DNA replication in hypomethylation of the PL genome, we evaluated the uniformity of genome sequencing coverage in our WGBS data (Fig. 3b). Previously, DNA sequence coverage was used to estimate replication timing and to evaluate underreplication in Drosophila polytene chromosomes [36, 37]. We summarized read frequency over a distance of 5-kb non-overlapping windows spanning the length of the chromosome and corrected for the difference in total read count between the samples. Remarkably, we observe consistently lower sequencing coverage in the hypermethylated regions/late replication timing domains in PL, disappearing in L (Fig. 3b, Additional file 16: Fig. S8, Additional file 17: Fig. S9). The lower sequencing coverage in PL is consistent with DNA replication during this time, while recovery of sequence coverage in L agrees with the lack of replication in L, as no replication occurs then and during the rest of meiosis. Specifically, lower sequencing coverage of late replication timing domains in PL indicates that these regions have not yet completed replication (with lower sequencing coverage reflecting lower DNA content), while early replication timing domains have already replicated (hence exhibit higher sequencing coverage associated with higher DNA content). To confirm that PL spermatocytes used in our studies are replicative, we performed FACS enrichment of PL cells from mice injected with EdU 2 h prior to cell sorting. Subsequent EdU detection showed that > 70% of FACS-enriched PL cells were replicative, with the majority of EdU patterns corresponding to middle and late S phase (Additional file 19: Fig. S10B) .
Transposon expression during TRDM
To determine how these two bursts of L1 transcription relate to L1 protein expression, we performed immunofluorescence analysis using antibodies to the L1-encoded ORF1 protein, an acrosome-specific marker sp56 and double-strand break marker γH2AX [20, 42, 43]. This analysis established that L1ORF1p expression in MPI begins in L, persists until mid-P and extinguishes in late P (Additional file 22: Fig. S13). These results suggest that the initial, smaller wave of L1 mRNA in the early MPI is productive, while the second burst of L1 transcription at P-to-D transition does not lead to a corresponding increase in L1ORF1p levels. These L1 mRNA and protein expression dynamics fit well with the relatively low activity of the piRNA pathway in early MPI and its robust transcriptional activation in P [18, 44].
Expression analysis of DNA methylation machinery supports passive DNA demethylation in TRDM
At the same time, genes for proteins implicated in active DNA demethylation exhibit low levels of expression across MPI, arguing against a leading role of this mechanism in TRDM (Fig. 6a). Indeed, a previous genome-wide analysis of 5-hydroxymethylcytosine uncovered only a minor contribution of this modification to the epigenetic landscape in MPI. However, in our data, DNA demethylation at the P-to-D transition in narrow genomic windows is consistent with active DNA demethylation and corroborates the previous study. Taken together, the above results strongly suggest that reduction of DNA methylation in TRDM occurs primarily by a passive, DNA replication-coupled mechanism.
Gradual but uneven genome-wide DNA remethylation occurs over a period of 70 h spanning early MPI substages. Given the predominance of hemimethylated DNA in the meiotic genome, restoration of premeiotic levels of DNA methylation is likely accomplished by Dnmt1, whose mRNA expression (along with Uhrf1) gradually recovers in P and D (Fig. 6a). In addition, despite low mRNA expression levels of de novo methyltransferases, we cannot exclude a role for Dnmt3a2 in remethylation of MPI genomes (Fig. 6c). Cumulatively, the results support the idea of a leading role of passive DNA demethylation and DNMT1-mediated DNA remethylation in TRDM.
In this study, we systematically examined genome-wide DNA methylation across all MPI substages in adult male mice. This analysis provided the first evidence of a genome-wide, transient reduction of DNA methylation at the onset of meiosis. The central implication of this work is that critical MPI events (homology search, chromosome pairing, meiotic DSB formation and repair) occur in the context of hemimethylated genomic DNA.
With respect to the mechanism of DNA demethylation in TRDM, our data are most consistent with a passive, DNA replication-coupled mechanism. We base this conclusion on observations of (a) the initial drop of DNA methylation in PL cells going through meiotic S phase, (b) the genome-wide scope of DNA demethylation, (c) the dynamics of DNA methylation levels of early- and late-replicating domains in PL and L, (d) low expression of genes implicated in active DNA demethylation and (e) direct measurement of the dynamics of DNA hemimethylation of L1 elements, which allowed us to assess the extent of DNA hemimethylation in thousands of locations throughout the genome. Results of this analysis strongly support the idea of a passive, DNA replication-coupled mechanism of DNA demethylation at the onset of meiosis.
Overall, our analysis provides strong evidence of genome-wide DNA demethylation by a passive, DNA replication-dependent mechanism. It is important to note that while the theoretical maximum reduction of DNA methylation levels is 50% of the starting values, the smaller reduction observed in PL (by ~ 13 percentage points) is consistent with the non-uniform dynamics of DNA demethylation across individual genomes (early and late replication domains) and a non-uniform PL germ cell population studied (at different times of S phase, e.g., early, mid and late), which together lead to the apparently higher DNA methylation levels.
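The reasoning behind the 50% ceiling can be illustrated with a minimal sketch of purely passive dilution. It assumes, for illustration only, that every replicated duplex pairs one parental strand (at the starting methylation level) with one nascent strand, and that the genome-wide mean is an unweighted average over CpGs. The function name and both fraction parameters are hypothetical and do not correspond to any quantity measured in this study.

```python
def expected_methylation(start_level, replicated_fraction, restored_fraction=0.0):
    """Mean methylation under a simple passive-dilution model.

    start_level         -- methylation (%) before S phase
    replicated_fraction -- share of the genome already replicated (0..1)
    restored_fraction   -- share of nascent-strand methylation already
                           restored by maintenance activity (0..1)
    All three inputs are illustrative assumptions, not measured values.
    """
    for name, value in (("replicated_fraction", replicated_fraction),
                        ("restored_fraction", restored_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Replicated DNA: average of the parental strand and the nascent strand.
    replicated_level = start_level * (1.0 + restored_fraction) / 2.0
    unreplicated_share = 1.0 - replicated_fraction
    return start_level * unreplicated_share + replicated_level * replicated_fraction
```

Under this model the mean can fall to half of the starting value only when the whole genome has replicated and no methylation has been restored; any mixture of unreplicated domains or partially remethylated DNA yields a smaller apparent drop.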
The dynamics of genome-wide CpG methylation point to a robust reduction of DNA methylation in PL compared to Spg. Interestingly, high methylation levels across Spg chromosomes argue against a possibility of preexisting DNA methylation levels determining the timing of DNA replication along chromosomes. Instead, consistent with semiconservative mechanism of DNA replication, DNA methylation levels remained high in late-replicating domains in PL but dropped in L upon replication of these regions in late PL, but before the recovery of DNA methylation.
Our observations underscore the uniqueness of meiotic S phase whose significance goes beyond being simply the last round of DNA replication prior to meiosis. Previous studies in numerous plant, fungal and animal species documented the longer duration of the meiotic S phase [2, 3, 51, 52, 53, 54]. In addition, several studies provided evidence linking the meiotic S phase to meiotic recombination [38, 55, 56, 57, 58]. Our results suggest that meiotic DNA replication in the male mice is different from somatic cells in that it fails to restore premeiotic DNA methylation levels in a timely manner.
What role(s) does TRDM play in meiosis? We envision several possibilities. First, TRDM might be a by-product of genome-wide remodeling of structure of chromosomes during mitosis-to-meiosis transition. Incorporation of meiosis-specific cohesin complexes, meiotic DNA recombination machinery and other events may preclude or reduce the accessibility of newly replicated DNA to DNMTs and their accessory proteins. Therefore, the observed reduction of DNA methylation might be non-consequential to meiosis.
Second, alternatively or in parallel, lower DNA methylation of the genome may create permissive conditions for some aspect of meiosis. Prior studies in plants, fungi and animals demonstrated that the disruption of normal pattern of DNA methylation in meiosis influences the meiotic recombination landscape [10, 11, 12, 59]. Although these prior studies do not demonstrate that the reduction of DNA methylation is an absolute prerequisite for wild-type levels of meiotic recombination, our finding of TRDM demonstrates that reduced DNA methylation is a common feature both of male and female meiotic germ cells of mice.
Third, DNA replication-coupled mechanism of TRDM suggests potential for distinct epigenetic states of hemimethylated sister chromatids of meiotic chromosomes. The impact of this epigenetic asymmetry of genetically identical DNA sequences on meiosis remains to be understood. We speculate that the epigenetic asymmetry of sister chromatids may lead to differential expression of their gene content. Indeed, as a result of passive DNA demethylation, genetically identical alleles will exist in distinct hemimethylated states with one allele inheriting the methylated coding strand while the coding strand of the other having no methylation. In addition, hemimethylation might stimulate inter-sister chromatid recombination. This idea is supported by early studies in cultured mammalian somatic cells that implicated DNA hemimethylation in increased sister chromatid exchanges in mitosis [60, 61]. In addition, given the essential role of the mismatch repair pathway in meiosis, it is tempting to speculate that DNA hemimethylation may influence the usage of sister chromatids in MPI akin to methylation-directed mismatch repair system of E. coli .
Finally, our results of analysis of L1 DNA methylation, and L1 RNA and protein expression suggest that TRDM contributes to gamete quality control. We base this conclusion on the finding of the appearance of fully unmethylated L1 elements that accounted for 2% of obtained sequencing reads in PL. This finding is consistent with incomplete DNA methylation (presumably hemimethylation) of a population of L1 elements in spermatogonia. Prior studies in embryonic stem cells and somatic cells demonstrated the existence of L1 elements showing DNA hemimethylation that necessitate cooperativity between de novo and maintenance DNA methyltransferases [63, 64]. We suspect that hemimethylated L1 elements remain largely transcriptionally silent until the meiotic S phase when they become fully demethylated and expressed. While fully demethylated L1s are the minority of potentially active L1 elements, they still correspond to dozens of L1 elements that may be expressed. Intriguingly, a potential role for TRDM in gamete quality control parallels a previously described selective elimination of MPI fetal oocytes with excessive L1 levels during the evolutionarily conserved process of fetal oocyte attrition . If this were the case, L1 elements may be contributing to gamete quality control in meiotic germ cells in both sexes.
Our results suggest that chromosomes exhibit dynamic changes in DNA methylation in MPI in male mice. These changes in DNA methylation arise by a passive mechanism during meiotic S phase resulting in hemimethylation of the genome in early MPI. We propose that TRDM facilitates meiotic prophase processes and gamete quality control.
Adult C57BL/6J male mice (2- to 5-month-old) (Jackson Laboratory) were used as a source of adult testes. All experimental procedures were performed in compliance with ethical regulations and approved by the IACUC of Carnegie Institution for Science.
Germ cell isolation
Germ cell fractions were enriched by fluorescence-activated cell sorting (FACS) as described previously. Sorted germ cell fractions were devoid of somatic contamination but contained small amounts of germ cells from adjacent MPI stages. Cell fraction purity was determined to be > 85% for Spg, ~ 85% for PL, ~ 85% for L, ~ 80% for Z, > 90% for P, > 90% for D.
After dissection of the testis, the tunica was removed and the testis was fixed (2% PFA in PBS) at 4 °C for 4 h with shaking. Samples were passed through sucrose solutions (10% for 1 h, 20% for 1 h, 30% overnight at 4 °C), embedded in OCT and stored at −80 °C. Sections of 10 μm were used for IF.
IF on testicular sections or meiotic spreads was performed as described before. ImageJ was used for image analysis.
The following primary antibodies were used for immunofluorescence (IF): monoclonal anti-γH2AX (Mouse, 1 mg/ml, 1:1000, Millipore 05-636), polyclonal anti-Sycp3 (Rabbit, 1 mg/ml, Abcam, ab15092. IF: 1:500 dilution), polyclonal anti-ORF1p (Rabbit, 1 mg/ml, a kind gift from Dr. Martin. IF: 1:500 dilution), monoclonal anti-Dmrt1 (Mouse, 200 µg/ml, Santa-Cruz, sc-10222. IF: 1:200 dilution), polyclonal anti-Dmrt6 (Rabbit, a kind gift from Dr. Zarkower. IF: 1:200 dilution), monoclonal anti-sp56 (Mouse, Pierce, MA1-10866. IF: 1:750). The following secondary antibodies (2 mg/ml) were used in this study: donkey anti-rabbit Alexa Fluor 594, donkey anti-rabbit Alexa Fluor 488, donkey anti-mouse Alexa Fluor 488.
Adult mice 1–3 months old were injected with 12.5 μg/g of body weight EdU (0.5 mg/ml DMSO stock) dissolved in 200 μL water. Mice were killed 2 h after injection and processed for FACS or for cryosections as described above. EdU detection with Click-iT EdU Alexa Fluor Kit was performed as described in the manual (Invitrogen).
Whole-genome bisulfite sequencing (WGBS)
Each biological replicate consisted of pooled cells from 2 to 3 different animals from different FACS procedures. For WGBS, two biological replicates (2×) were used for Spg, PL, L, Z, P, D and epididymal spermatozoa (Spz). Genomic DNA (gDNA) was prepared by incubating cells in tail lysis buffer with 5 µl of Proteinase K (Life Technologies, 20 mg/ml) at 55 °C for 2–3 h. At the end of lysis, 2 µl of linear acrylamide (Ambion, 5 mg/ml) was added to samples. DNA was extracted with phenol–chloroform–isoamyl alcohol (Life Technologies, #15593) using Phase Lock Gel (PLG) Light tubes (5 Prime). One microliter of RNase A (Thermo Scientific, #EN0531, 10 mg/ml) was added to the aqueous phase, and the samples were incubated at 37 °C for 30 min, transferred to a PLG tube and mixed with chloroform. DNA was precipitated using ethanol and quantified with Quant-iT PicoGreen reagent (Molecular Probes) using a SpectraMax microplate spectrophotometer. Isolated mouse gDNA was spiked with approximately 0.1% unmethylated cI857 Sam7 Lambda DNA (Promega) and sheared to fragments with a range of 200–600 bp using a Covaris M220 Ultrasonicator.
WGBS library preparation was based on Illumina’s ‘WGBS for Methylation Analysis’ protocol. The adaptor-ligated DNA fragments were processed for bisulfite conversion according to the manufacturer’s protocol (EZ DNA methylation Gold kit, Zymo Research). Bisulfite-treated DNA underwent 15 rounds of PCR amplification; libraries were prepared using Illumina TruSeq RNA Sample Prep Kit v2 and sequenced on Illumina HiSeq 2000 platform, yielding 100-bp paired-end reads. Each sample was run in a single lane, spiked with 5% Illumina PhiX genomic DNA control. Data were downloaded onto our servers in FASTQ format for processing.
WGBS read alignment and extraction of methylation evidence
We used the Bismark program for alignment of bisulfite-converted reads to mouse genome assembly NCBI37/mm9. The alignment was performed with respect to the bisulfite-treated Watson (original top) and Crick (original bottom) strands, and not their reverse complements, as the library was prepared in a strand-specific (directional) manner. No trimming was performed prior to alignment. After alignment, the Bismark de-duplication module was used to remove PCR duplicates.
Bismark was used to extract and summarize CpG methylation evidence present in the unique alignments. CpG evidence was filtered based on evaluation of methylation bias (M-bias) plots: we excluded the first 6 nt from the 5′ end of read 1, the first 10 nt from the 5′ end of read 2, and the last 1 nt from the 3′ end of both reads prior to the extraction of methylation. Subsequently, using Bismark, we extracted CpG coverage into a file containing information for both strands. Finally, we merged strand-specific information. The final output text file contained chromosome (chr) name, chr start, chr end, CpG methylation percentage, count of methylated C and count of unmethylated C. The final DNA methylation files were then examined with the bsseq package and supplemented with R-based data analysis or in-house scripts.
Bioinformatics analysis of global DNA methylation levels
Correlation between replicates of WGBS data was performed as follows: Biological replicates were compared pairwise (e.g., Spg1 with Spg2). Final Bismark output files containing CpG methylation and coverage were imported into R. DNA methylation was extracted and summarized in non-overlapping bins of 500 CpGs using rep() function, followed by aggregation of data and computation of mean values, using aggregate() function in R. Pearson correlation coefficient was then calculated using cor() function in R.
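The binning and correlation steps described above can be sketched as follows. This is a minimal Python analogue of the R workflow (rep(), aggregate(), cor()), not the scripts used in the study. It assumes that both replicates have already been reduced to methylation values for the same CpGs in the same genomic order; the function names are hypothetical.

```python
from math import sqrt


def binned_means(values, bin_size=500):
    """Average consecutive, non-overlapping bins; the last bin may be shorter."""
    means = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]
        means.append(sum(chunk) / len(chunk))
    return means


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def replicate_correlation(rep1, rep2, bin_size=500):
    """Correlate two replicates after summarizing them in CpG-count bins."""
    return pearson(binned_means(rep1, bin_size), binned_means(rep2, bin_size))
```

Binning by CpG count rather than by genomic distance keeps the number of observations per bin constant, which is one plausible reason for the choice of 500-CpG bins.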
Global DNA methylation analysis was performed using bsseq package. Two replicate groups were formed, each consisted of seven samples (Spg, PL, L, Z, P, D and Spz) and made up a single Bsseq object. Only those CpGs that were covered by at least one read in all samples (common CpGs) were analyzed. Since only these common CpGs were involved in data analysis, the overall DNA methylation levels for each sample slightly differ from “raw” DNA methylation values obtained for all mappable CpGs of a given sample.
For plotting DNA methylation across the length of the chromosome, a custom Python script was used and DNA methylation was averaged in non-overlapping windows of 100 kbp.
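The custom script itself is not reproduced here. A minimal sketch of window averaging, assuming per-CpG input of (position, methylation percentage) pairs for one chromosome and an unweighted mean per window, could look like the following. Whether the original script weighted CpGs by read coverage is not stated, so the unweighted mean is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict


def window_means(cpgs, window=100_000):
    """Average methylation in fixed, non-overlapping windows.

    cpgs   -- iterable of (position, methylation_percent) for one chromosome
    window -- window width in bp
    Returns a dict mapping each window start to the unweighted mean
    methylation of the CpGs that fall inside it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for position, methylation in cpgs:
        start = (position // window) * window  # left edge of the window
        sums[start] += methylation
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}
```

Windows that contain no covered CpG are simply absent from the result, so a plotting routine would need to decide how to display such gaps.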
Annotation used for DNA methylation analysis
The following genomic feature annotations were directly extracted from the UCSC Table Browser based on the mm9 genome: introns, exons, intergenic regions, CpG islands and RepeatMasker (RMSK) repeats. Promoter coordinates were extracted from the UCSC/knownGene transcriptome file by taking −1 kb to +1 kb relative to the TSS, in a strand-conscious manner. The geneID and geneSymbols were obtained by specifying the selected fields in the output format and selecting the knownGene (name) and kgXref (geneSymbol) fields. A custom script was used to change ucsc_ids to geneSymbols in the BED files. The genomic coordinates for imprinted gametic DMRs included 11 maternal (Grb10, Igf2r, Impact, Kcnq1ot1, Mest, Nespas-Gnasxl, Peg10, Peg3, Snrpn, U2af1-rs1 and Zac1) and 3 paternal (H19, Dlk1-Gtl2 and Rasgrf1) DMRs.
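The strand-conscious promoter definition can be sketched as below. The sketch assumes transcript coordinates in the usual UCSC convention (txStart < txEnd regardless of strand) and, as a simplification, treats txEnd as the TSS coordinate on the minus strand. The function name and the clipping at position 0 are illustrative choices, not details taken from the study.

```python
def promoter_interval(tx_start, tx_end, strand, flank=1000):
    """Return (start, end) of a window spanning +/- flank bp around the TSS.

    On the '+' strand the TSS is the transcript start; on the '-' strand
    it is the transcript end. The start is clipped at 0 so that windows
    near a chromosome edge stay valid.
    """
    if strand == "+":
        tss = tx_start
    elif strand == "-":
        tss = tx_end
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(0, tss - flank), tss + flank
```

For a symmetric window the strand only decides which transcript end is used as the TSS; an asymmetric definition would additionally need to swap the upstream and downstream flanks on the minus strand.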
Analysis of sequencing coverage after WGBS
For chromosome-based visualization, de-duplicated BAM files were sorted using SAMtools (samtools.sourceforge.net). Individual chromosomes were analyzed separately (extracted from sorted BAM files). Biological replicates were analyzed separately for independent assessment. The files were uploaded into SeqMonk and read coverage was quantitated in the following manner: running, non-overlapping window probes of 5 kb were created to span the chromosome length. Read counts (the probes) were quantitated using the SeqMonk’s Read Count Quantitation approach where we counted all reads and corrected for total read count based on the largest data set. For overall coverage quantitation, running, non-overlapping window probes of 5 kb were created to span the chromosome length. Data store summary report was exported.
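The quantitation above was done in SeqMonk. The following stand-alone sketch only illustrates the same idea: counting reads in 5-kb windows and scaling each sample to the largest data set. It assumes reads are assigned to a window by their start coordinate, which may differ from how SeqMonk assigns reads that span a window boundary. All function and parameter names are hypothetical.

```python
def windowed_read_counts(read_starts, chrom_length, window=5000):
    """Count reads per non-overlapping window, keyed by read start position."""
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for start in read_starts:
        if 0 <= start < chrom_length:
            counts[start // window] += 1
    return counts


def scale_to_largest(counts_by_sample, total_reads_by_sample):
    """Scale window counts so every sample matches the largest library size."""
    largest = max(total_reads_by_sample.values())
    scaled = {}
    for sample, counts in counts_by_sample.items():
        factor = largest / total_reads_by_sample[sample]
        scaled[sample] = [count * factor for count in counts]
    return scaled
```

After scaling, a window with persistently lower counts in one stage than in the others reflects lower relative DNA content rather than a smaller library.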
Determination of overlap between datasets
Generally, overlaps were computed using bedtools intersect. For example, for intersecting DNA methylation with late-replicating regions, <intersect -wa -wb> was used, with the replication timing (RT) file as (-a) and the DNA methylation file as (-b). The RT file contained <chr RT/start RT/end RT/RT value>. The CpG methylation file contained <chr C/start C/end C/methylation level>. Pearson correlation between DNA methylation values and RT values was computed using the cor() function in R. To examine the proportion of Spg-to-PL DMRs that overlap with PL-to-L and L-to-Z DMRs, we used bedtools intersect with the -wa, -wb and -a options to intersect BED files of the DMRs in question.
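For readers without bedtools at hand, the pairing that an intersect with both records reported produces can be approximated in a few lines. This is a naive sketch for small inputs, not a replacement for bedtools. It assumes half-open intervals given as (chrom, start, end, value) tuples, and the function names are hypothetical.

```python
def intersect_pairs(a_intervals, b_intervals):
    """Return every (a, b) pair of overlapping records on the same chromosome.

    Intervals are (chrom, start, end, value) tuples with half-open
    coordinates. The scan is quadratic per chromosome, which is fine for
    illustration but slow for genome-scale files.
    """
    b_by_chrom = {}
    for b in b_intervals:
        b_by_chrom.setdefault(b[0], []).append(b)
    pairs = []
    for a in a_intervals:
        for b in b_by_chrom.get(a[0], []):
            if a[1] < b[2] and b[1] < a[2]:  # intervals share at least 1 bp
                pairs.append((a, b))
    return pairs


def paired_values(pairs):
    """Split overlap pairs into two aligned value lists, ready for correlation."""
    a_values = [a[3] for a, _ in pairs]
    b_values = [b[3] for _, b in pairs]
    return a_values, b_values
```

With replication timing records as the first input and CpG methylation records as the second, the two returned lists hold one RT value and one methylation value per overlapping CpG and can be passed directly to a correlation function.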
Analysis and annotation of differentially methylated regions (DMRs)
DMR analysis was performed using the Bsseq package with previously optimized settings for DMR blocks. Bsseq employs a local-likelihood method that aggregates information from neighboring CpGs in a coverage-conscious manner and uses the combined data from two biological replicates to estimate DNA methylation at the single-CpG level. For this analysis, we required that each CpG be covered at least once in all four samples compared pairwise (two biological replicates per two stages). This selection resulted in a median CpG coverage of 3X–7X per sample (or 6X–14X per duplicate) and an overall coverage of > 77% of all genomic CpGs.
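The coverage requirement amounts to a simple filter, sketched below under the assumption that per-sample coverage is available as a mapping from CpG site to read depth. This is not Bsseq code; the data layout and function names are hypothetical.

```python
from statistics import median


def shared_cpgs(coverage_by_sample, min_cov=1):
    """Return CpG sites covered by at least min_cov reads in every sample.
    coverage_by_sample maps sample name -> {(chrom, pos): read depth}."""
    samples = list(coverage_by_sample.values())
    shared = set(samples[0])
    for cov in samples[1:]:
        shared &= set(cov)  # keep sites present in every sample
    return {s for s in shared if all(cov[s] >= min_cov for cov in samples)}


def median_coverage(coverage, sites):
    """Median read depth of one sample over the retained sites."""
    return median(coverage[s] for s in sites)
```

In a pairwise stage comparison, `coverage_by_sample` would hold the four samples (two replicates for each of two stages).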
For analysis of DMRs and replication timing, genomic coordinates of DMR blocks were intersected with early or late replication timing (RT) coordinates. For every region within A (early or late RT domain), the number of intersections with B (DMR block) was computed. To calculate the proportion of overlap between DMR blocks formed between Spg and PL (‘WTSpgPL’ DMRs) and all other DMR blocks, <bedtools intersect -wa -wb> was used, with a file containing WTSpgPL DMRs as (-a) and separate files, each containing DMRs between WT PL and L, L and Z, Z and P, or P and D, as (-b). Subsequent processing of the output involved summing all intersections that matched WTSpgPL DMRs and normalizing to the total number of DMRs for a particular pairwise comparison.
Annotation from Illumina’s iGenomes (genes.gtf), based on the RefSeq dataset (July 2015), was used for annotating DMRs with genes.
We used Fisher’s exact test to examine the strength of overlap between DMRs and gene transcription. For each set of DMRs, a custom Python script was used to form a 2 × 2 table containing significantly upregulated genes that overlapped with DMRs, significantly upregulated genes that fell outside of DMRs, all genes found inside of the DMRs, and all genes found outside of the DMRs. To evaluate the significance of overlap, we calculated p values using Fisher’s exact test.
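A minimal sketch of the test on a 2 × 2 table follows. It uses only the standard library and computes the two-sided p value as the sum of hypergeometric probabilities of all tables no more likely than the observed one. The cell order follows the listing in the text; whether the "all genes" row excludes the upregulated genes is not stated in the source, so that layout is an assumption. The function name is hypothetical, and this is not the custom script used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].
    Assumed layout: a = upregulated genes in DMRs, b = upregulated genes
    outside DMRs, c = genes in DMRs, d = genes outside DMRs."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over tables at least as extreme; small tolerance for float ties.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

With fixed margins, each candidate table is determined by its top-left cell, so the loop over `x` enumerates every possible table.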
Each sample consisted of pooled cells from up to two different animals from different FACS procedures. Sixty to 200 ng of genomic DNA was digested with BspEI for 16 h at 37 °C, and the enzyme was heat inactivated at 80 °C for 20 min. Digested DNA was extracted using phenol–chloroform–isoamyl alcohol and precipitated with ethanol, and digestion was verified on a 1% agarose gel. The two complementary DNA strands were then linked with a hairpin linker (5′P-CCGGGGGCCTATATAGTATAGGCCC) in a 25-μl reaction containing 17 μl digest, 2.5 μl water, 2 μl 10 μM hairpin linker, 2.5 μl 10X T4 ligase reaction buffer and 1 μl (400 U) of T4 ligase. The ligation reaction proceeded for 10 h at 16 °C, followed by bisulfite conversion using the EZ DNA Methylation-Direct Kit (Zymo Research). Conditions for bisulfite conversion of hairpin L1 sequences were adjusted to include additional thermal denaturation steps (Laird et al., PNAS 2004) as follows: (1) 99 °C for 15 min, (2) 64 °C for 1 h, (3) 99 °C for 5 min, (4) 64 °C for 1.5 h, (5) 99 °C for 5 min, (6) 64 °C for 1.5 h. L1MdTf-specific PCR was performed in a 50-μl reaction containing 5 μl DNA, 1X Pfu Turbo Cx reaction buffer, 2.5 units of Pfu Turbo Cx Hotstart DNA Polymerase (Agilent), 12.5 mM dNTPs and 10 μM each of the forward and reverse primers (5′TGGTAGTTTTTAGGTGGTATAGAT and 5′TCAAACACTATATTACTTTAACAATTCCCA), resulting in a 332-bp amplicon. PCR conditions were as follows: (1) 96 °C for 5 min, followed by 35 cycles of (2) 96 °C for 60 s, (3) 55 °C for 45 s and (4) 72 °C for 90 s, followed by a final extension at 72 °C for 5 min. PCR products were analyzed by agarose gel electrophoresis prior to processing for sequencing.
The resulting product was prepared for sequencing using the Illumina TruSeq mRNA v2 kit, starting with the end repair step, and sequenced on a NextSeq 500. The 150-bp paired-end reads were aligned to the L1MdTf promoter consensus sequence.
Hairpin-bisulfite sequencing analysis
Fastq reads were trimmed with Trim Galore! using the following parameters: q 30, length 100, phred33, paired, three_prime_clip_R1 6, three_prime_clip_R2 6, stringency 5. We used Bismark to align trimmed reads 1 and 2 independently to the Bismark-indexed L1MdTf 5′ end consensus sequence (5′ tccggaccggaggacaggtgcccacccggctggggaggcggcctaagccacagcagcagcggtcgccatcttggtcccgggactccaaggaacttaggaatttagtctgcttaagtgagagtctgtaccacctgggaactgccaaagcaacacagtgtctgagaaaggtcctgttttgg), using the Bowtie 1 option and the following parameters: non_directional, n 3, l 20. Subsequently, the alignments were split using SAMtools into reads aligned to the original bottom (OB) strand and reads aligned to the complementary to original top (CTOT) strand. Thus, each read resulted in two files containing alignments to OB and CTOT, with read 1 OB plus read 2 CTOT, or read 2 OB plus read 1 CTOT, corresponding to the two complementary strands of hairpin-bisulfite L1MdTf DNA. Bismark was used to extract CpG methylation from each file, and a custom script was used to generate a matrix in which each line represented a “stitched” string of CpG methylation values for both strands of the hairpin-bisulfite L1MdTf DNA. Finally, an R script was used to classify the DNA as methylated, hemimethylated or unmethylated.
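The final classification step can be sketched as follows. The format of the stitched matrix is not given in the text, so this example assumes that each strand is a string of Bismark-style CpG calls ('Z' for methylated, 'z' for unmethylated) and that index i in both strings refers to the same CpG dyad. The function name is hypothetical, and the study itself used an R script for this step.

```python
from collections import Counter

CALLS = {"Z", "z"}  # Bismark CpG calls: methylated / unmethylated


def classify_dyads(top_calls, bottom_calls):
    """Tally CpG dyads of one hairpin molecule as methylated (both strands),
    hemimethylated (one strand), unmethylated (neither) or uncalled."""
    if len(top_calls) != len(bottom_calls):
        raise ValueError("strands must cover the same CpG dyads")
    tally = Counter()
    for top, bottom in zip(top_calls, bottom_calls):
        if top not in CALLS or bottom not in CALLS:
            tally["uncalled"] += 1
        elif top == "Z" and bottom == "Z":
            tally["methylated"] += 1
        elif top == "z" and bottom == "z":
            tally["unmethylated"] += 1
        else:
            tally["hemimethylated"] += 1
    return tally
```

Summing the tallies over all hairpin molecules of a sample gives the per-stage fractions of each methylation state.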
Total RNA was isolated from FACS-enriched fractions from adult C57BL/6 male mouse testis. In most cases, due to the limited availability of enriched cells, total RNA from 2–4 mice (2–4 independent FACS enrichment sessions) was pooled to create one sample. One microliter of RNA was used for evaluation on the BioAnalyzer. Ribosomal RNA (rRNA) was removed from total RNA (up to 50 ng) using the Ribo-Zero Gold rRNA Removal Kit according to the manufacturer’s protocol. The TruSeq RNA Sample Preparation Kit v2 was used to prepare cDNA libraries from rRNA-depleted RNA. The libraries were prepared as described in the manufacturer’s protocol (Pub. Part No.: 15026495) following the low sample protocol. DNA fragments were enriched with PCR for 15 cycles. One microliter of the resulting library was used for validation and quantification, using the Agilent Technologies 2100 Bioanalyzer and the Agilent DNA-1000 chip. The cDNA libraries were sequenced as single-end 50-mers on the Illumina HiSeq 2000 platform, yielding a total of ~ 246 million reads (26–66 million total reads per sample).
The quality of the raw RNA-seq libraries was evaluated using fastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The fastQC-reported “Per base sequence quality” measure was very good, with more than 92% of all reads having a quality score of more than 30, and mean quality score of more than 36.
Read alignment was performed with TopHat (v2.0.7), using the short-read mapping program Bowtie 2 (v2.0.6). During the alignment, we provided a transcriptome file that contained gene annotation. The reads were processed based on the NCBI37/mm9 mouse genome and the UCSC RefSeq gene annotation obtained from Illumina iGenomes (July 2015).
We used the HTSeq package to count sequencing reads that overlap with the gene transcriptome. Specifically, we used the <htseq-count -s no -a 10 input.sam iGenomes.gtf> command. The output is a tab-delimited text file containing counts for each gene (gene ID and number of read counts). Subsequently, we used edgeR to evaluate differential expression. Specifically, we (1) built a counts table with all samples using the DGEList function, (2) normalized counts using the default TMM method, which accounts for compositional differences between the libraries, using the calcNormFactors function, and (3) obtained a table of normalized counts per million (CPM) using the cpm function, which we used directly for data analysis or converted to RPKM as CPM / (gene length in bp / 1000). For the differential expression analysis, an exact test was performed with an estimated biological coefficient of variation (BCV) of 0.1, and the topTags function was applied. A final table containing logFC (log2 fold change), logCPM (log2 CPM), p value and FDR value for each gene was obtained.
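The CPM and RPKM arithmetic can be written out as a short sketch. It is not edgeR code: it assumes that CPM is computed against an effective library size (raw library size multiplied by a normalization factor, here defaulting to 1.0) and that RPKM is CPM divided by gene length in kilobases. Function names and example values are hypothetical.

```python
def cpm(counts, norm_factor=1.0):
    """Counts per million for one sample.
    counts maps gene id -> raw read count; norm_factor stands in for a
    TMM-style scaling factor applied to the library size."""
    effective_lib_size = sum(counts.values()) * norm_factor
    return {gene: n * 1e6 / effective_lib_size for gene, n in counts.items()}


def rpkm_from_cpm(cpm_value, gene_length_bp):
    """Convert CPM to RPKM by dividing by gene length in kilobases."""
    return cpm_value / (gene_length_bp / 1000.0)


# Hypothetical two-gene library, for illustration only.
sample_cpm = cpm({"geneA": 500, "geneB": 1500})
gene_a_rpkm = rpkm_from_cpm(sample_cpm["geneA"], 2000)
```

Dividing by length in kilobases, rather than dividing by length and then by 1000, is what makes a 2-kb gene's RPKM half its CPM.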
For the analysis of the transcriptional landscape of repetitive elements, we used RepEnrich according to the suggested protocol. Briefly, we aligned RNA-seq data to the genome using Bowtie 1 with parameters that allow only unique mapping (-m1) and output multi-mapping and uniquely mapping reads into separate files. We ran the RepEnrich Python script on the data and then used edgeR for subsequent processing of the fraction counts file, which contained 1444 repetitive element entries. Specifically, we (1) built a counts table with all samples using the DGEList function, (2) normalized counts using the default TMM method, which accounts for compositional differences between the libraries, using the calcNormFactors function, and (3) obtained a table of normalized counts per million (CPM) using the cpm function, which we used directly for data analysis. For the differential expression analysis, an exact test was performed with an estimated dispersion specific to each pairwise comparison, and the topTags function was applied. A final table containing logFC (log2 fold change), logCPM (log2 CPM), p value and FDR value for each repetitive element entry (subfamily) was obtained.
AB and VG conceived and designed the experiments and wrote the manuscript. VG, GWvdH, CDL performed the experiments. VG (data analysis and bioinformatics), BM (bioinformatics and custom Python scripts) and KH (bioinformatics advice and guidance related to WGBS analysis) analyzed the data. All authors read and approved the final manuscript.
We thank Fred Tan for helping with bioinformatics analyses, Svetlana Deryusheva, Safia Malki and Marla Tharp for constructive feedback on the manuscript.
Availability of data and materials
The datasets supporting the conclusions of this article are available under BioProject accession number PRJNA326117 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/).
Consent for publication
The authors declare that they have no competing interests.
Ethics approval and consent to participate
All experimental procedures were performed in compliance with ethical regulations and approved by the IACUC of Carnegie Institution for Science. The study did not involve human participants, human data or human tissues.
This research was supported by the endowment of Carnegie Institution for Science. The funding body had no role in the design of the study and collection, analysis, and interpretation of the data.
- 30. Tomizawa S, Kobayashi H, Watanabe T, Andrews S, Hata K, Kelsey G, Sasaki H. Dynamic stage-specific changes in imprinted differentially methylated regions during early mammalian development and prevalence of non-CpG methylation in oocytes. Development. 2011;138(5):811–20.
- 50. Gan H, Wen L, Liao S, Lin X, Ma T, Liu J, et al. Dynamics of 5-hydroxymethylcytosine during mouse spermatogenesis. Nat Commun. 2013;4:1995.
- 67. TopHat. https://ccb.jhu.edu/software/tophat/index.shtml. Accessed 20 Mar 2018.
- 68. HTSeq: analysing high-throughput sequencing data with Python. http://htseq.readthedocs.io/en/release_0.9.1/. Accessed 20 Mar 2018.
- 69. RepEnrich. https://github.com/nskvir/RepEnrich. Accessed 20 Mar 2018.
- 70. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. Molecular biology of the cell. 4th ed. New York: Garland Science; 2002.
- 71. Replication Domain. https://www2.replicationdomain.com. Accessed 20 Mar 2018.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.