Hypomethylation coordinates antagonistically with hypermethylation in cancer development: a case study of leukemia
Methylation changes are frequent in cancers, but understanding how hyper- and hypomethylated region changes coordinate, associate with genomic features, and affect gene expression is needed to better understand their biological significance. The functional significance of hypermethylation is well studied, but that of hypomethylation remains limited. Here, with paired expression and methylation samples gathered from a patient/control cohort, we attempt to better characterize the gene expression and methylation changes that take place in cancer from B cell chronic lymphocyte leukemia (B-CLL) samples.
Across the dataset, we found that consistent differentially hypomethylated regions (C-DMRs) across samples were relatively few compared to the many poorly consistent hypo- and highly conserved hyper-DMRs. However, genes in the hypo-C-DMRs tended to be associated with functions antagonistic to those in the hyper-C-DMRs, like differentiation, cell-cycle regulation and proliferation, suggesting coordinated regulation of methylation changes. Hypo-C-DMRs in B-CLL were found enriched in key signaling pathways like B cell receptor and p53 pathways and genes/motifs essential for B lymphopoiesis. Hypo-C-DMRs tended to be proximal to genes with elevated expression in contrast to the transcription silencing-mechanism imposed by hypermethylation. Hypo-C-DMRs tended to be enriched in the regions of activating H4K4me1/2/3, H3K79me2, and H3K27ac histone modifications. In comparison, the polycomb repressive complex 2 (PRC2) signature, marked by EZH2, SUZ12, CTCF binding-sites, repressive H3K27me3 marks, and “repressed/poised promoter” states were associated with hyper-C-DMRs. Most hypo-C-DMRs were found in introns (36 %), 3′ untranslated regions (29 %), and intergenic regions (24 %). Many of these genic regions also overlapped with enhancers. The methylation of CpGs from 3′UTR exons was found to have weak but positive correlation with gene expression. In contrast, methylation in the 5′UTR was negatively correlated with expression. To better characterize the overlap between methylation and expression changes, we identified correlation modules that associate with “apoptosis” and “leukocyte activation”.
Despite clinical heterogeneity in disease presentation, a number of methylation changes, both hypo and hyper, appear to be common in B-CLL. Hypomethylation appears to play an active, targeted, and complementary role in cancer progression, and it interplays with hypermethylation in a coordinated fashion in the cancer process.
KeywordsEpigenetic regulation DNA methylation Hypomethylation CLL Cancer Signaling pathway 3′UTR Enhancer
Loss of DNA methylation, also known as hypomethylation, in cancer cells relative to normal cells was one of the first-described epigenetic changes in human cancers. Hypomethylation has been detected at both a global level and on a local scale  in cancer genomes. Many cancer types have been reported to have global loss of methylation like glioblastoma , ovarian epithelial carcinoma , prostate metastatic tumors , B cell chronic lymphocytic leukemia [5, 6], hepatocellular carcinoma , cervical cancer , colon adenocarcinoma , and Wilms’ tumor . However, the biological significance of DNA hypomethylation remains understudied owning to its unclear role in carcinogenesis, in contrast to hypermethylation, which is commonly viewed as a transcription silencing mechanism [11, 12]. Yet, hypomethylation of DNA, despite its unclear role, has been linked to tumor progression [8, 13] in different tumor types and in individual specimens [3, 14]. Also, some experiments have indicated the importance of induced DNA hypomethylation in oncogenesis by using DNA methylation inhibitors in vivo and in vitro [15, 16]. However, the role of hypomethylation is not clearly understood. Hence, it is critical to analyze hypomethylation data in depth to achieve a better understanding of its biological roles in carcinogenesis.
DNA hypomethylation in cancer is often seen in satellite DNAs, Arthrobacter luteus (ALU) repeats, and long interspersed nuclear elements (LINEs) [17, 18], etc. These DNA repeats comprise approximately half of the genome. Hence, DNA hypomethylation is generally considered a global phenomenon not suitable for use as a biomarker. One advantage of the global hypomethylation phenomenon (as it pertains to its genome composition) is that it is often considered a technique to balance focal and conserved hypermethylation in the promoter regions of key genes. Also, it is believed that these hypomethylated genomic regions are randomly spread over the genome, mostly in repetitive regions whose functions, if any, are unclear. Again, this reported disadvantage might actually be an advantage due to recent findings indicating that ALU elements can act as enhancers , which further emphasizes the need for defining the role of hypomethylation in cancers.
As part of our study of hypomethylation patterns, we used B cell chronic lymphocytic leukemia (B-CLL) as an example case. This B-CLL cancer type has a predominant global hypomethylation as its characteristic feature [5, 6], and it is the most common form of blood cancer. It is a clinically heterogeneous disease, with some patients experiencing rapid disease progression and others living for decades without requiring treatment . Although a number of cellular and molecular prognostic markers, i.e., surface markers ZAP70 and CD38, cytogenetic abnormalities, and IGHV mutational status [21, 22, 23], have been identified to help classify CLL into molecular and clinical subgroups and to predict their course of progression, they do not provide clear insight into the underlying biology necessary to develop better targeted and more effective treatments.
In addition to various molecular and genetic changes, several genome-wide DNA methylation studies have identified many aberrantly methylated genes in CLL samples . Initially, DNA hypermethylation in CLL patients was found to affect 4.8 % of CpG islands on average . Furthermore, hypermethylation in the promoters of tumor suppressor genes such as DAPK1 , SFRP1 , and ID4  genes involved in apoptosis, cell cycle regulators p16 and p15 , and prognostic markers ZAP70  and TWIST2  were identified. DNA methylation changes were also found to be associated with disease progression in the Eμ-TCL1 transgenic mouse model of CLL . In addition to hypermethylation, hypomethylation of proto-oncogenes has also been observed particularly in liver tumors and leukemia such as the c-fos, c-myc, ras, Erb-A1 , and the bcl-2 gene . Along with this, many studies have indicated widespread hypomethylation compared to instances of hypermethylation, particularly in the CLL cancer type. However, a detailed account on the genome-wide hypomethylation pattern and its contributing role towards cancer development has not been conducted for CLL. Hence, it is clear that an in-depth methylation analysis focusing more on hypomethylation can be very helpful to unveil the underlying mechanism regulating the disease.
Here, we studied the genome-wide DNA methylation pattern in CLL and investigated whether hypomethylation is also consistent at some locations like hypermethylation across multiple CLL patients. We also investigated the biological role of consistent hypomethylation towards tumor initiation and progression; and finally, we compared instances of consistent hypomethylation to that of consistent hypermethylation. We characterized the epigenetic context of hyper- and hypomethylated regions in CLL and further investigated association of hypomethylation with change in expression of the neighborhood genes along with their potential mechanism of influence.
Methylation data analysis
In order to study genome-wide methylation changes in the CLL genome, we computed differentially methylated regions (DMRs) from genome-wide methylation data of 30 samples from publically available CLL samples in GEO (http://www.ncbi.nlm.nih.gov/geo/). DMRs of size 1000 bp were obtained by comparing each patient sample against each control normal sample individually using Fisher’s exact test. False discovery rate (FDR) was used to correct for multiple testing errors with a q value threshold of 0.01. Hence, three sets of DMRs were obtained by comparing all 30 CLL samples against each of the three control samples.
Entropy and permutation analysis
Next, in order to check randomness in contrast to conservation of methylation change in each 1000 bp region across all CLL samples, methylation entropies were calculated using a probability distribution of methylation changes for each region. Figure 1b shows this opposite pattern of entropy and average methylation change. This plot shows that a high percentage methylation of specific regions is more consistent across all patients; however, as the average methylation goes down, their conservation tends to fluctuate, thereby leading to an increase in entropy (Fig. 1b). After comparing these methylation entropies for each region against the average methylation change across 30 CLL samples, we observed a negative correlation (Pearson correlation = −0.22, p value <0.05) in each of the 3-control sample tests.
Identification of consistent differentially methylated regions (C-DMRs)
In order to obtain consistent DMRs, a binomial test was used to check the significance of each DMR with the probability of being hypo/hypermethylated in 25 or more CLL samples (q value <0.01). Hence, three lists of significant DMRs were obtained for each control sample. Next, 658 hypo- and 982 hypermethylated regions that were found common in all three lists and referred to as C-DMRs (see Tables S1–S4 in Additional file 1 and Additional file 2 for lists and details).
To further check the statistical significance of our lists of hypo- and hyper-C-DMRs, we performed two permutation tests, one by permuting samples and the other by permuting methylation values for 1000 bp regions (see Methods in Additional file 1 for more details). The sample permutation test helped in detecting whether we observed hyper/hypo-C-DMR patterns by chance if there was no difference among cancer vs. normal samples. On the other hand, the methylated region permutation test detected whether hyper/hypo-C-DMR pattern can occur by chance if there is no difference between regions. In both permutation tests, all of our obtained C-DMRs had q values <0.05, showing the statistical significance of hyper- and hypo-C-DMRs in cancer samples against normal and non-DMRs.
Differences in positional genomic location analysis of hyper- and hypo-C-DMRs
Across each sample data, we found 25–30 % of all DMRs as hypomethylated, and only 10–15 % hypermethylated, as shown in Figure S1 in Additional file 1. However, they were not targeted for any specific chromosomal region (Figure S2 in Additional file 1). Also, only 100 hypo-C-DMRs (15.2 %) co-localized with CpG islands, while 955 out of 982 hyper-C-DMRs (97.2 %) co-localized with CpG islands. Hypo-C-DMRs were mostly present in regions outside CpG islands and shores (Fig. 2c). Next, Fig. 2d shows that almost half of hypo-C-DMRs (46 %) were present on non-repeat regions along with ones mapped on the repeat regions. Overall, hypo-C-DMRs were found more in 3′UTR, intronic, and intergenic regions, mostly outside CpG islands and overlapping both repeat and non-repeat regions over enhancers. Additional file 3 lists the genes with genic-regions overlapped by these C-DMRs.
Enrichment of KEGG pathways, GO annotations and phenotypes
Among the GO annotations, the most important and significantly enriched biological processes for hypo-C-DMRs were “negative regulation of transcription” (p.adj = 4.28E-02), “chromatin modification” (p.adj = 3.92E-02), “regulation of signaling” (p.adj = 3.92E-02), “histone methylation” (p.adj = 3.92E-02); “positive regulation of histone H3-K9 methylation" (p.adj = 4.29E-02), “protein alkylation” (p.adj = 4.28E-02), “programmed cell death” (p.adj = 4.30E-02), “negative regulation of cell proliferation” (p.adj = 4.40E-02), and “negative regulation of leukocyte differentiation” (p.adj = 3.04E-02), “leukocyte activation” (p.adj = 4.28E-02), and “cell morphogenesis involved in differentiation” (p.adj = 4.29E-02). On the other hand, hyper-C-DMRs were enriched for processes which are antagonistic to the processes enriched in hypo-C-DMRs. Hyper-C-DMRs were enriched for processes like “positive regulation of transcription” (p.adj = 4.81E-12) and “positive regulation of cell differentiation” (p.adj = 1.99E-07), “positive regulation cell proliferation” (p.adj = 4.75E-06), “regulation of signaling” (p.adj = 3.53E-05) with more processes related to “cell-fate commitment” (p.adj = 6.09E-13), “cell morphogenesis involved in differentiation” (p.adj = 7.69E-08), and similar processes like “protein and cell localization” (p.adj = 4.77E-06). Also, hypo-C-DMRs were enriched for “H3K9 methylation” but hyper for “H3-K27 methylation.”
Molecular functions (Fig. 3) showed a strong enrichment for binding functions like “protein binding” (like integrin, p.adj = 1.92E-02), “SH3 domain binding” (p.adj = 1.92E-02), “p53 binding” (p.adj = 1.92E-02), “histone-arginine N-methyltransferase activity” (p.adj = 2.20E-02), “tumor necrosis factor-activated receptor activity” (p.adj = 3.20E-02), GTPase regulator activity (p.adj = 3.20E-02), and “transcription factor binding” (p.adj = 3.40E-02) in hypo-C-DMRs. In comparison, hyper-C-DMRs were specifically highly enriched for “sequence-specific DNA binding” (p.adj = 5.49E-32), transcription factor (or regulator, p.adj = 3.86E-25), and trans-membrane transporter activity (p.adj = 6.58E-03), along with also metal/ion (calcium, p.adj = 6.72E-07) and “enhancer binding” (p.adj = 5.47E-05). In summary, both C-DMR types (hypo and hyper) affected similar processes but in different directions, with hypomethylation focusing on transcription and leukocyte activation and hypermethylation focusing on transcription suppression through transcription factors and cell-fate commitment.
We did not observe any differences in enriched phenotypes between hypo- and hyper-C-DMRs. Specific associations with the hematopoietic system, homeostasis/metabolism, mortality/aging, immune system, and growth/size phenotype were enriched in both types of C-DMRs, except embryogenesis (or related terms like differentiation), which was specific to hypermethylation. Genes falling under each enriched top attribute are listed in Additional file 5. These results suggest the importance of both hypo- and hyper-C-DMRs in CLL, and emphasize the functional significance of hypo-C-DMRs.
Enrichment of TFBS, histone modifications, and chromatin states
Next, in order to identify epigenomic signatures associated with both C-DMRs, we systematically tested for enrichment in transcription factor binding sites (TFBSs), histone modification marks, and chromatin states provided by ENCODE project .
Among histone modification marks, we found H3K27me3 to be highly enriched in hyper-C-DMRs (Fig. 4b, p.adj = 2.31E-40). On the contrary, hypo-C-DMRs were strongly enriched in H3K4me1 (p.adj = 1.11E-54), H3K27ac (p.adj = 1.22E-38), and H3K79me2 (p.adj = 1.44E-34) histone modification marks (Fig. 4b). Both C-DMRs were enriched in H3K4me2, H3K4me3, and H2AZ histone modification marks (Fig. 4a). The specificity of the H3K27me3 mark for hyper-C-DMRs was confirmed by using histone modification data from all cell lines provided by the ENCODE project (Figure S3 (b) in Additional file 1 and Additional file 6).
Furthermore, hyper-C-DMRs were significantly enriched for homeobox, E2F and TATAbox/promoter motifs, and hypo-C-DMRS for ETS, IRF4, EGR, ZFX, RUNX1, PU.1, Pax5, BATF, Erra, and bZIP motifs (Table S5 in Additional file 1). All motifs enriched for hypo-C-DMR have been shown to contribute to cell proliferation, B cell development and pathogenesis of lymphomas [37, 38, 39].
Using chromatin state annotations from multiple cell lines (Figure S3 (c) in Additional file 1 and Additional file 6), we found hyper-C-DMRs to be enriched in “repressed” (p.adj = 5.63E-182) and “poised promoters” (p.adj = 1.24E-69) chromatin states (Fig. 4c). In contrast, hypo-C-DMRs were consistently enriched in both “strong/weak enhancers” (p.adj = 6.73E-36 for both) and “weak transcription” (p.adj = 2.73E-24). Both C-DMRs were similarly enriched in “weak promoters” (p.adj = 1.70E-19 and 2.93E-15 for hyper- and hypo-C-DMRs, respectively). These results suggest that distant enhancer regions tend to be hypomethylated in cancer whereas regions associated with repressed or poised transcription are hypermethylated.
Also, 17,811 hyper- and 15,599 hypo-differentially methylated regions (DMRs) were obtained from pooled CLL and control sample comparison (Table S2 in Additional file 1). Each of these region lists exhaustively included all hyper- and hypo-C-DMRs, respectively. KEGG and GO annotation enrichment analysis of pooled sample DMRs also showed a strong enrichment of similar terms (Additional files 1 and 4). Results from pooled sample analysis show that C-DMRs are more specific and significant subset of DMRs that are consistently present in all samples.
Expression data analysis
Expression profiles in relation with methylation
Next, we looked for expression (transcript FPKM values) and methylation (for overlapped CpG sites) relationship in exons and introns individually (Fig. 5d, e). We observed that among exons (Fig. 5d), transcripts in the lowest expression quartile had the highest and most distinct methylation pattern. Overall, all exons combined from transcripts in all expression quartiles had a negative correlation (corr = −0.13; p value <2.2E-16). For introns (Fig. 5e), this relationship appeared to be opposite with the highest expression quartile transcripts showing the highest methylation but almost no correlation (corr = −0.02; p value = 6.2E-2). Also, we identified a very clear distinction in methylation patterns from different expression quartiles in exons specifically from 5'UTRs (Fig. 5b). Overall, 5'UTRs had a negative correlation between methylation and expression (corr = −0.2; p value <2.2E-16). Conversely, exons from 3'UTRs had the opposite methylation pattern (Fig. 5c). 3'UTR exons from the highest expression quartile genes had the highest methylation (Fig. 5d, Figure S8 in Additional file 1) and overall, they had weak but positive correlation (corr = 0.1; p value = 2.4E-3). In summary, we observed different methylation effects on expression for different genic regions, particularly within exons from 5'UTR and 3'UTR that related to methylation changes at widespread genomic locations in CLL.
Correlation module analysis
In order to further investigate the role of 3'UTR methylation change on expression, we used both expression and methylation data to construct co-expression and co-differential methylation network modules, respectively. These modules were generated using the weighted gene correlation network analysis (WGCNA) framework . For this, we selected 1780 transcripts from 19 matching CLL samples, from which both expression and average 3'UTR methylation data were available.
WGCNA identified 21 co-expression modules with sizes ranging from 41 to 181 transcripts from the expression data and 17 co-differential methylation modules with sizes ranging from 37 to 284 transcripts from the methylation data (average of all CpGs within 3'UTRs). Methylation values of transcripts in each CLL sample were compared against methylation of transcripts from one common control sample, individually. Differential methylation values for all transcripts in all CLL samples were thus obtained. Correlation network modules in each dataset were obtained using hierarchical clustering of pairwise gene correlation structures using WGCNA. WGCNA does not use gene ontology information but clusters the interconnected genes defined as branches of a hierarchical cluster tree. Hence, modules are initially labeled by arbitrary integers and then coded by colors for each dataset. Since clustering for module generation has no gene annotation or functional information, functional interpretation for each module in each dataset was further obtained by conducting a GO enrichment analysis. GO enrichment analyses revealed unique and significant enrichment of various GO terms, providing evidence of a functional role for each module as a whole (Additional file 7; Tables S2, S3, S8, and S9 in Additional file 1). Overall, different biological processes for different modules implied biological significance of clustering transcripts in separate modules in both expression and methylation data.
Next, to investigate the relationship between expression and methylation modules, these modules were matched by pairwise comparison of each methylation module to each expression module using two methods. First, they were compared to measure a statistically significant overlap of genes in each pair. Second, we used network-based statistics to assess whether the density and connectivity patterns of genes were also preserved in a two-paired set of modules with significant gene overlaps. The second method generated a composite statistic value, i.e., Z summary, using a permutation test to measure the strength of methylation module and expression module preservation. Also, knowing the Z summary statistic bias towards a module with a large size, a rank-based statistic medianRank was used in the second method to measure their relative preservation irrespective of module size. The medianRank is the statistic calculated from observed preservation values and does not conduct any permutation test against background gene modules.
Significant overlap and preserved modules in WGCNA of 3′UTR methylation and expression
Methylation module (size)
Expression module (size)
Overlap count (p value)
Regulation of Ras protein signal transduction
Cell division and chromosome partitioning/cytoskeleton
Signaling and apoptosis, cell cycle checkpoint
Light green (49)
Midnight blue (55)
Also, “green,” “pink,” “blue,” “black,” and “grey60” color-coded expression modules were the other top modules showing a strong preservation (Z summary >10) and low median rank (Fig. 6b). All these four modules were again enriched for “zinc ion binding,” “regulation of transcription,” and “apoptosis”. They overlapped with “light cyan,” “midnight blue,” “black,” and “turquoise” methylation modules. Biological processes like “cell division,” “chromosome partitioning/cytoskeleton,” and “GTPase regulator activity” were enriched in both “grey60” (second least rank = 5, Z summary = 13) expression module along with its overlapping red methylation module (Additional file 7). An additional network analysis was carried out using expression and average methylation of whole genes (not just 3′UTRs) that are non-coding (i.e., transcripts that do not encode a protein product) along with 3′UTR methylation. Results from this analysis can be accessed in Additional file 7. This non-coding RNA analysis also gave results similar to the 3′UTRs, including overlap of modules enriched for similar cancer-related terms. Hence, we see that significantly preserved expression and methylation modules were enriched for similar cancer-related biological processes like leukocyte proliferation, apoptosis, signal transduction, and cell-cycle regulation. Our observation from network preservation of expression and 3′UTR methylation change provides clues for a better understanding of the contribution made by the regulatory role imposed by 3′UTR methylation overexpression.
Interplay between hypo- and hypermethylation
Role of hypomethylation in cell-cycle regulation, histone modification, and transcription activation in CLL
Hypermethylation at the promoter region of tumor suppressors and their subsequent silencing is a well-studied mechanism of tumorigenesis. In contrast, hypomethylation, potentially leading to upregulation of oncogenes, is not fully understood. Also, genic hypomethylation is often considered as a random and non-consistent process due to a particularly predominant de-methylation process in mature B cells in CLL samples. In this study, we showed that consistent hypomethylated regions (referred to here as hypo-C-DMRs) account for a significant pattern of methylation changes in CLL with a distinctive pattern of gene expression and regulatory associations.
In particular, we observed that both hypo- and hyper-C-DMRs were enriched for similar biological processes but in an opposite direction. For example, hyper-C-DMRs were enriched for “positive regulation of cell differentiation”, but hypo-C-DMRs were enriched for “negative regulation of cell differentiation.” We also observed a significant enrichment of BCR signaling associated with hypo-C-DMRs. This further strengthens the importance of hypomethylation in CLL since BCR is a central pathogenic mechanism in B cell malignancies, including CLL . In addition, GO annotations relate to transcription regulation, chromatin modification, apoptosis, cell proliferation, leukocyte differentiation, and signal transduction were enriched for hypo-C-DMR, which also defines their functional role in cancer development.
Also, the most significantly enriched TFBS for hypo-C-DMRs was EBF1, which is a transcription factor that is critical for both B lymphopoiesis and B cell function . EBF1, in collaboration with a hierarchy of partner proteins, including E2A, Runx1, and Pax5 (also enriched in motif analysis) activates the B cell transcriptome and represses programs of alternate hematopoietic lineages. DNA binding by EBF1 has also been linked to changes in epigenetic marks on their target genes. Binding of EBF1 and other factors including E2A have also been correlated with H3K4me1 at target genes, which is also the most enriched histone modification mark in our analysis. H3K4me1 is in fact known to facilitate additional epigenetic modifications necessary for transcription . RUNX3, TCF3, PU.1, and PAX5 are also key transcription factors in B lymphopoiesis and cell proliferation. Other TFBS enriched for hypo-C-DMRs were POL2, CHD2, WHIP, and TBLR1. CHD2 is a DNA-binding helicase that specifically binds to the promoter of target genes, leading to chromatin remodeling and promoting their expression . WHIP is a protein that binds to DNA polymerase delta and increases the initiation frequency of DNA polymerase delta-mediated DNA synthesis . TBLR1 is a key regulator of different properties of the BCL-3  that acts as an oncogenic protein through multiple mechanisms that include the induction of cyclin D1 expression and also inhibits cell apoptosis through induction of the E3 ligase of p53, MDM2 [47, 48]. Among other enriched histone modifications, H3K79 di-methylation is known for regulating the initiation of DNA replication , and H3K36me3 is found in actively transcribed gene bodies of genes involved in G1/S transition in a cell cycle . Presence of hypo-C-DMR overlapping enhancers and weak promoters further emphasize their role in activation of transcription. Overall, both enriched TFBS and histone modifications are known to relate to B lymphopoiesis, transcription activation, and cancer development.
In contrast to hypo-C-DMRs, hyper-C-DMRs, which are known to regulate the expression silencing mechanism, were implicated by the enrichment of EZH2. EZH2 contains a histone methyltransferase SET domain that methylates histone tails on gene promoters to repress their transcription initiation, and this domain is an important component of the polycomb repressive complex 2 (PRC2). The PRC2 protein EZH2 is also known to preferentially methylate Lysine 27 on histone 3 (H3K27)  and also H3K9 under certain conditions. H3K27me3 and H3K9me3 were both enriched for hyper-LSDMRs in our analysis as well. H3K4me3, H3K9me3, and H3K27me3 co-localizes with most polycomb target proteins like SUZ12, CTBP2, and EZH2-binding sites enriched in hyper-C-DMRs (Additional file 1; Figure S5). Several other studies [51, 52] have reported DNA methylation and tumor suppressors in cancers marked with polycomb proteins enriched with EZH2 and H3K27me3. This study also elucidates the known mechanism of hyper-C-DMR in gene silencing and promoting cancer development. Further, enrichment of repressor chromatin region for hyper-C-DMRs confirms their role in silencing the expression of target genes.
Our motif enrichment analysis showed hypermethylation enriched for motifs like homeobox and TATAbox, which are usually present in promoter regions and thus silence many key genes. In contrast, hypomethylation was enriched in motifs of transcription activator binding genes, such as ETS , ZFX , cMYC  (Table 1), which are again involved in cell growth, apoptosis, and metabolism, processes necessary for tumor progression. Enriched transcription factor motifs like Ikaros (IKZF) and PU.1 govern B cell lineage priming, which involves changes in histone modifications and chromatin structure of genes encoding molecules important for the establishment of a B cell program . Other significant classes of motifs enriched in hypo-C-DMRs were motifs containing ETS domain in genes like Elk1 and Fli and RHD domain in genes like NFAT and NFkB-p65. These genes are downstream nuclear targets of Ras-MAP kinase signaling and are also known as oncogenic transcription activators, specific to  cell survival and proliferation.
Methylation pattern in exons, introns, and 3′UTRs
In addition to evidence of transcription activation, we observed that hypomethylation in CLL mostly targets intronic, intergenic, and 3′UTR regions. Regarding the relationship between methylation and expression change with respect to genic locations, we found negative correlation for methylation in exons and whole transcript expression within 30 CLL samples. But, this correlation within exons was inconsistent in UTRs. Exons in 5′UTRs seem to act more like promoters, but exons in 3′UTRs had the opposite effect on expression. Hence, these findings suggest that in contrast to a gene expression-inhibiting role of increasing methylation associated with 5′UTR exons, methylation in 3′UTR exons is in fact required in the normal transcription process.
Regulation of expression by 3′UTR methylation pattern
Our co-expression and co-methylation network analysis revealed that both transcriptome and methylome can be organized into modules. Genes in co-methylation and co-expression modules were found highly enriched for specific gene ontology categories, underscoring their functional importance. Many 3′UTR modules associated with methylation changes were found to have moderate to strong preservation with expression modules. Also, the most preserved module had functional annotations related to signaling and growth and proliferation. Hence, preserved 3′UTR methylation and expression modules revealed the ability of 3′UTR methylation to dictate their expression. The regulatory behavior of methylation change could, therefore, be detected—not only in 5′UTR, promoters, and gene bodies—but also in 3′UTRs in CLL. Also, significantly correlated 3′UTR methylation and expression modules were enriched for biologically important pathways involved in signaling cascade, apoptosis, and cell proliferation. These results provide a fine-grained look at the interaction among 3′UTR co-methylation and co-expression modules altered in CLL.
In summary, we report that hypomethylation of DNA appears to facilitate the aberrant expression of proto-oncogenes/oncogenes, potentially stimulating cell proliferation in CLL. We observed that apart from global hypomethylation of repeat sequences, there also exists site-specific hypomethylation of certain genes and genic regions, especially in genes linked with signaling pathways (e.g., BCR, LYN RAB8A, NFKBIB), chromatin modifications (e.g., CHD2, CHD3, SMARCB1), cell growth and development (e.g., EBF1, EGR1, EGFR, ERBB2, MYC), apoptosis inhibition (e.g., BCL2, TRAF1), and promoting cell proliferation (e.g., CCND1, LYN, BCL3). We observed 3′UTRs to possess a high percentage of hypo-DMRs consistent in the majority of our test samples. We report genes with 3′UTR consistent hypomethylation in CLL like LIF and PIM3. Along with that, we also report genes with consistent hypermethylation in CLL in 3′UTRs like HMX2 and other genic regions. We also observed that methylation changes at 3′UTR had significant correlation with expression along with overlapping network modules in both datasets. Our findings, thus, suggest that hypomethylation in different genic regions might exhibit a significant deleterious effect on gene expression that results in malignant transformation and/or tumor progression.
We observed that hypomethylated regions were less consistent over the genome among different samples, in contrast to hypermethylation loci. However, some hypomethylated regions were highly consistent in most of the samples, and their functional analysis revealed their potential biological significance in CLL.
We observed hypomethylation at many genes containing key TFBS involved in cell growth and development, histone remodeling, apoptosis, and cellular proliferation. We found hypomethylation in many key signaling regulators consistent in majority of samples, which do not appear to be random events or a non-specific part of global hypomethylation. In addition, this study contributes to our understanding about the relationship between methylation and expression levels in CLL samples. Results from positional analyses for genic location indicate that the conventional model of methylation regulating expression in an antagonistic manner is most common. However, we also uncovered an interesting and conflicting relationship between methylation and expression for methylation occurring in exons of 3′UTRs. Specifically, we found evidence of a loss of DNA methylation that not only causes genomic instability but also potentially activates many genes mainly in signaling pathways like BCR in CLL. Finally, we showed that 3′UTR methylome and transcriptome are organized into biologically meaningful modules with significant correlations and strong-to-moderate preservation of their density and connections between two datasets. The preserved modules were also found as functionally related indicating the role of 3′UTR methylation in expression regulation.
Publicly available reduced representation bisulfite sequencing (RRBS) methylation for 30 CLL and three control samples were obtained from the GEO website (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66121). Control samples were CD19+ B cells isolated from peripheral blood of normal controls. Human genome annotation data from the UCSC genome browser (hg19 genome assembly), such as Refseq and UCSC genes, CpG islands, Vista Enhancers, ENCODE Transcription Factor ChIP-Seq, and RepeatMasker Tracks, were used to annotate differentially methylated regions of interest. For genes with multiple isoforms, the longest one was used as the reference. Promoter regions were selected from −2 kb upstream to the transcription start site of each gene.
Read mapping and % methylation
The bisulfite-treated sequencing reads in DNA methylation data for CD19+ B cells were mapped to bisulfite-converted human genome using Bismark  (using Bowtie). Bismark was used also to obtain the genome-wide cytosine methylation calls at a base resolution in the CpG context. Additional file 8 provides read mapping count tables for each sample.
Sequencing reads from RNA-seq experiments were mapped using Bowtie with known ENSEMBL transcripts as gene models. After mapping, FPKM values for each gene/transcript were calculated by Cufflinks  and differentially expressed (DE) genes were defined by abs(ln(fold-change)) > 1.5. FDR-corrected p < 0.05 for all DE genes were calculated by Cuffdiff . Next, the over enriched GO categories were obtained based on a .05 FDR cutoff using the GO-seq R package.
Considering the high correlation between methylation of adjacent CpGs, the methylation information obtained from RRBS data was summarized on 1000 bp tiling windows (step-size 1000 bp) with minimum 3 CpGs and minimum 10 reads mapped on each CpG using the R package, methylKit . For DMR calculation, pairwise comparison of 1000 bp tiles in each of the 30 tumor samples against each control normal sample was performed using Fisher’s exact test. From each such test, differential methylation values were obtained only for the regions that were common between CLL and control sample. Thirty such tests were conducted for each control sample. Next, in order to ensure comparable statistics, only those regions that had differential values from each of the 30 tests were used. This gave us 41,421 common regions obtained from the first control sample comparison tests. Similarly, 39,327 and 41,359 regions were obtained from each of the other two control samples.
Also, the methylation entropy across all CLL samples was calculated in order to see probability distribution of methylation changes for each 1000 bp region across all samples. Entropy for each sample was computed as follows:
Enrichment analysis of genes overlapped by C-DMRs for GO categories and KEGG pathways was performed using GOStats R package . Gene set enrichment for gene symbols overlapping hypo-C-DMRs was also performed using GeneDecks  to highlight shared descriptors between pairs of genes based on annotations within the GeneCards compendium of human genes.
The epigenomic enrichment analysis of C-DMRs was performed using Genome Runner . Briefly, genomic coordinates of hyper- and hypo-C-DMRs were collected and tested for co-localization with three groups of genome annotation datasets: (1) chromatin state segmentation by HMM from ENCODE/Broad, (2) histone modifications by ChIP-seq from ENCODE/Broad Institute and ENCODE/Stanford/Yale/USC/Harvard, and (3) experimentally validated transcription factor binding sites from ENCODE/Broad Institute and ENCODE/Stanford/Yale/USC/Harvard. Genomic regions annotated by the ENCODE with any functional/regulatory information (~80 % of the whole genome) were used as a “background” to estimate co-localizations that can occur by chance. p values were calculated using Chi-square test and corrected for multiple testing using FDR.
Motif analysis was also carried out using Homer  software after retrieving sequences around each DMR CpG along with those around non-DMR CpGs randomly chosen as a background.
Expression in relation to methylation analysis
To associate gene expression changes with methylation differences, all CpG average methylation values were paired with the average expression of transcripts they overlap in all CLL samples. Estimated expression profiles of all transcripts were divided into four expression quartiles and referred to as lowest to highest expression groups within all transcripts. The positional enrichment analysis for each gene region was performed using Homer, and R scripts were used to calculate and plot smoothened density estimates. Locfit library was used for fitting local regression, likelihood models, and related smoothing procedures.
Correlation of whole transcript expression to methylation of CpGs within each specific gene region was calculated by using the Pearson correlation coefficient. For correlation calculation average methylation of CpGs (across all CLL samples) in a specific gene region and average expression of the whole transcript corresponding to these CpGs were used.
Correlation module analysis
From 19 matching CLL samples from both methylation and expression data, we obtained differential methylation and differential expression (FPKM) for 3′UTRs of 1780 transcripts. For this, whole transcript expression and average methylation for all CpGs in each of its 3′UTRs were used. Differential expression and 3′UTR methylation for each transcript was then computed by comparing each CLL sample individually against one common control sample.
We then used differential expression and methylation value matrices to identify co-expression and co-differential methylation network modules through the WGCNA R package (see Additional file 7 for more details), individually on each dataset. WGCNA computes networks by calculating each gene-to-gene pairwise correlation and interconnection strength by checking number of shared neighbors. It then finally generates modules using hierarchical clustering. Tables S1 and S2 in Additional file 7 provide a list of 10 top hub genes (genes with most significant module membership) in each module, and Additional file 9 consists of module membership values for all genes in each expression and methylation module.
After constructing modules in each dataset, we checked to see if any of the modules in one dataset were preserved in any of the modules from the other data set using two approaches: cross tabulation and network-based statistics. In cross tabulation, overlaps of the constituent genes in each pair of modules from the two data sets were calculated and Fisher’s exact test was used to assign a p value to each overlap. In the second method, we used network module preservation statistics (NP) described in and implemented in  the WGCNA R package. The NP method not only assesses the significant overlap of genes, but also whether the density and connectivity patterns of modules defined in a reference data set are preserved in a test data set. We considered expression data as reference data and methylation data as test data. This NP statistic test calculates statistic values based on density and connectivity preservation within reference and test modules. From calculated statistic values, the NP test in the WGCNA package module was used to obtain two values, (1) the median rank, which is the rank for the average of the observed preservation static values and (2) a composite module preservation statistic referred to as Zsummary using a permutation test. Thus, we reported a Z summary for each expression module in the methylation modules.
Also, since WGCNA groups together highly correlated variables to generate modules, we summarized the variable profiles in each module to a single representative, i.e., the module eigengene. The module eigengene, which is defined as the first principal component of the standardized matrix containing variables in the module was used to calculate the correlation between expression and methylation within non-coding genes and 3′UTRs.
Also, since non-coding genes do not translate and do not undergo post-transcription changes (i.e., stripping 3′UTRs) with no defined UTR or a gene body region (or in other words—whole portion can be identified as UTR), we also conducted the same network analysis by including non-coding genes.
This article has been published as part of Human Genomics Volume 10 Supplement 2, 2016: From genes to systems genomics: human genomics. The full contents of the supplement are available online at http://humgenomics.biomedcentral.com/articles/supplements/volume-10-supplement-2.
We thank Drs. Andrew Smith, Kristen Taylor, Jeffrey Bryan, Michael Wang, and Joseph L. Wiemels for their helpful discussions.
This work was supported in part by the National Institute of Health (Grants CA134304 and DA025779).
- 2.Cadieux B, Ching TT, VandenBerg SR, Costello JF. Genome-wide hypomethylation in human glioblastomas associated with specific copy number alteration, methylenetetrahydrofolate reductase allele status, and increased proliferation. Cancer Res. 2006;66(17):8469–76. doi: 10.1158/0008-5472.CAN-06-1547.CrossRefPubMedGoogle Scholar
- 30.Raval A, Lucas DM, Matkovic JJ, Bennett KL, Liyanarachchi S, Young DC, et al. TWIST2 demonstrates differential methylation in immunoglobulin variable heavy chain mutated and unmutated chronic lymphocytic leukemia. J Clin Oncol Off J Am Soc Clin Oncol. 2005;23(17):3877–85. doi: 10.1200/jco.2005.02.196.CrossRefGoogle Scholar
- 38.Hogart A, Lichtenberg J, Ajay SS, Anderson S, Margulies EH, Bodine DM. Genome-wide DNA methylation profiles in hematopoietic stem and progenitor cells reveal overrepresentation of ETS transcription factor binding sites. Genome Res. 2012;22(8):1407–18. doi: 10.1101/gr.132878.111.CrossRefPubMedPubMedCentralGoogle Scholar
- 45.Tsurimoto T, Shinozaki A, Yano M, Seki M, Enomoto T. Human Werner helicase interacting protein 1 (WRNIP1) functions as a novel modulator for DNA polymerase delta. Genes to cells : devoted to molecular & cellular mechanisms. 2005;10(1):13–22. doi: 10.1111/j.1365-2443.2004.00812.x.CrossRefGoogle Scholar
- 49.Fu H, Maunakea AK, Martin MM, Huang L, Zhang Y, Ryan M, et al. Methylation of histone H3 on lysine 79 associates with a group of replication origins and helps limit DNA replication once per cell cycle. PLoS Genet. 2013;9(6), e1003542. doi: 10.1371/journal.pgen.1003542.CrossRefPubMedPubMedCentralGoogle Scholar
- 56.Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28(5):511–5. doi: 10.1038/nbt.1621.CrossRefPubMedPubMedCentralGoogle Scholar
- 62.Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38(4):576–89. doi: 10.1016/j.molcel.2010.05.004.CrossRefPubMedPubMedCentralGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.