CpG traffic lights are markers of regulatory regions in human genome
DNA methylation is involved in the regulation of gene expression. Although bisulfite-sequencing based methods profile DNA methylation at a single CpG resolution, methylation levels are usually averaged over genomic regions in the downstream bioinformatic analysis.
We demonstrate that, at the genome level, methylation of a single CpG can serve as a more accurate predictor of gene expression than average promoter or gene body methylation. We define CpG traffic lights (CpG TL) as CpG dinucleotides with a significant correlation between methylation and expression of a nearby gene. CpG TL are enriched in all regulatory regions. Among all promoters, CpG TL are especially enriched in poised ones, suggesting involvement of DNA methylation in their regulation. Yet, binding of only a handful of transcription factors, such as NRF1, ETS, STAT and IRF-family members, could be regulated by direct methylation of transcription factor binding sites (TFBS) or their close proximity. For the majority of TF, an alternative scenario is more likely: methylation and inactivation of the whole regulatory element indirectly represses functional TF binding, with a CpG TL being a reliable marker of such inactivation.
CpG TL provide a promising insight into mechanisms of enhancer activity and gene regulation, linking methylation of a single CpG to gene expression. CpG TL methylation can be used as a reliable marker of enhancer activity and gene expression in applications, e.g. in the clinic, where measuring DNA methylation is easier than directly measuring gene expression because DNA is more stable.
Keywords: CpG traffic lights; DNA methylation; Transcription regulation; Enhancers; CAGE; Chromatin states; NRF1; ETS; STAT; IRF
Abbreviations
- CAGE
Cap analysis of gene expression
- ChIP-Seq
Chromatin immunoprecipitation (ChIP) sequencing
- CpG
CpG dinucleotide, 5’-C-phosphate-G-3’
- CpG BG
Background CpG dinucleotide
- CpG TL
CpG traffic light
- ENCODE
Encyclopedia of DNA elements
- FANTOM5
Functional annotation of mammalian genome 5
- FDR
False discovery rate
- PWM
Position weight matrix
- RPKM
Reads per kilobase per million
- SCC
Spearman correlation coefficient
- TFBS
Transcription factor binding sites
- TSS
Transcription start sites
- WGBS
Whole-genome bisulfite sequencing
Epigenetic regulation of gene expression has been thoroughly investigated over the last decades. DNA methylation, usually in the CpG context, is probably the most well-studied mechanism of epigenetic regulation. DNA methylation is linked to many normal and pathological biological processes: organism development, cell differentiation, cell identity and pluripotency maintenance (reviewed in [1, 2, 3]), aging , memory formation [5, 6], and responses to environmental exposures, stress and diet [7, 8, 9]. Abnormalities in DNA methylation play an important role in various diseases, including metabolic , cardiovascular , and neurodegenerative [12, 13] diseases and cancers (reviewed in ). For about a decade, DNA demethylating drugs (Decitabine, Azacytidine) have been used in the clinic for the treatment of acute myeloid leukemia and myelodysplastic syndrome . Recent advances in site-specific editing of DNA methylation  suggest DNA methylation as a promising target for non-invasive therapies against diseases linked to aberrant methylation.
Functionally, DNA methylation of promoter regions is tightly associated with repression of transcription initiation, while high levels of gene body methylation, on the contrary, are linked to increased gene expression (reviewed in ). Enhancers, distant regulatory regions that contribute to the establishment of the correct temporal and cell-type-specific gene expression pattern, have been shown to initiate transcription of short RNAs . Therefore, it is no surprise that DNA methylation also regulates enhancer function [19, 20, 21, 22].
Methods based on bisulfite sequencing allow detection of single cytosine methylation. Yet, in downstream bioinformatic analysis, methylation levels of several dozen cytosines are often averaged to increase statistical power [23, 24]. At the same time, multiple examples show that changes in methylation of a single CpG can affect transcription [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]. Recently, we have shown that methylation of particular single CpG dinucleotides is tightly linked to gene expression . We have called such positions CpG traffic lights (CpG TL) and have demonstrated a strong negative selection against them in computationally predicted transcription factor binding sites. In the current study we show enrichment of CpG TL in regulatory elements of different types: in transcription start sites (TSS), in particular in poised promoters, as well as in enhancers and regions with active chromatin marks. Although CpG TL may regulate transcription factors, co-factors and epigenetic regulators, binding of only a handful of transcription factors could be regulated by direct methylation of a CpG TL within a transcription factor binding site (TFBS). For the majority of TF, an alternative scenario is more likely: inactivation of the whole regulatory element via DNA methylation represses TF binding indirectly, and CpG TL are reliable markers of inactivation. We believe that CpG TL provide a promising insight into mechanisms of enhancer activity and gene regulation, linking methylation of a single CpG to gene expression.
CpG traffic lights detection
[Table. The number of genes with significant correlation between expression and methylation. Columns: FDR-corrected p-value (significance level); total number of genes that have significant correlations between gene expression and (1) average methylation of promoter regions (-1000..500), (2) average methylation of gene bodies (+500..TTS), (3) methylation of CpG TL, (4) permutation test.]
The majority of promoter CpG TL demonstrate negative SCC, while the majority of those located in intronic regions demonstrate positive SCC, which is in line with previous findings. Exonic CpG TL demonstrate comparable numbers of positive and negative SCC, with an increase in positive SCC towards the gene 3’ end (Additional file 1: Figure S1). CpG TL are uniformly distributed along the genome (Manhattan plot, Additional file 1: Figure S2).
CpG traffic lights are conserved across mammals and primates
CpG traffic lights are enriched in regulatory elements
To narrow down the regulatory role of CpG TL we tested for the overlap between CpG TL and various functional genomic elements. CpG TL are enriched in the open chromatin regions (Fig. 3a) supporting the claim of their regulatory potential. In particular, they are 2-fold enriched at exact transcription start sites (Fig. 3b) determined by CAGE (Cap Analysis of Gene Expression) , as well as in all promoter types determined by chromHMM , including active, bivalent, and poised promoters but not in the regions of transcription elongation (Fig. 3g). Interestingly, the strongest enrichment was observed in poised promoters (>3.5 fold). Since the poised or bivalent chromatin is thought to be able to easily switch between active and repressed states , such enrichment may suggest a contribution of CpG TL to the maintenance of the bivalent state of the chromatin.
CpG traffic lights are enriched in regulatory genes but avoid the majority of transcription factor binding sites
[Table. Enrichment of CpG TL in regulatory genes. Columns: number of genes in the annotation; number of genes with CpG TL; number of genes expected.]
NRF1 binding sites
Despite the observation that TFBS overall do not co-locate with CpG TL, binding sites of NRF1 (Fig. 5a, d), a transcription factor involved in the activation of key metabolic genes, are enriched in CpG TL even when overall enrichment for regulatory regions is taken into account (see “Methods”). Interestingly, core CpG positions of NRF1 binding sites are the most enriched in CpG TL, supporting their functional importance for NRF1 binding (Fig. 5e). In line with the previous findings , these observations imply that NRF1 may be one of the very few TF whose binding may be directly regulated by DNA methylation.
ETS-family binding sites
Exact binding sites of GABPA (an ETS-motif binding TF) and their close proximity (50 bp) are 1.3-fold enriched in CpG TL (Fig. 5j, k). The strongest enrichment is observed at the C neighboring the core GGAA box. In vitro binding data (HT-SELEX and Methyl-SELEX)  show that methylated C is less frequent in this position (Fig. 5l). Similar CpG TL enrichment was observed for binding sites of other members of the ETS family: SPIB (Fig. 5f, g) and ETV1 (Additional file 1: Figure S3a-c). Binding of ETS-family members might be directly affected by DNA methylation, yet enrichment of CpG TL in the closest proximity also supports the hypothesis of an indirect effect of regional methylation.
STAT-family and IRF-family binding sites
Surprisingly, such GA-rich motifs as those bound by STAT1,2,4 and IRF1,4 are also enriched in CpG TL, but in their weak positions and in close proximity to the TFBS (Fig. 5h, i, m, n, Additional file 1: Figure S3d-k). In vitro binding data for IRF4 (HT-SELEX and Methyl-SELEX) show an avoidance of methylated C in this motif position (Fig. 5o). Since the enrichment in CpG TL is observed only in weak motif positions, we speculate that binding of TF from the STAT and IRF families is indirectly affected by methylation of the whole regulatory region.
In this work we demonstrate that methylation profiles of single CpG dinucleotides (CpG TL) more often significantly correlate with gene expression as compared to average promoter / gene body methylation. It is a surprising observation, since it is widely accepted that DNA methyltransferases once bound to DNA move along it  or multimerize  methylating all neighboring CpG dinucleotides unless a boundary protein, such as Sp1, is reached (reviewed in ). Yet, only a small fraction of CpG TL are co-located within the promoter (or body) of the same gene. We speculate that local change in DNA methylation could be achieved through active DNA demethylation probably with the help of TET proteins. A direct experiment with the use of CRISPR/TALEN-based technology is required to test this hypothesis.
It should be noted that our procedure of CpG TL detection, being based on correlation (SCC), cannot be applied to CpG dinucleotides that are fully methylated or fully unmethylated in all studied cell types, because a constant methylation profile has no defined correlation with expression. Our dataset consists of 48 cell types and does not cover the whole spectrum of human cell types. Due to this limitation, a significant fraction of regulatory CpG might be missing from our analysis. Novel data on DNA methylation and expression in various cell types will improve our understanding of CpG TL functions.
The enrichment of CpG TL in enhancers, in particular in hematopoietic enhancers, is in line with the recent reports that DNA methyltransferases DNMT3a/b can bind enhancers and regulate enhancer RNA production in hematopoietic cells . Also, distal regulatory regions can initiate transcription themselves, being in turn regulated by DNA methylation , which contributes to the similarity of TSS and enhancers in terms of CpG TL enrichment.
Previously, it has been reported that NRF1 binding is directly regulated by DNA methylation . In our work we demonstrate that such regulated binding is functional and regulates expression of the corresponding gene, at least in some cases when an NRF1 TFBS harbors a CpG TL. We also observed the enrichment of CpG TL in close proximity to ETS-, STAT- and IRF-family motif hits. Interestingly, the majority of TF from these families are involved in hematopoietic regulation, in line with the strong enrichment of CpG TL in hematopoietic enhancers. These observations support the importance of enhancer methylation in the regulation of hematopoietic cells.
In the light of over-representation in regulatory regions, the lack of enrichment of CpG TL within the majority of TFBS is puzzling. We can see several possible explanations. CpG TL may target unknown TFBS, although we believe that this scenario is unlikely. It was previously shown that almost all novel motifs obtained from regulatory regions correspond to known families of TFBS [46, 54, 55]. Furthermore, the HOCOMOCO v11 collection covers almost all structural families of transcription factors, except for the zinc finger family. Among those, there might be some important isolated cases enriched with CpG TL but their contribution to the overall picture is expected to be negligible. Alternatively, cytosine methylation could accumulate as a consequence of the absence of TF binding, which makes methylation of CpG TL not a primary cause, but just a “passive” marker of absent gene expression resulting from inactivation of its regulatory element. The last alternative is supported by previous works [56, 57]. More studies are needed to confirm which alternative is the most accurate. Yet, even if the “passive” marker explanation is true, CpG TL methylation could be a reliable marker of enhancer activity and gene expression, and can be used in practical applications, for example, in the clinic, where testing for DNA methylation is easier than testing directly for gene expression because DNA is more stable.
In this work we demonstrate that CpG TL are enriched in regulatory regions, including poised/bivalent promoters and enhancers, in particular in hematopoietic enhancers. Only a handful of TFBS, such as those bound by NRF1, could be directly regulated by DNA methylation, while binding of several TF families (ETS-, STAT-, IRF-) could be affected indirectly through methylation and repression of the entire regulatory region. CpG traffic lights provide a promising insight into gene regulation linking single CpG methylation to gene expression.
DNA methylation and expression data processing
We selected 48 tissues and cell types (see Additional file 1: Table S5) for which both WGBS and RNA-seq data were available in the Roadmap Epigenomics Project. For all samples sequenced with the Illumina platform, read trimming and adapter removal were performed by Trimmomatic  (up to 2 mismatches between an adapter and a read sequence; 5 bp sliding window; quality threshold of 20; removing sequences shorter than 20 bp after trimming). For the samples sequenced with the SOLiD platform we used Cutadapt  (up to 10% error rate relative to the length of the matching region; quality threshold of 20; removing sequences shorter than 20 bp after trimming).
We mapped WGBS data to the genome (assembly GRCh38-Ensembl 78) with Bismark  (zero mismatches in the seed, 20 bp seed length, 0/500 bp min/max insert size for valid paired-end alignments). We then considered only methylated cytosines in the CpG context covered by at least 4 reads on both strands. For each CpG position in each of the 48 samples, the methylation values were averaged between replicates. We removed all CpG positions for which methylation values were available for fewer than 20 samples.
We mapped RNA-Seq data with TopHat v2.0.13  (up to 2 mismatches and 2 gaps per read; paired-end reads are reported only if both reads are mapped). We generated an expression matrix using featureCounts ; the expression profiles were normalized to RPKM values.
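As an illustration of the RPKM normalization mentioned above, a minimal Python sketch is given below. The function name and the example numbers are hypothetical; read counts and gene lengths are assumed to come from the read-counting step.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # count / (length / 1e3) / (library size / 1e6) == count * 1e9 / (length * library size)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 100 reads on a 2 kb gene, 10 million mapped reads in the library.
example_rpkm = rpkm(100, 2000, 10_000_000)
```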
CpG traffic lights detection
To determine CpG TL we considered all pairs of genes and CpG located from 10,000 bp upstream of the TSS to the gene 3’ end. One CpG might be associated with multiple genes; similarly, one gene might be associated with multiple CpG. For each CpG-gene pair we created two k-dimensional vectors (where k = 20..48) of methylation levels (beta-values, [0,1]) and gene expression (RPKM). The length of the vectors (k) varies because WGBS does not provide uniform coverage for all genomic CpG, leading to missing values in the methylation profiles of many CpGs. To avoid vague correlations we did not consider CpG positions having fewer than 20 defined values in the respective methylation profiles. We further refer to these two vectors as the methylation and expression profiles. In total we had 18,830,232 CpGs associated with 59,396 genes (in total, 25,813,295 pairs).
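The pairing rule and the handling of missing values described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, coordinates are assumed to be on a single chromosome, and missing methylation values are assumed to be encoded as `None`.

```python
from typing import List, Optional, Sequence, Tuple

UPSTREAM_BP = 10_000  # window upstream of the TSS, as stated in the text
MIN_DEFINED = 20      # minimum number of samples with a defined methylation value


def cpg_in_gene_window(cpg_pos: int, tss: int, gene_end: int, strand: str) -> bool:
    """True if the CpG lies between 10 kb upstream of the TSS and the gene 3' end."""
    if strand == "+":
        return tss - UPSTREAM_BP <= cpg_pos <= gene_end
    if strand == "-":
        # On the minus strand, "upstream" means larger coordinates.
        return gene_end <= cpg_pos <= tss + UPSTREAM_BP
    raise ValueError("strand must be '+' or '-'")


def paired_profiles(
    methylation: Sequence[Optional[float]],
    expression: Sequence[float],
) -> Optional[Tuple[List[float], List[float]]]:
    """Drop samples with missing methylation; return None if fewer than 20 remain."""
    if len(methylation) != len(expression):
        raise ValueError("profiles must cover the same samples")
    kept = [(m, e) for m, e in zip(methylation, expression) if m is not None]
    if len(kept) < MIN_DEFINED:
        return None
    meth, expr = zip(*kept)
    return list(meth), list(expr)
```

The vector length k of a retained pair is then simply the number of samples that survive the filter, which is why it ranges from 20 to 48.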
For each CpG position, we calculated SCC between the methylation and expression profiles for all available samples. We referred to a CpG position as a CpG traffic light (CpG TL) if it had a significant Spearman correlation coefficient (SCC) between methylation and expression profiles at the level of FDR<0.01 (Benjamini-Hochberg correction for the total number of pairs). We found 33,276 such CpG TL (0.18% of the original number of CpGs) that corresponded to 7997 genes.
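The two statistical ingredients of this step, the rank correlation and the Benjamini-Hochberg adjustment, can be sketched in plain Python as below. This is an illustrative sketch with hypothetical function names, not the code used in the study. It assumes that the p-value of each SCC is obtained separately (for example, `scipy.stats.spearmanr` returns both the coefficient and a p-value).

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (var_x * var_y) ** 0.5


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A CpG-gene pair would then be called a CpG TL when its adjusted p-value is below 0.01, with the adjustment taken over all tested pairs.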
Construction of background datasets
GC content (the total number of C and G nucleotides) of the surrounding region of CpG BG must be similar to that of CpG TL. We calculated GC content in 200 bp windows centered on each CpG TL. For each such TL-centered window, we searched for another genomic CpG with the surrounding window having no more than 5% difference in GC content. For example, if there are 80 cytosines and guanines in a 200 bp window around CpG TL, we were looking for a CpG BG having from 76 to 84 cytosines and guanines in a 200 bp window.
CpG content (the total number of CG pairs) of the surrounding region of CpG BG should be similar to that of CpG TL. Again, for each CpG TL we allowed no more than 5% difference in CpG content in a 200 bp window.
CpG BG should have a similar distance to the TSS of the associated gene (while accounting for upstream or downstream location). For this purpose, we separately considered CpG TLs in [−100;TSS] and [TSS;100] distance bins by collecting CpG BG from the respective regions. For CpG TLs located farther than 100 bp from the TSS, we considered log10(distance) and allowed up to 5% difference between a CpG TL and its respective CpG BG. For example, if a CpG TL is located at +1000 bp from a TSS, log10(distance) is 3, so we look for a CpG BG with log10(distance) between 2.85 and 3.15, i.e. located within [708;1413].
A background CpG for a CpG TL with a SCC<0 (SCC>0) should also have a negative (positive) SCC with at least one of the associated genes. We repeated the selection process 50 times.
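The matching criteria listed above can be expressed as simple predicates. The sketch below uses hypothetical function names and reproduces the two worked examples from the text (80 ± 5% and the [708;1413] window); it is not the original selection code, and the random sampling of candidates is omitted.

```python
import math
from typing import Tuple

TOLERANCE = 0.05  # 5% tolerance, as stated in the text


def similar_content(tl_count: int, bg_count: int, tol: float = TOLERANCE) -> bool:
    """True if the GC (or CpG) count of the BG window is within tol of the TL window."""
    return abs(bg_count - tl_count) <= tol * tl_count


def distance_window(tl_distance: int, tol: float = TOLERANCE) -> Tuple[float, float]:
    """Allowed range of |distance to TSS|, with the tolerance applied on the log10 scale.

    Intended for distances above 100 bp; closer CpGs are handled in separate bins.
    """
    log_d = math.log10(abs(tl_distance))
    return 10 ** (log_d * (1 - tol)), 10 ** (log_d * (1 + tol))


def same_sign(tl_scc: float, bg_scc: float) -> bool:
    """The background SCC must have the same sign as the CpG TL SCC."""
    return (tl_scc < 0) == (bg_scc < 0)


low, high = distance_window(1000)  # approximately 708 and 1413
```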
It is important to note that we did not control for the presence of a CpG island (CGI). Recently it has been shown that even methylated CpG dinucleotides within CpG islands were more conserved in primate evolution compared to methylated CpG outside the CGI . Yet algorithms for CGI search use arbitrary parameters and may not be accurate in determination of CGI boundaries . Therefore, controlling for the presence of a CGI would not necessarily reduce this bias.
We annotated all CpG positions with overlapping genomic features. For each feature we calculated the over-representation of CpG TL over CpG BG within each annotation using Fisher’s exact test (for the total number of CpG TL and for CpG TL with positive/negative SCC separately). The following genomic annotations were tested: repeats (RepeatMasker http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz); the robust CAGE clusters ; the robust enhancers  (mapped to hg38 with liftOver); the DNaseI hypersensitivity clusters (http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/wgEncodeRegDnaseClustered.txt.gz). Functional annotation of the enhancers was obtained from [46, 54, 55].
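A minimal sketch of the over-representation test on a 2×2 table (CpG TL vs. CpG BG, inside vs. outside a feature) is shown below. It assumes a one-sided test for enrichment, which the text does not specify, and uses hypothetical function names; for genome-scale counts a library routine such as `scipy.stats.fisher_exact` would be the practical choice.

```python
from math import comb


def fisher_exact_greater(tl_in: int, tl_out: int, bg_in: int, bg_out: int) -> float:
    """One-sided Fisher's exact test p-value for over-representation of CpG TL.

    Table layout: rows are CpG TL / CpG BG, columns are inside / outside the feature.
    """
    total = tl_in + tl_out + bg_in + bg_out
    in_feature = tl_in + bg_in
    n_tl = tl_in + tl_out
    denominator = comb(total, n_tl)
    upper = min(in_feature, n_tl)
    # Hypergeometric tail: probability of seeing tl_in or more CpG TL in the feature.
    tail = sum(
        comb(in_feature, k) * comb(total - in_feature, n_tl - k)
        for k in range(tl_in, upper + 1)
    )
    return tail / denominator


def fold_enrichment(tl_in: int, tl_out: int, bg_in: int, bg_out: int) -> float:
    """Fraction of CpG TL inside the feature divided by the same fraction for CpG BG."""
    return (tl_in / (tl_in + tl_out)) / (bg_in / (bg_in + bg_out))
```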
Evolutionary conservation and Eigen scores
Conservation of CpG TL and background sites in mammals and primates was assessed with UCSC Genome Browser GERP RS  and PhyloP  hg19 tracks, respectively. We calculated how many sites in each dataset had GERP RS score greater than 2, which we considered as conserved in mammals and PhyloP score greater than 0.5, which we considered as conserved in primates. Overall functional scores for each site were calculated with Eigen . Higher Eigen scores imply more likely functionality of respective genome sites.
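The conservation summary described above reduces to counting sites above a threshold. A minimal sketch, with a hypothetical function name and the thresholds taken from the text:

```python
from typing import Sequence

GERP_THRESHOLD = 2.0    # GERP RS above this: considered conserved in mammals
PHYLOP_THRESHOLD = 0.5  # PhyloP above this: considered conserved in primates


def conserved_fraction(scores: Sequence[float], threshold: float) -> float:
    """Fraction of sites whose score is strictly greater than the threshold."""
    if not scores:
        raise ValueError("no scores given")
    return sum(score > threshold for score in scores) / len(scores)
```

Applying the same function to the CpG TL set and to each background set gives the fractions that are compared between the two.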
Histone modifications and chromatin states
The Roadmap Epigenomics Consortium 25-state segmentation of 127 epigenomes predicted with ChromHMM [44, 66] was used to assess chromatin states co-located with CpG TL. The annotation based on the imputed data for 12 chromatin marks (H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K27ac, H4K20me1, H3K79me2, H3K36me3, H3K9me3, H3K27me3, H2A.Z, and DNaseI) was downloaded from http://egg2.wustl.edu/roadmap/web_portal/imputed.html#chr_imp. We calculated a CpG TL/CpG BG ratio for each of the 25 chromatin states in each of the 127 epigenomes and then averaged the ratios for a representation on a figure.
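One possible reading of the averaging step is sketched below. It assumes that the CpG TL/CpG BG ratio for a state is the fraction of CpG TL in that state divided by the corresponding fraction of CpG BG; the text does not spell out the exact definition, and the function name and data layout are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# counts[epigenome][state] -> number of CpGs falling into that chromatin state
Counts = Dict[str, Dict[str, int]]


def mean_state_ratios(tl_counts: Counts, bg_counts: Counts) -> Dict[str, float]:
    """Per-state ratio of CpG TL to CpG BG fractions, averaged over epigenomes."""
    ratios: Dict[str, List[float]] = {}
    for epigenome, tl_states in tl_counts.items():
        bg_states = bg_counts[epigenome]
        tl_total = sum(tl_states.values())
        bg_total = sum(bg_states.values())
        for state, tl_n in tl_states.items():
            bg_n = bg_states.get(state, 0)
            if bg_n == 0:
                continue  # ratio is undefined without background CpGs in this state
            ratios.setdefault(state, []).append((tl_n / tl_total) / (bg_n / bg_total))
    return {state: mean(values) for state, values in ratios.items()}
```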
Additionally, to verify CpG TL enrichment in the enhancers we selected regions having H3K27ac and H3K4me1 but lacking H3K4me3 (ENCODE, averaged among all samples mapped to hg38 with pre-calculated narrowPeak available, files with major errors and warnings excluded) (Additional file 1: Table S6).
For transcription factor binding site prediction, we used position weight matrices (PWM) of human TFs provided in the full HOCOMOCO v11  collection and its default PWM thresholds according to the pre-calculated motif P-value of 0.0005 as in . In HOCOMOCO v11, the thresholds and P-values were estimated against whole-genome dinucleotide composition. However, prediction of TFBS using PWMs alone can result in a notable number of false positives. With this in mind, out of all predicted TFBS, we considered only those located in the reproducible and control data-supported cistrome  (only A, B, and C cistrome categories) for each TF. The cistrome was constructed from the ChIP-Seq data on transcription factors provided in the GTRD database  and processed by a common pipeline involving several computational ChIP-Seq peak callers, which allows capturing binding events routinely detected in different experiments. Thus, the TFBS considered in our study were supported both by computational sequence analysis and by experimental ChIP-Seq data.
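Threshold-based PWM scanning, the first step described above, can be illustrated with a toy example. The 3-bp matrix, the threshold and the function names are hypothetical and are not taken from HOCOMOCO; the subsequent restriction to ChIP-Seq-supported cistrome regions is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-bp log-odds matrix; real HOCOMOCO matrices are longer.
TOY_PWM: List[Dict[str, float]] = [
    {"A": -1.0, "C": 1.2, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.2, "T": -1.0},
    {"A": 0.5, "C": -0.5, "G": -0.5, "T": 0.5},
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def pwm_hits(
    seq: str, pwm: List[Dict[str, float]], threshold: float
) -> List[Tuple[int, str, float]]:
    """Return (start, strand, score) for every window scoring at or above the threshold.

    The sequence is assumed to contain only uppercase A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = sum(column[base] for column, base in zip(pwm, site))
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


toy_hits = pwm_hits("TTCGAGG", TOY_PWM, threshold=2.5)
```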
Gene enrichment analysis
We tested whether genes that harbor CpG TL were enriched in transcription factors, co-factors and epigenetic regulators using Fisher’s exact test (implemented in the Python library scipy.stats) with Bonferroni correction. A list of TF and co-TF was obtained from Tcof DB  and the list of epigenetic regulators was obtained from EpiFactors .
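The multiple-testing step and the expected gene count can be sketched as below. The function names and the category labels in the usage example are hypothetical, and the expected count assumes that carrying a CpG TL is independent of belonging to the annotation.

```python
from typing import Dict


def bonferroni(p_values: Dict[str, float]) -> Dict[str, float]:
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


def expected_genes(n_annotated: int, n_with_tl: int, n_total: int) -> float:
    """Expected number of annotated genes among genes with CpG TL under independence."""
    return n_annotated * n_with_tl / n_total


# Hypothetical p-values for three gene categories.
corrected = bonferroni({"TF": 0.01, "co-TF": 0.2, "epigenetic regulators": 0.5})
```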
The authors are very grateful to Marina Lizio and Hideya Kawaji for their help with FANTOM5 datasets.
CpG TL detection was supported by RFBR grant 14-04-00180 to YAM. Functional analysis of CpG TL was supported by RFBR grant 17-54-80033 to YAM. AK and VBB were supported by the base research fund of the King Abdullah University of Science and Technology (KAUST). ChIP-Seq data analysis was supported by Russian Science Foundation [17-74-10188 to I.V.K.]. SELEX data analysis was supported by the Program of fundamental research for state academies for 2013-2020 years (No 01201363825). These funding bodies had no role in the design of the study, collection, analysis, and interpretation of data, or in writing the manuscript.
Availability of data and materials
The datasets analysed during the current study are available in the Roadmap Epigenomics Project and FANTOM5, the links to the main datasets are in Additional file 1: Table S7 (expression) and Additional file 1: Table S8 (methylation). The cistrome data are available at Figshare .
AL contributed to data processing and performed the over-representation analysis; AK processed the raw data and contributed to data analysis; AA contributed to statistical analysis; EB performed SELEX data analysis; VR performed data analysis; VBB contributed to study design; IVK contributed to study design and TFBS analysis; YAM designed the study, contributed to statistical analysis and drafted the MS. All authors contributed to MS preparation. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Only publicly available data were used; ethics approval can be found in the cited papers.
Consent for publication
The authors claim no competing interests.
- 3.Tomazou EM, Meissner A. Epigenetic regulation of pluripotency. In: Advances in Experimental Medicine and Biology. Vol 695. Boston: Springer: 2010. p. 26–40.
- 4.Christensen BC, Houseman EA, Marsit CJ, Zheng S, Wrensch MR, Wiemels JL, Nelson HH, Karagas MR, Padbury JF, Bueno R, Sugarbaker DJ, Yeh R-F, Wiencke JK, Kelsey KT. Aging and environmental exposures alter tissue-specific DNA methylation dependent upon CpG island context. PLoS Genet. 2009; 5(8):1000602.
- 14.Baylin SB, Jones PA. Epigenetic determinants of cancer. Cold Spring Harb Perspect Biol. 2016; 8(9).
- 18.Kim T-K, Hemberg M, Gray JM, Costa AM, Bear DM, Wu J, Harmin DA, Laptewicz M, Barbara-Haley K, Kuersten S, Markenscoff-Papadimitriou E, Kuhl D, Bito H, Worley PF, Kreiman G, Greenberg ME. Widespread transcription at neuronal activity-regulated enhancers. Nature. 2010; 465(7295):182–7.
- 20.Heyn H, Vidal E, Ferreira HJ, Vizoso M, Sayols S, Gomez A, Moran S, Boque-Sastre R, Guil S, Martinez-Cardus A, Lin CY, Royo R, Sanchez-Mut JV, Martinez R, Gut M, Torrents D, Orozco M, Gut I, Young RA, Esteller M. Epigenomic analysis detects aberrant super-enhancer DNA methylation in human cancer. Genome Biol. 2016;17.
- 21.Kozlenkov A, Roussos P, Timashpolsky A, Barbu M, Rudchenko S, Bibikova M, Klotzle B, Byne W, Lyddon R, Di Narzo AF, Hurd YL, Koonin EV, Dracheva S. Differences in DNA methylation between human neuronal and glial cells are concentrated in enhancers and non-CpG sites. Nucleic Acids Res. 2014; 42(1):109.
- 27.Kitazawa R, Kitazawa S. Methylation status of a single CpG locus 3 bases upstream of TATA-box of receptor activator of nuclear factor-kappaB ligand (RANKL) gene promoter modulates cell- and tissue-specific RANKL expression and osteoclastogenesis. Mol Endocrinol. 2007; 21(1):148–58.
- 28.Wang T, Li J, Ding K, Zhang L, Che Q, Sun X, Dai Y, Sun W, Bao M, Wang X, Yang L, Li Z. The CpG Dinucleotide Adjacent to a kB Site Affects NF-kB Function through Its Methylation. Int J Mol Sci. 2017;18(3).
- 29.Lim KH, Park ES, Kim DH, Cho KC, Kim KP, Park YK, Ahn SH, Park SH, Kim KH, Kim CW, Kang HS, Lee AR, Park S, Sim H, Won J, Seok K, You JS, Lee JH, Yi NJ, Lee KW, Suh KS, Seong BL, Kim KH. Suppression of interferon-mediated anti-HBV response by single CpG methylation in the 5’-UTR of TRIM22. Gut. 2018; 67(1):166–78.
- 30.Claus R, Lucas DM, Stilgenbauer S, Ruppert AS, Yu L, Zucknick M, Mertens D, Buhler A, Oakes CC, Larson RA, Kay NE, Jelinek DF, Kipps TJ, Rassenti LZ, Gribben JG, Dohner H, Heerema NA, Marcucci G, Plass C, Byrd JC. Quantitative DNA methylation analysis identifies a single CpG dinucleotide important for ZAP-70 expression and predictive of prognosis in chronic lymphocytic leukemia. J Clin Oncol. 2012; 30(20):2483–91.
- 35.Ceccarelli V, Racanicchi S, Martelli MP, Nocentini G, Fettucciari K, Riccardi C, Marconi P, Di Nardo P, Grignani F, Binaglia L, Vecchini A. Eicosapentaenoic acid demethylates a single CpG that mediates expression of tumor suppressor CCAAT/enhancer-binding protein delta in U937 leukemia cells. J Biol Chem. 2011; 286(31):27092–102.
- 38.Pant V, Kurukuti S, Pugacheva E, Shamsuddin S, Mariano P, Renkawitz R, Klenova E, Lobanenkov V, Ohlsson R. Mutation of a single CTCF target site within the H19 imprinting control region leads to loss of Igf2 imprinting and complex patterns of de novo methylation upon maternal inheritance. Mol Cell Biol. 2004; 24(8):3497–504.
- 41.Pardo LM, Rizzu P, Francescatto M, Vitezic M, Leday GGR, Sanchez JS, Khamis A, Takahashi H, van de Berg WDJ, Medvedeva YA, van de Wiel MA, Daub CO, Carninci P, Heutink P. Regional differences in gene expression and promoter usage in aged human brains. Neurobiol Aging. 2013; 34(7):1825–36.
- 46.FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest ARR, Kawaji H, Rehli M, Baillie JK, de Hoon MJL, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M, Itoh M, Andersson R, Mungall CJ, Meehan TF, Schmeier S, Bertin N, Jørgensen M, Dimont E, Arner E, Schmidl C, Schaefer U, Medvedeva YA, Plessy C, Vitezic M, Severin J, Semple CA, Ishizu Y, Young RS, Francescatto M, Alam I, Albanese D, Altschuler GM, Arakawa T, Archer JAC, Arner P, Babina M, Rennie S, Balwierz PJ, Beckhouse AG, Pradhan-Bhatt S, Blake JA, Blumenthal A, Bodega B, Bonetti A, Briggs J, Brombacher F, Burroughs AM, Califano A, Cannistraci CV, Carbajo D, Chen Y, Chierici M, Ciani Y, Clevers HC, Dalla E, Davis CA, Detmar M, Diehl AD, Dohi T, Drabløs F, Edge ASB, Edinger M, Ekwall K, Endoh M, Enomoto H, Fagiolini M, Fairbairn L, Fang H, Farach-Carson MC, Faulkner GJ, Favorov AV, Fisher ME, Frith MC, Fujita R, Fukuda S, Furlanello C, Furino M, Furusawa J-I, Geijtenbeek TB, Gibson AP, Gingeras T, Goldowitz D, Gough J, Guhl S, Guler R, Gustincich S, Ha TJ, Hamaguchi M, Hara M, Harbers M, Harshbarger J, Hasegawa A, Hasegawa Y, Hashimoto T, Herlyn M, Hitchens KJ, Ho Sui SJ, Hofmann OM, Hoof I, Hori F, Huminiecki L, Iida K, Ikawa T, Jankovic BR, Jia H, Joshi A, Jurman G, Kaczkowski B, Kai C, Kaida K, Kaiho A, Kajiyama K, Kanamori-Katayama M, Kasianov AS, Kasukawa T, Katayama S, Kato S, Kawaguchi S, Kawamoto H, Kawamura YI, Kawashima T, Kempfle JS, Kenna TJ, Kere J, Khachigian LM, Kitamura T, Klinken SP, Knox AJ, Kojima M, Kojima S, Kondo N, Koseki H, Koyasu S, Krampitz S, Kubosaki A, Kwon AT, Laros JFJ, Lee W, Lennartsson A, Li K, Lilje B, Lipovich L, Mackay-Sim A, Manabe R-I, Mar JC, Marchand B, Mathelier A, Mejhert N, Meynert A, Mizuno Y, de Lima Morais DA, Morikawa H, Morimoto M, Moro K, Motakis E, Motohashi H, Mummery CL, Murata M, Nagao-Sato S, Nakachi Y, Nakahara F, Nakamura T, Nakamura Y, Nakazato K, van Nimwegen E, Ninomiya N, Nishiyori H, Noma S, Noma S, Noazaki T, Ogishima S, Ohkura N, Ohimiya H, 
Ohno H, Ohshima M, Okada-Hatakeyama M, Okazaki Y, Orlando V, Ovchinnikov DA, Pain A, Passier R, Patrikakis M, Persson H, Piazza S, Prendergast JGD, Rackham OJL, Ramilowski JA, Rashid M, Ravasi T, Rizzu P, Roncador M, Roy S, Rye MB, Saijyo E, Sajantila A, Saka A, Sakaguchi S, Sakai M, Sato H, Savvi S, Saxena A, Schneider C, Schultes EA, Schulze-Tanzil GG, Schwegmann A, Sengstag T, Sheng G, Shimoji H, Shimoni Y, Shin JW, Simon C, Sugiyama D, Sugiyama T, Suzuki M, Suzuki N, Swoboda RK, ’t Hoen PAC, Tagami M, Takahashi N, Takai J, Tanaka H, Tatsukawa H, Tatum Z, Thompson M, Toyodo H, Toyoda T, Valen E, van de Wetering M, van den Berg LM, Verado R, Vijayan D, Vorontsov IE, Wasserman WW, Watanabe S, Wells CA, Winteringham LN, Wolvetang E, Wood EJ, Yamaguchi Y. A promoter-level mammalian expression atlas. Nature. 2014; 507(7493):462–70.
- 49.Yin Y, Morgunova E, Jolma A, Kaasinen E, Sahu B, Khund-Sayeed S, Das PK, Kivioja T, Dave K, Zhong F, Nitta KR, Taipale M, Popov A, Ginno PA, Domcke S, Yan J, Schubeler D, Vinson C, Taipale J. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science. 2017;356(6337).Google Scholar
- 51.Stepper P, Kungulovski G, Jurkowska RZ, Chandra T, Krueger F, Reinhardt R, Reik W, Jeltsch A, Jurkowski TP. Efficient targeted DNA methylation with chimeric dCas9-Dnmt3a-Dnmt3L methyltransferase. Nucleic Acids Res. 2016; 45(4).Google Scholar
- 53.Sarda S, Das A, Vinson C, Hannenhalli S. Distal CpG islands can serve as alternative promoters to transcribe genes with silenced proximal-promoters. Genome Res. 2017;27(4).Google Scholar
- 54.Lizio M, Harshbarger J, Shimoji H, Severin J, Kasukawa T, Sahin S, Abugessaisa I, Fukuda S, Hori F, Ishikawa-Kato S, Mungall CJ, Arner E, Baillie JK, Bertin N, Bono H, de Hoon M, Diehl AD, Dimont E, Freeman TC, Fujieda K, Hide W, Kaliyaperumal R, Katayama T, Lassmann T, Meehan TF, Nishikata K, Ono H, Rehli M, Sandelin A, Schultes EA, ‘t Hoen PA, Tatum Z, Thompson M, Toyoda T, Wright DW, Daub CO, Itoh M, Carninci P, Hayashizaki Y, Forrest AR, Kawaji H, the FANTOM consortium. Gateways to the fantom5 promoter level mammalian expression atlas. Genome Biol. 2015; 16(1):22.PubMedPubMedCentralCrossRefGoogle Scholar
- 55. Lizio M, Harshbarger J, Abugessaisa I, Noguchi S, Kondo A, Severin J, Mungall C, Arenillas D, Mathelier A, Medvedeva YA, Lennartsson A, Drabløs F, Ramilowski JA, Rackham O, Gough J, Andersson R, Sandelin A, Ienasescu H, Ono H, Bono H, Hayashizaki Y, Carninci P, Forrest AR, Kasukawa T, Kawaji H. Update of the FANTOM web resource: high resolution transcriptome of diverse cell types in mammals. Nucleic Acids Res. 2017; 45(D1):737–43.
- 56. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B, Garg K, John S, Sandstrom R, Bates D, Boatman L, Canfield TK, Diegel M, Dunn D, Ebersol AK, Frum T, Giste E, Johnson AK, Johnson EM, Kutyavin T, Lajoie B, Lee B-K, Lee K, London D, Lotakis D, Neph S, Neri F, Nguyen ED, Qu H, Reynolds AP, Roach V, Safi A, Sanchez ME, Sanyal A, Shafer A, Simon JM, Song L, Vong S, Weaver M, Yan Y, Zhang Z, Zhang Z, Lenhard B, Tewari M, Dorschner MO, Hansen RS, Navas PA, Stamatoyannopoulos G, Iyer VR, Lieb JD, Sunyaev SR, Akey JM, Sabo PJ, Kaul R, Furey TS, Dekker J, Crawford GE, Stamatoyannopoulos JA. The accessible chromatin landscape of the human genome. Nature. 2012; 489(7414):75–82.
- 64. Medvedeva YA. Algorithms for CpG islands search: New advantages and old problems. In: Bioinformatics - Trends and Methodologies. London: IntechOpen Limited: 2011.
- 65. Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C, Suzuki T, Ntini E, Arner E, Valen E, Li K, Schwarzfischer L, Glatz D, Raithel J, Lilje B, Rapin N, Bagger FO, Jørgensen M, Andersen PR, Bertin N, Rackham O, Burroughs AM, Baillie JK, Ishizu Y, Shimizu Y, Furuhata E, Maeda S, Negishi Y, Mungall CJ, Meehan TF, Lassmann T, Itoh M, Kawaji H, Kondo N, Kawai J, Lennartsson A, Daub CO, Heutink P, Hume DA, Jensen TH, Suzuki H, Hayashizaki Y, Müller F, FANTOM Consortium, Forrest ARR, Carninci P, Rehli M, Sandelin A. An atlas of active enhancers across human cell types and tissues. Nature. 2014; 507(7493):455–61.
- 66. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ, Amin V, Whitaker JW, Schultz MD, Ward LD, Sarkar A, Quon G, Sandstrom RS, Eaton ML, Wu Y-C, Pfenning AR, Wang X, Claussnitzer M, Liu Y, Coarfa C, Harris RA, Shoresh N, Epstein CB, Gjoneska E, Leung D, Xie W, Hawkins RD, Lister R, Hong C, Gascard P, Mungall AJ, Moore R, Chuah E, Tam A, Canfield TK, Hansen RS, Kaul R, Sabo PJ, Bansal MS, Carles A, Dixon JR, Farh K-H, Feizi S, Karlic R, Kim A-R, Kulkarni A, Li D, Lowdon R, Elliott G, Mercer TR, Neph SJ, Onuchic V, Polak P, Rajagopal N, Ray P, Sallari RC, Siebenthall KT, Sinnott-Armstrong NA, Stevens M, Thurman RE, Wu J, Zhang B, Zhou X, Beaudet AE, Boyer LA, De Jager PL, Farnham PJ, Fisher SJ, Haussler D, Jones SJM, Li W, Marra MA, McManus MT, Sunyaev S, Thomson JA, Tlsty TD, Tsai L-H, Wang W, Waterland RA, Zhang MQ, Chadwick LH, Bernstein BE, Costello JF, Ecker JR, Hirst M, Meissner A, Milosavljevic A, Ren B, Stamatoyannopoulos JA, Wang T, Kellis M. Integrative analysis of 111 reference human epigenomes. Nature. 2015; 518(7539):317–30.
- 73. Vorontsov IE, Fedorova AD, Yevshin IS, Sharipov RN, Kolpakov FA, Makeev VJ, Kulakovskiy IV. Human and mouse cistromes: genomic maps of putative cis-regulatory regions bound by transcription factors. 2018. https://doi.org/10.6084/m9.figshare.7087697.v1.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.