Dichotomy in redundant enhancers points to presence of initiators of gene regulation
The regulatory landscape of a gene locus often consists of several functionally redundant enhancers establishing phenotypic robustness and evolutionary stability of its regulatory program. However, it is unclear what mechanisms are employed by redundant enhancers to cooperatively orchestrate gene expression.
By comparing redundant enhancers to single enhancers (enhancers present in a single copy in a gene locus), we observed that the DNA sequence encryption differs between these two classes of enhancers, suggesting a difference in their regulatory mechanisms. Initiator enhancers, which are a subset of redundant enhancers and show similar sequence encryption to single enhancers, differ from the rest of redundant enhancers in their sequence encryption, evolutionary conservation and proximity to target genes. Genes hosting initiator enhancers in their loci feature elevated levels of expression. Initiator enhancers show a high level of 3D chromatin contacts with both transcription start sites and regular enhancers, suggesting their roles as primary activators and intermediate catalysts of gene expression, through which the regulatory signals of redundant enhancers are propagated to the target genes. In addition, GWAS and eQTLs variants are significantly enriched in initiator enhancers compared to redundant enhancers, suggesting a key functional role these sequences play in gene regulation.
The specific characteristics and widespread abundance of initiator enhancers advocate for a possible universal hierarchical mechanism of tissue-specific gene regulation involving multiple redundant enhancers acting through initiator enhancers.
Keywords: Redundant enhancers, Gene regulation
AUPRC: The area under the precision-recall curve
AUROC: The area under the receiver operating characteristic curve
ChIP-seq: Chromatin immunoprecipitation experiments followed by sequencing
DHS: DNase I hypersensitive sites
ENCODE: Encyclopedia of DNA Elements
eQTL: Expression quantitative trait loci
GWAS: Genome-wide association study
SVM: Support vector machine
TAD: Topologically associating domain
TFBS: Transcription factor binding sites
TSS: Transcription start sites
Gene regulatory elements such as enhancers establish a spatio-temporal pattern of gene expression in human and other vertebrate genomes. A single vertebrate gene is commonly surrounded by an array of redundant enhancers which often function additively and create a distal, multi-tissue pattern of gene regulation. Multiple redundant enhancers have been identified in the human and mouse genomes, and this redundancy acts not only as a regulatory buffer, which prevents deleterious phenotypic effects upon individual enhancer loss, but also as a means of fine-tuning gene expression [2, 3]. Shadow enhancers, which were originally found in the early Drosophila embryo, are located further away from the target gene and ensure a robust activity matching the primary enhancer. They were reported to be pervasive, with one to five copies in more than 60% of examined loci, so that there are no obvious phenotypic changes if one of them is deleted. Large gene loci, which contain multiple non-coding functional elements such as redundant enhancers, tend to be tissue-specific, while housekeeping genes tend to be shorter and experience selective pressure towards compactness. In addition, a recent study showed that mammalian housekeeping genes, which evolve more slowly than tissue-specific genes, also contain fewer enhancers per gene. This variation in locus length may bias functional inference for non-coding elements that relies on gene annotation databases. Although enhancers are frequently located far from their associated genes [10, 11] or sometimes act over an unaffected intermediate gene, the proximity between enhancers and the transcription start sites (TSSs) of their target genes is critical and is reflected in an exponential decay of enhancer-promoter interactions with increasing distance.
Recent studies of 3D chromatin contact mapping allowed high-resolution profiling of interactions between enhancers and their distantly regulated genes [14, 15], which revealed a hierarchical structure and hub enhancers in a subset of super-enhancers with distinct roles in chromatin organization and gene activation.
Tissue-specificity of gene transcription is associated with sequence encryption of enhancers and promoters, as this sequence encryption is reflective of the binding sites of transcription factors (TFs) regulating the target gene and is independent of the distance and orientation between enhancers and genes. Genomic variants in these binding sites might impact and even deactivate enhancer activity in gene regulation, which in turn could lead to a disease or disorder. Enhancers that recapitulate tissue-specific gene expression patterns are of continuous interest, and various experimental protocols have been introduced to predict the activity of tissue-specific enhancers, including chromatin immunoprecipitation sequencing (ChIP-seq) of histone modifications and TFs [1, 20, 21, 22, 23]. Using machine learning algorithms such as support vector machines (SVMs) or deep neural networks, one can explore key sequence features and predict enhancers based on series of consecutive or gapped nucleotides (k-mers) or TF binding sites (TFBSs) [18, 24, 25, 26, 27]. Although machine learning methods have been used for genome-wide prediction of shadow enhancers, they have not been used to classify and compare single locus enhancers with redundant enhancers. The loss and gain of single locus enhancers has pronounced effects on the regulatory activity of the corresponding genes, while the effects of loss of redundant enhancers can be buffered by their duplicates, suggesting that these two enhancer classes might be regulated differently.
We performed a genomic analysis of single and redundant enhancers across nine human tissues and cell lines. We observed that the DNA sequence encryption of single enhancers is distinct from that of redundant enhancers active in the same tissue. This observation allowed us to develop an accurate sequence classifier and identify a set of redundant enhancers, named initiator enhancers, featuring sequence encryption similar to single enhancers. Our results show that single and initiator enhancers are located closer to the nearest TSS and are more evolutionarily conserved than other redundant enhancers. We also demonstrate that initiator enhancers form more chromatin contacts with both nearby TSSs and enhancers, indicating that they may act as primary activators of gene transcription and as intermediate elements establishing regulatory activities between distal enhancers and their target genes. The functional importance of initiator enhancers is further supported by an overabundance of genome-wide association study (GWAS) and expression quantitative trait locus (eQTL) variants within their sequences and by an elevated expression level of genes regulated by initiator enhancers.
Definition of single and redundant enhancers
Table 1 The performance of classifiers and the fractions of three categories of enhancers for nine tissues used in this study
[Table body not recovered. Surviving column label: number of enhancers. Surviving row labels: IMR90 Fetal Lung Fibroblasts; Brain Hippocampus Middle; HepG2 Hepatocellular Carcinoma; HMEC Mammary Epithelial; HUVEC Umbilical Vein Endothelial.]
Each gene locus was defined as a region that extends from the current gene to the nearest gene in both directions along the genome, so that neighboring gene loci overlap each other. A candidate enhancer was denoted as a single enhancer when it was 1) the only intronic enhancer associated with its host gene or 2) the only intergenic enhancer for both flanking genes. If there were multiple enhancers located in a gene locus, all of them were categorized as redundant enhancers. Finally, an enhancer was defined as a 400 bp long DNA segment, obtained by extending 200 bp in both directions along the genome from the central position of the candidate enhancer. For all tissue-specific active enhancers, only those containing less than 30% repetitive sequences were retained in our study to ensure a reliable sequence-based analysis.
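The locus-based definition above can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes a single chromosome, sorted non-overlapping gene intervals, and enhancers represented by their center coordinates, and the function names (`enhancer_window`, `gene_loci`, `classify_enhancers`) are hypothetical. An enhancer is labeled single only if it is the sole enhancer in every locus that contains it, which covers both the intronic and the intergenic case.

```python
from bisect import bisect_left, bisect_right

HALF_WINDOW = 200  # bp on each side of the center, giving a 400 bp enhancer


def enhancer_window(center):
    """Return the 400 bp interval (start, end) around an enhancer center."""
    return center - HALF_WINDOW, center + HALF_WINDOW


def gene_loci(genes, chrom_length):
    """Locus of each gene: from the end of the previous gene to the start
    of the next one, so neighboring loci overlap. `genes` is a sorted list
    of non-overlapping (start, end) pairs on one chromosome (assumption)."""
    loci = []
    for i in range(len(genes)):
        left = genes[i - 1][1] if i > 0 else 0
        right = genes[i + 1][0] if i + 1 < len(genes) else chrom_length
        loci.append((left, right))
    return loci


def classify_enhancers(genes, centers, chrom_length):
    """Label each enhancer center 'single' or 'redundant'."""
    loci = gene_loci(genes, chrom_length)
    centers = sorted(centers)
    # Number of enhancer centers falling inside each locus (inclusive bounds).
    counts = [bisect_right(centers, hi) - bisect_left(centers, lo)
              for lo, hi in loci]
    labels = {}
    for c in centers:
        # Enhancer counts of every locus that contains this enhancer.
        in_loci = [n for (lo, hi), n in zip(loci, counts) if lo <= c <= hi]
        labels[c] = "single" if all(n == 1 for n in in_loci) else "redundant"
    return labels
```

In this sketch an intergenic enhancer belongs to the loci of both flanking genes, so a second enhancer in either locus is enough to make it redundant.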
Training of classifier and predicting initiator enhancers
We first characterized enhancer DNA sequences by their density of all 6-mers. Given a DNA sequence, the density of a 6-mer was calculated as the number of occurrences of the 6-mer divided by the length of the non-repetitive part of that sequence. Based on these sequence features, we built support vector machine (SVM) models to distinguish single enhancers from the genomic background, and later to separate initiator enhancers from regular enhancers. Our SVM models used LIBSVM with a Gaussian kernel (svm-train -t 2 -b 1 -w1 5 -w-1 1). The single enhancers with the top 25% strongest signal (averaging the total strength of overlapped peaks in that enhancer region) were selected as positive training samples. We generated five control sequences by randomly sampling the human genome and matching the length and repeat content of each enhancer sequence from the positive set. We excluded all candidate enhancer regions in the corresponding tissue, transcribed enhancers reported in CAGE and VISTA enhancers from our control sequence generation. We used five-fold cross-validation to evaluate the performance of our classifiers. We then applied the classifier to redundant enhancers to predict initiator enhancers, i.e. those that feature the same sequence encryption as single enhancers, at a false positive rate (FPR) of 5%.
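As an illustration of the feature construction and the FPR-based cutoff described above, the following Python sketch computes k-mer densities and picks a score threshold from control scores. It rests on assumptions that the text does not state: repeats are taken to be soft-masked in lowercase, a k-mer is counted only when all of its bases are unmasked, and the threshold is derived from classifier scores of control sequences. The names `kmer_density` and `threshold_at_fpr` are hypothetical, and the SVM training itself (done with LIBSVM in the study) is not reproduced here.

```python
from collections import Counter
from itertools import product
import math

UNMASKED = set("ACGT")  # assumption: repeats are soft-masked as lowercase


def kmer_density(seq, k=6):
    """Density of every k-mer: occurrences divided by the number of
    non-repetitive (uppercase A/C/G/T) bases. Returns 4**k values in
    lexicographic k-mer order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    non_repeat_len = sum(1 for base in seq if base in UNMASKED)
    if non_repeat_len == 0:
        return [0.0] * len(kmers)
    # Count only windows made entirely of unmasked bases.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= UNMASKED
    )
    return [counts[w] / non_repeat_len for w in kmers]


def threshold_at_fpr(control_scores, fpr=0.05):
    """Score cutoff such that at most a fraction `fpr` of control
    sequences score strictly above it."""
    ranked = sorted(control_scores, reverse=True)
    allowed = math.floor(fpr * len(ranked) + 1e-9)  # tolerated false positives
    if allowed >= len(ranked):
        return -math.inf
    return ranked[allowed]
```

With k = 6 each sequence yields a 4096-dimensional feature vector. A redundant enhancer would then be called an initiator when its classifier score exceeds the cutoff returned by `threshold_at_fpr`.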
Proximity to TSSs and evolutionary conservation
The central point of an enhancer was used to represent the position of this enhancer for calculating distances and Hi-C contacts. The TSS for an intronic enhancer was the TSS of the host gene, while the nearest TSS of an intergenic enhancer was defined as the closest TSS of its two neighboring genes. We evaluated the phastCons alignment score at the nucleotide level and calculated the average score for each enhancer sequence. The phastCons 46way placental wig files were downloaded from the UCSC genome browser and only non-repetitive regions of the enhancers were evaluated. The background conservation data are based on 10x random genomic regions located at the same distance from randomly selected genes as the corresponding enhancers in each class are from their nearest genes, to control for distance to the TSS.
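The two measurements in this paragraph can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the pipeline used in the study: TSS coordinates are given as a sorted list on one chromosome, per-base conservation scores are already loaded into a list aligned with the enhancer sequence, and repeats are soft-masked in lowercase. The function names are hypothetical, and `nearest_tss_distance` covers only the intergenic case (for an intronic enhancer the host gene's TSS would be used instead).

```python
from bisect import bisect_left


def nearest_tss_distance(center, tss_positions):
    """Distance (bp) from an enhancer center to the closest TSS.
    `tss_positions` must be sorted and non-empty."""
    i = bisect_left(tss_positions, center)
    # Only the TSS just before and just after the center can be closest.
    candidates = tss_positions[max(i - 1, 0):i + 1]
    return min(abs(center - t) for t in candidates)


def mean_conservation(scores, seq):
    """Average per-base score over non-repetitive (uppercase) bases.
    Positions without a score are passed as None and skipped."""
    kept = [s for s, base in zip(scores, seq)
            if s is not None and base.isupper()]
    return sum(kept) / len(kept) if kept else None
```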
Hi-C data from six human cell lines at 5-kilobase (kb) resolution (IMR90, GM12878, HMEC, NHEK, K562 and HUVEC) were retrieved from the work of Rao et al. (GSE63525). Knight-Ruiz (KR) matrix balancing was used to normalize the contact matrices, and Benjamini-Hochberg FDR control was used to correct for multiple hypothesis testing (FDR = 0.1), as suggested in that work. Chromatin contacts longer than 1 megabase (Mb) were not considered. The background count of chromatin contacts is based on 10x randomly selected pairs of genomic regions separated by the same distance as the corresponding enhancer and its target (to control for distance effects).
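For readers unfamiliar with the Benjamini-Hochberg step-up procedure mentioned above, a minimal Python sketch follows. It is a generic textbook implementation applied to a plain list of p-values, with the FDR level of 0.1 taken from the text. It is not the code used by the Hi-C processing pipeline, and the function name `benjamini_hochberg` is an illustrative choice.

```python
def benjamini_hochberg(pvalues, fdr=0.1):
    """Return a list of booleans: True where the hypothesis is rejected
    under Benjamini-Hochberg control at the given FDR level."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k * fdr / m (step-up rule).
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / m:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value passes at a higher rank, as in `benjamini_hochberg([0.06, 0.07])`, where both hypotheses are rejected.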
TFBSs enrichment and histone mark signal intensities in different classes of enhancers
We took advantage of the available ChIP-seq TFBS data to calculate the TFBS enrichment of enhancers in HepG2, GM12878 and K562 cell lines [29, 32]. In the corresponding cell line, we compared the TFBS enrichment between single (positive set) and redundant (control set) enhancers, and between initiator (positive set) and regular (control set) enhancers, respectively. To compute the frequency of a particular TFBS, the number of regions overlapping between positive or control sequences and the ChIP-seq peaks of that TFBS was summed and divided by the total length of the positive or control sequences, respectively. Fold-enrichment of a TFBS was then computed as the ratio of its frequency in the positive set to that in the control set. A p-value was calculated using Fisher's exact test and only TFBSs with a p-value < 0.05 and fold enrichment > 1.5 were included in the analysis. Similarly, for the analysis of a particular histone mark, the signal intensities of overlapping ChIP-seq peaks were averaged over the number of enhancers in both positive and control sets, followed by a fold-enrichment and p-value calculation.
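The fold-enrichment calculation and the reporting filter can be written out as a short Python sketch. The overlap counts and total lengths are assumed to be precomputed, the p-value is taken as an input because the text does not specify how the contingency table for Fisher's exact test was built, and the function names are hypothetical.

```python
import math


def fold_enrichment(pos_overlaps, pos_length, ctrl_overlaps, ctrl_length):
    """Ratio of TFBS frequency (overlaps per bp) in the positive set to
    that in the control set."""
    if pos_length <= 0 or ctrl_length <= 0:
        raise ValueError("sequence lengths must be positive")
    if ctrl_overlaps == 0:
        # No control overlaps: enrichment is unbounded or undefined.
        return math.inf if pos_overlaps > 0 else math.nan
    # (pos_overlaps / pos_length) / (ctrl_overlaps / ctrl_length), rearranged
    return (pos_overlaps * ctrl_length) / (ctrl_overlaps * pos_length)


def passes_filter(fold, p_value, min_fold=1.5, alpha=0.05):
    """Thresholds from the text: p-value < 0.05 and fold enrichment > 1.5."""
    return p_value < alpha and fold > min_fold
```

For example, 30 overlaps in 10,000 bp of positive sequence against 10 overlaps in 10,000 bp of control sequence gives a fold enrichment of 3.0.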
Density of GWAS and eQTL variants
The GWAS Catalog data were downloaded from NHGRI-EBI and GTEx eQTL v7 data were obtained from the GTEx Portal (www.gtexportal.org) for the variant density analysis. The density of variants was calculated as the number of variants falling into genomic regions occupied by enhancers from a particular class divided by the total number of enhancers in that class.
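The density defined above reduces to a simple ratio, shown here as a minimal Python sketch. It assumes one chromosome, enhancers given as half-open (start, end) intervals, and variants given as positions; each distinct variant position is counted once even if enhancer intervals overlap. The function name `variant_density` is hypothetical.

```python
def variant_density(enhancers, variant_positions):
    """Number of variants inside any enhancer of a class, divided by the
    number of enhancers in that class."""
    if not enhancers:
        raise ValueError("at least one enhancer is required")
    hits = sum(
        1
        for pos in set(variant_positions)
        if any(start <= pos < end for start, end in enhancers)
    )
    return hits / len(enhancers)
```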
Sequence classification of single, initiator and regular enhancers
Although widespread redundant enhancers have been previously reported in many comprehensive studies and linked to phenotypic robustness [2, 3, 4, 5], the mechanisms and evolutionary stability of the single enhancer regulatory programs remain to be studied in detail. In this study, we focused on comparing and contrasting single and redundant enhancers, and the regulatory mechanisms employed by them. We selected nine human tissues and cell lines for this analysis and refer to them collectively as tissues for simplicity (see Methods). Among these tissues, IMR90 and the right ventricle have the largest number of enhancers (over 83,000), while HepG2 contains the smallest number (about 26,000). The percentage of single enhancers among all enhancers in a particular tissue ranges from 1.2% in IMR90 to 5.6% in HepG2, with an average of 3.5% (Table 1). On average, 38% of gene loci contain two or more enhancers, 15% contain a single enhancer and the remaining 47% have no enhancers; these percentages vary across tissues. In addition, 7% of gene loci have more than 10 enhancers in the same locus, with a maximum of 14% for IMR90 and a minimum of 3% for both HepG2 and K562, suggesting a non-negligible number of gene loci densely packed with enhancers (Additional file 1: Figure S1).
Single and initiator enhancers are closer to genes and more evolutionarily conserved than regular enhancers
To explore functional characteristics of the three classes of enhancers, we first compared their gene ontology (GO) enrichment, as quantified using the GREAT tool, for the right ventricle and HepG2, two tissues involved in distinct biological pathways. Our results show that in both tissues single enhancers are mainly involved in metabolic, biosynthetic and catabolic functions, which are associated with housekeeping genes. Redundant enhancers, however, are more tissue-specific and are associated with multiple cell development and differentiation processes (Additional file 1: Figure S3). For example, in the right ventricle, the genes proximal to redundant enhancers are related to the mechanistic and response functions of the heart, such as regulation of heart contraction, response to oxygen levels, response to hypoxia, regulation of cardiac muscle contraction and striated muscle cell development. About 15 processes are directly related to cardiac functions, while the rest are related to energy, kinase activity, signaling pathway, carbohydrate and glucose metabolic processes (Additional file 1: Fig. S3A). In HepG2, the functions of genes associated with redundant enhancers include liver functions, such as liver development, hepaticobiliary system development, and metabolic processes of alcohol, phospholipid, lipid, steroid, glycerophospholipid, cholesterol and glucose (Additional file 1: Fig. S3B). We did not observe a noticeable difference between initiator and regular enhancers in their associated biological processes, as this GO analysis is based on flanking genes and initiator and regular enhancers flank the same genes by definition (with the exception of some loci that contain only regular enhancers and no initiator enhancers). The fact that single enhancers are highly associated with housekeeping genes and involved in similar fundamental biological processes across different tissues suggests their indispensable roles in regulatory activities.
In general, regulatory elements involved in similar biological functions and pathways tend to experience a similar selective pressure. As single enhancers are associated with similar biological processes across different tissues, populate compact gene loci and establish transcriptional regulation of a target gene lacking a functional backup due to the absence of redundant enhancers, we speculated that they are evolving under a stronger evolutionary constraint. To assess selective constraints acting on the three classes of enhancers, we used the phastCons evolutionary conservation scores derived from 46 placental mammal sequence alignments. For 8/9 tissues, single and initiator enhancers are significantly more conserved than regular enhancers (Fig. 3b, Additional file 1: Figure S6). In the case of HepG2, the difference in conservation levels between initiator and regular enhancers is small and not significant (p-value = 0.37), which might be caused by the low performance of its classifier noted previously. Across all tissues, single enhancers have the highest average conservation score, followed by initiator and regular enhancers. The strongest sequence constraint on single enhancers suggests their indispensability in gene regulation and is consistent with the stronger evolutionary constraint of their potential target genes, the housekeeping genes, which evolve more slowly than tissue-specific genes. Initiator enhancers, which demonstrate a significantly higher level of sequence conservation than regular enhancers (p-value < 2.2 × 10^-16, Wilcoxon rank sum test), are likely to play an important role in regulation of tissue-specific genes and to be supported by secondary (regular) enhancers, which results in the establishment of a complex regulatory profile of gene expression.
Initiator enhancers feature chromatin contacts with both promoters and regular enhancers
The difference in average contact numbers among the three classes of enhancers suggests different gene regulatory modes for each class: 1) single enhancers have a high level of direct interactions with nearby genes but fewer interactions with other enhancers, reflecting their self-sustainable gene regulatory activity; 2) initiator enhancers maintain a high level of contacts with both nearby genes and other enhancers, indicating their central position in enhancer networks and a critical role of acting directly on their target genes and propagating regulatory signals of regular enhancers; 3) regular enhancers, which represent the majority of all enhancers, form a high level of enhancer-enhancer interactions but a relatively low level of direct enhancer-TSS interactions. We also observed that initiator enhancers maintain a significantly larger number of both enhancer-promoter and enhancer-enhancer contacts across different topologically associating domain (TAD) regions than regular enhancers (Additional file 1: Figure S8), revealing an ability of initiator enhancers to partake in distal gene regulation and to connect regular enhancers to their distal target genes.
Our analysis shows that enhancer clusters formed by regular enhancers are strongly dependent on the presence of intermediate initiator enhancers connecting them and their target genes. In support of this hypothesis of a general regulatory signal propagation through initiator enhancers, we observed a 2.0-fold enrichment of enhancer-TSS contacts for the half of redundant enhancers closest to the nearest TSS within a 1 Mb distance cutoff versus the more distant half (p-value < 2.2 × 10^-16, Wilcoxon rank sum test) (Fig. 4b, Additional file 1: Figure S9). This is consistent with the previous analysis of proximity showing that initiator enhancers are located much closer to the nearest TSS than other redundant enhancers. Although this hierarchical structure of enhancer collaboration has already been observed in super-enhancers [16, 62], according to our results, this mechanism of signal propagation from distant regular enhancers to the target genes through the intermediate initiator enhancers might be a common rule for gene regulation rather than being limited to super-enhancers. Additionally, among all the enhancers from each class that maintain chromatin contacts, on average 85% of single enhancers form interactions with nearby TSSs, while this fraction decreases to 64% for initiator and 55% for regular enhancers, suggesting that a major role of single enhancers is in activating gene regulation directly. Meanwhile, a much larger fraction of initiator and regular enhancers than of single enhancers maintains interactions with nearby enhancers (Additional file 1: Figure S10A, B). In concordance with these observations, the fraction of initiator enhancers interacting with both TSSs and other enhancers is the highest among all three classes. For CTCF and the cohesin factors RAD21 and SMC3, which are important for forming 3D genomic structures, the relative enrichment is much higher in initiator than in regular enhancers (Additional file 1: Figures S2B and S10C).
In addition to the higher level of enrichment of looping factors, the overall higher enrichment of TFBSs in initiator enhancers than in single and regular enhancers may also indicate their role in contacting both promoters and regular enhancers through the TFs involved. However, single and regular enhancers also show a complementary ability to interact with both target genes and nearby enhancers, although at a reduced rate, which implies a complexity of the human gene regulation landscape.
Initiator enhancers are strongly associated with gene expression changes and human disease variants
After showing that initiator enhancers feature unique genomic characteristics distinguishing them from regular enhancers, we focused on their functional importance in transcriptional events. Since the epigenetic marks, including histone modifications and DNA methylation, are reflective of fundamental regulatory events [63, 64, 65, 66, 67], we quantified the enrichment of available ChIP-seq histone marks for the three classes of enhancers: contrasting single and regular enhancers and contrasting initiator and regular enhancers across different tissues, respectively (Additional file 1: Figure S11). Single enhancers demonstrate an enrichment in TSS-proximal histone marks (H3K4me2 and H3K4me3), which reflects their proximity to their target genes. Initiator enhancers, on the other hand, display an additional strong enrichment in the marks specific to active enhancers—H3K27ac and H3K4me1—when compared to regular enhancers. This further supports our finding that initiator enhancers represent the key and most active subclass of enhancers. To verify that the initiator enhancers are crucial for gene regulation and to study how their activity affects gene expression, we used RNA-seq expression data for four categories of genes neighboring different classes of enhancers: 1) connected with single but not with initiator enhancers, 2) connected with initiator but not with single enhancers, 3) connected with regular only but not with single or initiator enhancers, 4) no connections with enhancers (control set). Our results show that genes that feature Hi-C interactions with initiator enhancers have a significantly higher expression level than those connected only to regular or single enhancers, suggesting a functional importance of initiator enhancers in recruiting regular enhancers and elevating the expression level of target genes (Additional file 1: Figure S12).
Human genes usually employ multiple enhancers in their loci to establish transcription robustness and evolutionary stability. In this work, we separated tissue-specific enhancers into three classes according to the number of enhancers in the corresponding gene locus and their genomic sequence encryption. We demonstrated that each class of enhancers shows specific characteristics that are associated with their distinct roles in transcription and different gene regulatory mechanisms. Single enhancers, which represent the only enhancer existing in a gene locus, are different from redundant enhancers not only because of their lack of backup enhancers, but also because of their proximity to nearby genes and evolutionary conservation greater than in redundant enhancers, as well as GO enrichment showing their strong association with housekeeping genes. A subset of the top TFBSs enriched in single but depleted in redundant enhancers is associated with repressors. All these results suggested that single enhancers perform multiple types of regulatory activity, while in the loci of redundant enhancers these functions of enhancing and repressing of transcription are distributed between multiple enhancers and silencers. An elevated level of chromatin contacts between a single enhancer and its target TSS suggests a direct regulation of target genes by single enhancers, while a low level of contacts between them and other enhancers indicates their ability to fulfil biological functions in an independent manner.
There is a specific subclass of redundant enhancers called initiator enhancers that are different from regular enhancers based on their DNA sequence similarity to single enhancers. Initiator enhancers are located closer to the nearest genes and are more evolutionarily conserved than regular enhancers. Although the two classes of enhancers are involved in similar tissue-specific biological processes (as their loci largely overlap), they have notable differences in forming chromatin contacts with nearby genes. Initiator enhancers feature twice as many contacts with TSSs of nearby genes as regular enhancers, suggesting their role as activators of gene regulation. The fact that initiator enhancers form a large number of contacts with both genes and other enhancers makes them potential intermediate catalysts responsible for collecting transcriptional signals from a cluster of regular enhancers and transmitting these signals to target genes. Strong enrichment of GWAS and eQTL variants and an elevated level of gene expression associated with initiator enhancers also suggest their key role in gene regulation compared to regular enhancers. Although this hierarchical structure of multiple enhancers has also been observed in super-enhancers, a large fraction of the redundant enhancers in our study are not super-enhancers. For example, in K562, about 4.1% of single, 13.0% of initiator and 11.7% of regular enhancers overlap with identified super-enhancers. However, in HepG2, these fractions drop to 1.3%, 4.1% and 4.1%, respectively, suggesting that this hierarchical pattern of multiple interacting enhancers might be a common rule for gene regulation. In summary, we propose that there is a functional dichotomy in redundant enhancers.
Gene regulation by regular enhancers depends on the initiator enhancers which are located closer to their target TSS and act as propagators of the regulatory signal from redundant enhancers to facilitate establishment of complex regulatory landscapes in the human genome.
In this study, we identified a subset of redundant enhancers (named initiator enhancers) with DNA sequence encryption similar to self-sufficient (single) enhancers. These initiator enhancers feature distinct genomic characteristics compared to the rest of redundant enhancers: they are proximal to their target genes, they are evolutionarily conserved and they maintain a high level of chromatin contacts. GWAS and eQTLs analyses show a key role of initiator enhancers in establishing human gene regulatory programs, and the elevated level of gene expression associated with initiator enhancers indicates their function in transcriptional activation and propagation of regulatory signals from neighbouring regular enhancers. In summary, our findings reveal the existence of a critical class of enhancers playing a key role in establishing complex regulatory networks of redundant enhancers in vertebrate species.
The authors are grateful to Irina Hashmi for her contribution to this project and are grateful to Timothy Doerr, Di Huang and Shan Li for critical comments and suggestions.
This work has been supported by the Intramural Research Program of the National Institutes of Health, National Library of Medicine.
Availability of data and materials
The data and materials are available in the Additional files section.
IO conceived and designed the study. WS performed the computational analysis. WS and IO wrote the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 33. Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2011;2:1–27.
- 38. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995;57:289–300.
- 58. Cinghu S, Yang P, Kosak JP, Conway AE, Kumar D, Oldfield AJ, Adelman K, Jothi R. Intragenic enhancers attenuate host gene expression. Mol Cell. 2017;68(104–117):e106.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.