The first enhancer in an enhancer chain safeguards subsequent enhancer-promoter contacts from a distance
Robustness and evolutionary stability of gene expression in the human genome are established by an array of redundant enhancers.
Using Hi-C data in multiple cell lines, we report a comprehensive map of promoters and active enhancers connected by chromatin contacts, spanning 9000 enhancer chains in 4 human cell lines associated with 2600 human genes. We find that the first enhancer in a chain that directly contacts the target promoter is commonly located at a greater genomic distance from the promoter than the second enhancer in a chain, 96 kb vs. 45 kb, respectively. The first enhancer also features higher similarity to the promoter in terms of tissue specificity and higher enrichment of loop factors, suggestive of a stable primary contact with the promoter. In contrast, a chain of enhancers which connects to the target promoter through a neutral DNA segment instead of an enhancer is associated with a significant decrease in target gene expression, suggesting an important role of the first enhancer in initiating transcription using the target promoter and bridging the promoter with other regulatory elements in the locus.
The widespread chained structure of gene enhancers in humans reveals that the primary, critical enhancer is distal, commonly located further away than other enhancers. This first, distal enhancer establishes contacts with multiple regulatory elements and safeguards a complex regulatory program of its target gene.
KeywordsEnhancer chain First distal enhancer Redundant enhancers
In vertebrate genomes, multiple enhancers are commonly involved in a single gene regulatory pathway by acting additively, establishing phenotypic robustness, and fine-tuning a complex gene expression pattern [1, 2, 3]. Since enhancers frequently regulate their associated genes remotely [4, 5] and sometimes skip unaffected intermediate genes , it is challenging to identify how a set of enhancers cooperatively regulates the same gene. In addition, the genes and their associated regulatory elements in the human genome are not uniformly distributed [7, 8], such as gene deserts which contain no protein-coding sequences but harbor multiple distant regulatory elements [8, 9, 10], making it complicated but important to explore the gene regulatory mechanisms based on a panel of multiple enhancers. Recent studies have focused on the multiple closely positioned enhancers commonly forming regions known as super-enhancers (SEs) [11, 12, 13], including the long-range enhancer interactions , the hierarchical structure of enhancer networks , and the open chromatin interactions inside SEs [16, 17].
Three-dimensional (3D) chromatin conformation experiments, such as Hi-C, provide high-resolution contact information between mapped genomic regions across human tissues and cell lines, including the associations between enhancers and their target genes [18, 19, 20, 21]. The approach of chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) provides further examples of dynamic promoter-enhancer interactions by mapping interactions between genomic regions bound by specific proteins . Since the cooperative gene regulation by multiple enhancers might be related to multi-way contacts for chromatin loops [19, 23, 24], genome-wide methods such as chromosome walks (C-walks), three-way Hi-C contacts, and genome architecture mapping have been used to inform the multi-way genome aggregation of the spatial compartments in the genome [25, 26, 27]. The multi-contact 4C (MC-4C) in high resolution distinguishes cooperative from random and competing interactions to identify higher-order topological phenomena, including a group of interacting enhancers within the beta-globin SE . It has also been found that the connections formed by promoters and their contacts are dynamic and tissue-specific [28, 29]. For example, during different stages of macrophage development, the activator protein-1 (AP-1)-enriched dynamic loops form a multi-loop activation cluster to control tissue-specific transcription . However, current studies have not addressed (1) the organization of multiple enhancers in the 3D space and in the 1D genome, (2) the difference in genomic features between multiple interacting enhancers, (3) the biological function of a multi- enhancer regulatory program on gene expression, and (4) the genome-wide presence of multiple interacting enhancers that are not part of SEs.
To answer these questions, we performed a genomic analysis of interacting enhancers across multiple cell lines. We found that a chain of chromatin contacts may connect multiple enhancers to the same target promoter through a set of intermediate enhancers. The first enhancer in a chain with a direct contact to a promoter is often located distantly along the genome sequence but is close in the 3D space to its target gene and acts as an intermediate to bring other enhancers in an enhancer chain (EC) to their target promoters. These first enhancers maintain more Hi-C interactions with other enhancers and promoters, are more enriched for the loop factors CCCTC-binding factor (CTCF) and cohesin, and commonly overlap with the boundaries of chromatin loops. They also feature tissue specificity similar to their target promoters and preserve the primary interactions with the promoters across different cell lines, indicating their key role in gene activation and regulation. Binding sites of active transcription factors (TFs) are overrepresented in chained enhancers, and the level of gene expression associated with ECs is significantly elevated. In summary, we demonstrate that the gene regulatory programs established by a chain of multiple enhancers feature the first enhancer that directly contacts the promoter despite being positioned distantly from the promoter in the genome sequence and is essential in safeguarding gene regulation by maintaining the primary contact in the regulatory domain and bridging between the distant enhancers and their target genes.
The primary promoter-enhancer contact is commonly established by a distal enhancer
We constructed regulatory element networks of interactions among regulatory elements in GM12878, HMEC, HUVEC, and K562 cell lines . Similar to the ribonucleic acid polymerase II (RNAPII)-associated chromatin interaction network , all enhancers and promoters were denoted as vertices, while the significant intra-chromosome Hi-C interactions among them were denoted as edges. In total, 7374 separate networks were identified in all cell lines, and 66% of them contained at least 1 promoter. The median length of a network is 122 kb across 4 cell lines and the longest one spans 3.6 million bp (Mb) in the K562 genome. We ranked an enhancer in a regulatory element network according to its minimum number of connections to the closest promoter (i.e., E1 as a step-one enhancer and En as a step-n enhancer). Then, we defined an EC as a consecutive sequence of enhancers connected to a promoter (P) in the order of their ranks (P-E1-E2- … -En) and the length of an EC as the number of enhancers in that chain (e.g., an EC with a promoter and 2 enhancers has a length of 2, Additional file 1: Figure S1). Isolated promoter-enhancer contracts (EC of length 1) were not included in the set of ECs under consideration. This approach allowed us to break the network into a set of overlapping ECs. The potential influence of biases from Hi-C experiments on the ECs was evaluated (Additional file 1: Figure S2, see the “Methods” section). For an enhancer partaking in multiple ECs, an EC, in which that enhancer is at the closest rank from the promoter, was selected. Based on our definition, 9108 ECs were identified, with 2.4 enhancers per EC on average. Among 5616 promoters maintaining Hi-C contacts with enhancers, nearly half of them are associated with an EC (2626, 46%). Sixty-four percent of EC promoters are connected to multiple ECs, suggesting a genome-wide abundance of overlapping ECs. The charged multivesicular body protein 6 (CHMP6) gene involved in degrading surface receptors and in the biosynthesis of endosomes  is associated with the longest chain of 8 enhancers active in the GM12878 cells (Additional file 1: Figure S3A). The mitochondrial oxidase assembly protein 1 (OXA1) gene, which is related to mitochondrial adenosine triphosphate (ATP) synthase and whose mutations may cause mitochondrial encephalopathy and a combined oxidative phosphorylation defect , is associated with the largest number of 87 ECs (Additional file 1: Figure S3B and S3D). We applied an approach of flexible false discovery rates (FDR) to identify significant Hi-C interactions across 4 tissues, which balances the variation in the total number of ECs across different tissues (Additional file 1: Figure S3A-3C).
In order to explore organizing scenarios of ECs, we calculated the genomic location of each enhancer relative to the EC promoter and other enhancers in its EC. In 81% of ECs, the E1 is located farther away from the target promoter than the E2 along the genome. In addition, for E3s, 40% of them are located between the E1s and E2s and only 16% are located farther than E1s, resulting in two predominant scenarios of EC organization (Fig. 1d). Based on the most frequent scenarios, we propose a hypothesis of how an EC is built in the 3D space: (1) the E1 is distal but forms the primary chromatin contact with the target promoter and sets up a stable regulatory contact; (2) the E2, located more proximal to the promoter than the E1 along the genome, initiates a secondary interaction with the E1, bringing in additional regulatory information and stabilizing the existing regulatory contact; and (3) the following enhancers along the chain are added to the regulatory contact domain, possibly expanding the regulatory profile of the target gene. Since the alternation of the relative positions of the promoters and enhancers may lead to different 3D structures (Fig. 1d) and since the first intron of a gene has been reported to be an important component of gene regulation [34, 35], we compared cases in which EC enhancers are located upstream or downstream of promoters to investigate whether this positional preference exists in the ECs. However, we did not find a significant bias towards either of the two situations. Regardless of either the upstream or downstream position of the EC enhancers with regard to the promoters, we proposed two organization patterns for the top two most frequent situations (40% and 16% of the total cases), called scenario 1 and scenario 2 respectively in which E1 and E2 are in the same and different sides of a promoter (Fig. 1e and Additional file 1: Figure S4C).
The key reasoning behind establishing ECs was to compute the shortest distance between an enhancer and its target promoter(s) and to highlight the indirect nature of enhancer-promoter interactions in cases when an enhancer is separated from its target promoter by one or more intermediate enhancers. To further investigate the bifurcation of enhancer chains, we quantified the number of contacts between enhancers from different enhancer chains in each original regulatory network and a control set of the same network but with enhancer IDs shuffled ten times, across all four tissues. We found that, on average, about 25% of non-overlapped ECs associated with the same promoter have interactions with each other, compared to 77% in the control set, on average (p value < 2.2 × 10−16, the Wilcoxon rank-sum test). Only 18% of the EC enhancers pair with the same rank but from different ECs display inter-EC interactions, compared to 43% in the control set (p value < 2.2 × 10−16, the Wilcoxon rank-sum test) (Additional file 1: Figure S4B). These enhancer pairs also overlap both anchors of a loop with a significantly lower frequency than EC enhancer pairs (E1-E2, E2-E3, and so on), suggesting that only a few chromatin loops are formed between them (Additional file 1: Figure S4C). These relatively limited contacts between different ECs advocate for the presence of well-defined ECs and justify the selection of ECs as a backbone of our study.
The E1 performs a crucial function by connecting the promoter with the rest of the enhancers in an EC, which suggests that its function is the most competitive and essential among all enhancers in an EC. To investigate the regulatory mechanism of our proposed EC model, we next focused on the following aspects: (1) genomic characteristics of the distal E1s, (2) cooperative or competitive relationships among enhancers in an EC, and (3) the role of an E1 in an EC.
First enhancer forms a more stable chromatin loop with the target promoter than the rest of the enhancers in an EC
To further explore the characteristic features of E1s, we calculated the enrichment of transcription factor binding sites (TFBSs) in E1s using E2s as a benchmark set (using TF ChIP-seq data from GM12878 and K562 cell lines). Our results show that loop factors are the top enriched TFs in both cell lines, suggesting a DNA sequence composition of E1s predisposing them to the formation of chromatin loops by attracting specific factors (Fig. 2b). For example, in E1s of the GM12878 cell line, the density of TFBSs for CTCF and cohesin components, RAD21, and SMC3, is 1.6-, 1.4-, and 1.4-fold higher, respectively, than that in E2s (p value < 1.0 × 10−4 using the Fisher’s exact test), and a similar trend was observed in the K562 cell line. In agreement with their highly enriched loop factors, 32% of the E1s are located at the loop boundaries in the GM12878 cell line, which is 2.0-fold higher than the fraction of E2s (p value < 1.0 × 10−70, using the binomial test), indicating stable loop structures formed by them with both target promoters and the E2s (Fig. 2c). We also observed that E1s maintain a larger portion of both enhancer-promoter and enhancer-enhancer contacts across different sub-topologically associating domains (sub-TADs) than the rest of the enhancers in ECs (Fig. 2d), implying their ability to partake in distal gene regulation and to connect multiple EC enhancers to their distal target genes. However, enhancer chains largely do not cross TAD borders (98% of enhancer chains are located within a single TAD, Additional file 1: Figure S5A). Our analysis, based on the loop factors and sub-TADs, reveals an inherent ability of E1s to form stable and essential interactions with their target promoters in a gene regulatory network (GRN) through chromatin restructuring.
First enhancers establish and maintain the essential baseline in gene regulation
It has been found that chromatin interactions are highly dynamic and linage-specific, and the active enhancers that interact with promoters might mirror the tissue specificity of their associated genes . Since an E1 is the only enhancer that directly interacts with the target promoter in an EC, its tissue specificity should be highly correlated with the tissue specificity of the target gene. To verify this, we calculated the tissue specificity similarity between the promoter and each of the enhancers along an EC. We measured their co-activities using H3K27ac peaks and computed the fraction of tissues in which each element in a pair is active (see the “Methods” section). Our results show that for cases when promoters have only one E1, E1 enhancer has an elevated tissue specificity similarity to its associated promoter compared to other EC enhancers, although this difference only reaches statistical significance for two out of four tissues (Additional file 1: Figure S5C). This finding suggests a possible role of ECs, especially E1s, in establishing the baseline tissue specificity of their target genes. However, in the case when a promoter is in contact with multiple ECs, the trend effectively disappears (Additional file 1: Figure S5D), suggesting a more complex regulatory paradigm in loci with multiple ECs regulating the same gene. Since the overall Jaccard index is low, we calculated the tissue specificity of each promoter and its associated E1 and showed that the promoters are nearly ubiquitously active across tissues and, thus, less tissue specific. By contrast, the chain enhancers, especial E1s, are very tissue-specific (Additional file 1: Figure S5E). This difference in tissue specificity between a promoter and its chain enhancers leads to a small number of common active tissues shared by them, which results in the overall small value of the Jaccard index. In addition, target promoters and E1s contain similar sets of enriched TFBSs, which might contribute to coordinated transcriptional activities and formation of a contact domain through multiple common DNA-binding proteins (Fig. 3c, Additional file 1: Figure S5B).
The observation of E1s and their target promoters being highly correlated is further confirmed by the fraction of enhancers that hold the same corresponding position in an EC across different tissues (Fig. 3d). For those chained enhancers in one tissue, E1s most likely remain as E1s (9.5%, p value < 10−10, the binomial test) instead of changing to either E2s (3.4%) or E3s in another tissue (1.6%, p value < 0.001, the binomial test; disregarding cases in which an enhancer does not overlap an H3K27ac mark or is not part of a chain). Similarly, E2s in one tissue tend to remain E2s (6.1%, p value < 10−10, the binomial test) again in a different tissue rather than changing to either E1s (2.8%) or E3s (1.0%). This trend reveals that E1s maintain their function in building connections with both the target promoters and E2s. The overall largest fraction of 14.5% in E1s remaining in an EC across tissues clearly suggests that they are active in more tissues and are less tissue-specific than the rest of enhancers in an EC.
To further explore the role of step-one enhancers in transcriptional events, we examined the overlap between EC enhancers and the human expression quantitative trait loci (eQTL) variants. Our result shows that the average density of tissue-specific eQTL variants is significantly higher in E1s than that in other enhancers from an EC (Fig. 3e), which validates and strengthens our previous results that E1s maintain a crucial function in gene regulation and that mutations of their sequence are likely to lead to a change in target gene expression.
The target genes expression depends strongly on the presence and number of ECs in the locus
Second, the number of associated ECs and the level of gene expression are positively correlated, suggesting additive effects of multiple ECs on gene regulation (Fig. 4b). Third, we found that another important function of the ECs is to link separate genes and coordinate their expression. A group of genes that are connected by the same EC is more likely to be co-expressed than the random set of genes within the same distance, suggesting that different sets of regulatory information might be retrieved from the shared EC enhancers to regulate different genes (Fig. 4c). To address the value of regulatory information attributed to ECs, we compared the chain enhancers with a set of non-chain enhancers (an enhancer that does not belong to an EC) located in the same locus in respect to their enriched TFBSs (Fig. 4d). Active TFs are overrepresented specifically in chain enhancers (p value < 0.05, Fisher’s exact test), with an especially strong enrichment of loop factors (p value < 1.0 × 10−11, Fisher’s exact test), revealing that ECs account for a large fraction of DNA-binding regulatory events through a higher order of genomic conformations to establish a complex regulatory program of their target genes. The intensity of the epigenetic marks, including histone modifications and DNA methylation marks, which are reflective of fundamental regulatory events, was quantified next by contrasting chain and non-chain enhancers based on available ChIP-seq data (Fig. 4e). Chain enhancers display a high intensity of the H3K27ac mark of active enhancers and the H3K9ac mark of transcriptionally active genes, further supporting the crucial role of ECs in gene activation and regulation.
However, about 23% of all ECs are shared by more than one promoter, in which an enhancer is the E1 for multiple promoters. The fraction of enhancers in an EC associated with multiple promoters among all connected enhancers in the regulatory element network before the application of single ranking is 0.39, 0.13, 0.12, and 0.39 in GM12878, HMEC, HUVEC, and K562, respectively. After a single ranking, these enhancers became categorized as EC enhancers. To investigate the influence of this set of ECs on our results above, we partitioned all ECs into two separate sets: E1s associated with only one promoter (1P-E1) and E1s associated with multiple promoters (MP-E1). We compared the major genomic features between these two EC sets. The distance to the target promoter and the number of connected regulatory elements maintain similar trends between the two sets and are in agreement with the patterns for all ECs (Fig. 4g, h). Interestingly, the E1s that connect to multiple promoters have significantly more tissue-specific eQTLs than the E1s connecting to only one promoter, suggesting their critical role in regulating multiple genes (Fig. 4i). Notably, the expression levels of the genes sharing ECs are lower than those of genes with only a single EC (Fig. 4j), although the multiple genes contacting the same EC are more likely to be co-expressed (Fig. 4c), suggesting that the existence of competing target genes linked to the same EC might partially reduce the expression of each gene.
We considered two possible regulatory modes of an EC: (1) each chain enhancer has a unique, independent function, thus creating a multi-functional regulatory landscape, or (2) chain enhancers are largely redundant and all together establish a stable, but narrow, regulatory program. The first mode suggests propagation of different regulatory signals through an EC, and the second mode suggests amplification of largely the same regulatory signal along an EC. To investigate which mode is the primary mode employed by ECs, we used TFBS enrichment as a proxy of regulatory specificity and compared it among the sets of enhancers along an EC, using background DNase regions in the corresponding cell lines as the control set (Additional file 1: Figure S6D and S6E). We observed that enriched TFBSs in different chain enhancers are very similar to each other, indicating a consistency of the regulatory signal along an EC and advocating for the second, redundant regulatory mode being characteristic of ECs.
This redundancy of chain enhancers may possibly contribute to not only the fine-tuning of the target gene expression but also the coordinated expression of multiple genes connected by the same EC. Since enhancers from the same chain are highly redundant and might be involved in 3D contacts with the target promoter (Fig. 1e, Additional file 1: Figure S4C, S6D, and S6E), it is possible that any part of an EC might be sufficient for establishing and safeguarding expression of the target gene. To address this hypothesis, we focused on those genes with promoters that are connected to an EC only through a neutral DNA segment, i.e., a non-active enhancer, instead of a direct enhancer-promoter connection (Additional file 1: Figure S7A). The level of gene expression associated with these indirectly connected ECs is significantly lower than those associated with either a single enhancer or directly connected ECs, revealing an essential function of E1s in transcription and their potential role in passing regulatory signals from distal enhancers to the target promoter (Fig. 4A). In a previous study based on sequence encryption, we have identified a set of so-called initiator enhancers, which function as primary activators and intermediate catalysts of gene expression by propagating the regulatory signals of redundant enhancers to the target genes . We found that initiator enhancers are significantly overrepresented among the chain enhancers as compared to the non-chain enhancers, which indicates a contribution of initiator enhancers in recruiting multiple enhancers for gene regulation through an EC structure (Additional file 1: Figure S7B). In addition, the highest enrichment of initiator enhancers was observed in E1s, which further confirms a critical function in transcriptional initiation maintained by E1s (Fig. 4f). Our results demonstrate that one possible mechanism of gene regulation by multiple enhancers is the hierarchical chain structure involving a primary contact of E1s and likely the complementary effects of redundant enhancers (Additional file 1: Figure S6D and S6E).
Finally, to explore the hypothesis that non-E1 enhancers are irrelevant to the gene regulation and simply represent open chromatin regions or false-positive enhancer predictions, we compared the density of tissue-specific eQTLs in E1, non-E1, and single and non-chain enhancers (Additional file 1: Figure S7C). The mutations in E1s have the most pronounced impact on the level of target gene expression according to the eQTL data profiled across four cell types. The eQTL density in all categories of enhancers is non-negligible, and the density in E2 and E3+ chain enhancers decreases with the degree of separation from the promoter. In addition, the fold enrichment of bound TFs according to ChIP-seq data for E1, E2, E3, and single and non-chain enhancers is at a similar level (which, in turn, is significantly higher than that in non-enhancer regions of open chromatin), suggesting that all enhancers in an EC, as well as the single and non-chain enhancers, are actively bound by TFs and contribute to the expression levels of their target genes (Additional file 1: Figure S7D).
Mammalian genes are commonly surrounded by multiple enhancers, a regulatory architecture that provides evolutionary stability and phenotypic robustness. Different regulatory models have been proposed to describe how these multiple enhancers coordinately regulate their target genes. However, these studies were largely focused on super-enhancers or different stages of development of the same tissue, and the general mechanisms of gene regulation by multiple regular enhancers remain unclear, including how they are organized in 3D structures and what contributions they cooperatively make to a particular gene regulatory program. We categorized active enhancers according to their 3D contacts with each other and their target promoter, dissecting the enhancer regulatory network into multiple chains of enhancers consisting of consecutive Hi-C contacts. We found that the distal E1, rather than the proximal E2 in a chain, commonly loops to the target promoter to activate the gene expression and possibly propagate the signal from other enhancers in the locus. The E2, which is usually located much closer to the promoter, loops to the E1 rather than the nearby promoter, reflecting an essential role of the E1 in coordinating the primary gene regulation, while the remaining enhancers in a chain possibly stabilize and expand an existing regulatory domain. The position of the E1 in a chain is conserved across multiple tissues, and the TFBS composition of E1s is strongly correlated with the target promoter in terms of tissue specificity and enriched TFBSs. This finding is in line with the model of gene regulation involving a static loop formed by an E1 and potential dynamic loops formed by the other enhancers in a chain .
Since we cannot rule out the possibility of the E2 and following enhancers (E3 to En) in a chain having direct interactions with the target promoter (albeit at a much lower frequency than the E1), it is possible that these enhancers, which are enriched for very similar but lower density of TFBSs as the E1, may partially recover the regulatory program of a gene upon a loss of the E1. However, an indirect EC connected to the target promoter through a neutral DNA segment rather than an E1 leads to a dramatic drop in the level of gene expression as compared to a directly connected chain, revealing a rather critical role of E1s in activating transcription and bridging distal enhancers to the target gene. This further confirms a hierarchical regulatory structure consisting of a primary enhancer with multiple redundant enhancers in an EC. In addition, E1s tend to connect to a single promoter and multiple (3.0 on average) ECs simultaneously. For the top 5% E1s with the most Hi-C interactions, the number of connections increases to 2.0 promoters and 12.6 ECs, suggesting their genomic functions similar to the locus control regions (LCRs)  and their abilities to interact with multiple enhancers equivalently. However, different from LCRs in regulating multiple genes simultaneously, E1s commonly target a single promoter only. Overall, the ability of E1s to connect multiple ECs to their target genes and the influence of ECs on gene expression indicate a possible mechanism of EC regulatory role in coordinating multiple redundant distal enhancers for target gene activation. Although our results emphasize the important and distinguishing characteristics and influence of E1s on target gene expression in contrast to other regulatory elements, such as single and non-chain enhancers, we also observe a significant contribution of E2+ enhancers, which have no direct interactions with the promoter, to the target gene expression. As part of ECs, E2+ enhancers cooperate with E1s to orchestrate gene regulation in a complex manner. It is also likely that these E2+ enhancers are stabilizing the regulatory contact domain dynamically, thereby fine-tuning and amplifying the target gene expression in a cell-specific manner.
The EC mechanism of gene regulation by multiple enhancers is very important and novel in the following aspects: (1) It indicates the crucial role and the distal feature of the first enhancer in the gene regulation. This is a very important message for identification of actual causal variants in GWAS studies, since the majority of current GWAS SNPs are found proximal to the associated promoters. Our study suggests that enhancers that are relatively far away from the promoter might act as important as the proximal enhancers. (2) We have shown that an EC is an appropriate approach to model complex GRNs involving multiple promoters and enhancers, which is also a convenient model to study the 3D organization of multiple regulatory elements. (3) A set of E1s connected to multiple promoters simultaneously features a much higher density of eQTLs than other enhancers, which might help to narrow down the search of the critical regulatory elements in certain diseases.
Our study clearly demonstrates that the first but distal enhancer plays an essential role in maintaining the baseline of the target gene expression. We propose an enhancer chain model and reveal a hierarchical mechanism for gene regulation containing the first enhancer and multiple redundant enhancers. A regulatory domain is initiated from this first enhancer by building stable primary contacts with the target promoter along with a serial of redundant enhancers chained together via enhancer-enhancer interactions. In summary, our findings from this work indicate the distal and multi-contact features of the first critical enhancers and may provide insights into the organizing mechanism of complex regulatory domains.
Enhancer-promoter contacts and ECs
A set of genome-wide chromatin profiles of histone marks, deoxyribonuclease (DNase) I-hypersensitive sites (DHS) and transcription factor binding sites (TFBSs) was downloaded from the Encyclopedia of DNA Elements (ENCODE) and Roadmap Epigenomics projects [38, 39]. The four human cell lines with high-resolution Hi-C data  and gene expression data  were selected for this analysis: GM12878, HMEC, HUVEC, and K562 (EID: E116, E119, E122, and E123). Tissue-specific active enhancers were defined as 400 bp segments centered on H3K27ac peaks overlapping H3K4me1 peaks. Segments overlapping promoters, H3K27me3 peaks, and/or blackout regions were excluded from the list of enhancers. Promoters were defined as regions 1500 bp upstream and 500 bp downstream from a transcription start site (TSS) of a “University of California, Santa Cruz (UCSC) Known” gene .
We analyzed the regulatory element networks of Hi-C interactions among promoters and enhancers. Each network may contain multiple promoters and enhancers; the size of a network is defined as the genomic distance between the two most distant elements along the genome. For a selected enhancer in the network, its rank is defined as the shortest distance to its closest promoter, which is the minimum number of consecutive connections separating this enhancer from that promoter. We named the rank of enhancers as step-one (E1), step-two (E2), step-three , and the rest (E3+). An EC is defined as an oriented path from a promoter that visits enhancer nodes in the order of their ranks and should contain at least two enhancers. These chain enhancers associated with a promoter are contiguously connected through enhancer-enhancer Hi-C interactions, and only the first enhancer is connected to that promoter directly. Some examples of ECs are shown in Additional file 1: Figure S1. If a promoter is connected to only one enhancer and no other enhancers are connected to this enhancer, we call this “a single enhancer” case which is excluded from the EC category. In the case that a chain of enhancers is associated with two or more promoters, the rank of a particular enhancer will be determined by its closest promoter. We used the shortest distance approach in the construction of ECs. Namely, each enhancer was connected to the promoter using the smallest number of intermediate contacting enhancers possible. For example, in the case of a shorter and a longer path (measured as the number of contacting enhancers), only the shorter path was selected (Additional file 1: Figure S1C). In the case of two equidistant paths, both were selected (Additional file 1: Figure S1D). However, to avoid bias and double-counting, enhancers from equidistant paths were used only once in our statistical tests.
Betweenness centrality and Jaccard index
To explore the essential enhancers in a regulatory network, we measured the centrality of each enhancer in the regulatory element network based on the number of the shortest paths between the rest of the enhancers and their target promoters which pass through this enhancer, defined as its BC score. For the cases when a promoter is connected to a single enhancer only, the BC score is set to 0. We would like to investigate if E1 is more frequently associated with underpinning the regulatory program of its target gene than the rest of the EC enhancers by calculating the tissue specificity similarity between the promoter and each of the enhancers along the chain. We measured their co-activities using H3K27ac peaks and computed the fraction of tissues in which a pair of elements is active. For example, for each pair of P-E1, we calculated the number of tissues (out of 20 total) in which both P and E1 overlap H3K27ac peaks. After that, we compared the similarity of these two lists using the Jaccard index to demonstrate the tissue specificity similarity of each P-E1 pair. Similar procedures were applied to P-E2 and P-E3 pairs.
Hi-C data processing
Raw observed Hi-C data in 5-kb resolution, boundaries of chromatin loops, and topologically associating domains (TADs) in GM12878, HMEC, HUVEC, and K562 cell lines were retrieved from Rao et al.’s work . We followed the procedure described in Huang et al.’s work to filter for significant Hi-C interactions . First, we used the iterative correction and eigenvector decomposition (ICE) algorithm implemented in the Hi-Corrector package  to remove biases [42, 43]. After that, statistically significant interactions were identified by Fit-Hi-C  with the parameters “-U=2000000, -L=5000” and using a flexible false discovery rate (FDR), which is described in the following paragraph. Only cis (intra-chromosome) interactions are considered in this study. The ChIA-PET data of GM12878 and K562 was downloaded from GSM1872886  and GSM970216, respectively.
In order to evaluate the influence of possible biases from Hi-C experiments on our results, we performed a comparison among EC enhancers to identify the potential biases caused by the fragment length and GC content in Hi-C experiments. Since the bias in distance between restriction sites and fragment length may lead to the underrepresentation of very short or very long-range interactions , we compared the distance between EC enhancers and the genomic regions with which they were in contact. Although E1s show a significantly larger range of contacts than other chain enhancers, the actual median values are very close among all En categories (Additional file 1: Figure S2A). The average length of interactions for E1s is only 1.1-fold higher than that for E2s. This observed difference in the range of interactions is not sufficient to explain the effects observed by this study (p value < 0.001, the Wilcoxon rank-sum test). One explanation for this difference might be the intrinsic ability of E1s to form distal interactions (Fig. 2c, d). In addition, the interactions for P-E1 and E1-E2 and E2-E3 are all in the middle distance, mostly shorter than 1 Mb, so the influence of fragment length should be limited for EC enhancers. Similarly, the median value of GC content is similar among all En categories in the four tissues (Additional file 1: Figure S2B). These results indicate that the Hi-C biases are not the primary factor in differentiating E1s from other chain enhancers.
To verify the possible influence of chromatin accessibility on the Hi-C interactions in our analysis, we compared the overlaps with open chromatin regions (DNase I ChIP-seq peaks) for E1 enhancers and the neutral DNA segments connecting to ECs (non-enhs). We also compared the number of elements connected through Hi-C, and the associated gene expression, as shown in Additional file 1: Figure S2. We observed a significantly higher chromatin accessibility in E1s than in non-enhs (Additional file 1: Figure S2C). However, the E1s and non-enhs have a very similar number of Hi-C-connected regulatory elements, suggesting the negligible influence of the accessibility bias in our analysis after normalization and filtering of the raw Hi-C data (Additional file 1: Figure S2D). To further demonstrate that any accessibility bias was removed from our study, we compared the associated gene expression among E1s, all non-enhs, and those non-enhs from open chromatin regions (Additional file 1: Figure S2E). We observed a marginal difference between non-enhs with different chromatin accessibility. All these results suggest that the potential bias for regions easily assayed by the Hi-C experiment was removed and did not influence our analysis.
We assumed that the promoter regions of low-expressed genes should have very few or no chromatin contacts with other parts of the genome, except for the cases of actively repressed genes regulated by connected silencers. In each cell line, the genes were defined as lowly expressed genes if their reads per kilobase million (RPKM) < 1.0 and as normal expressed genes if their RPKM ≥ 1.0 [46, 47]. The top 1000 lowly expressed genes with the highest density of H3K27me3 histone mark were assumed to be inactive genes with a small fraction of them being actively repressed. We adjusted the FDR in the Hi-C data processing, so that less than 5% of those 1000 lowly expressed genes had significant Hi-C interactions with their promoter regions. The resulting cell line-specific FDR was 2 × 10−11, 0.002, 0.002, and 0.0008 for GM12878, HMEC, HUVEC, and K562 cell lines, respectively. This set of FDRs not only maintains a similar distribution of length of ECs across 4 tissues (Additional file 1: Figure S3A) but also takes into account the balance between massive data in GM12878/K562 and insufficient data in HMEC/HUVEC (Additional file 1: Figure S3B and S3C), which we identified as the appropriate cutoff for identifying significant Hi-C interactions. The number of connected regulatory elements for each enhancer was normalized using a standard score (z-score). The top 5% enhancers with the largest z-scores across all networks in the corresponding cell line were selected as the group of most connected enhancers. The boundaries of chromatin loops and sub-TADs were obtained for the matching tissue in each case.
It is important to note that gene regulatory networks (GRNs) are complex and have multiple regulatory elements involved. In this analysis, we used a very stringent FDR cutoff to increase the accuracy of finding the true-positive ECs, which may have led to a reduced number of significant Hi-C interactions and associated genes as well as to the underestimation of the number of ECs. For example, out of the average 9789 genes expressed (RPKM ≥ 1.0) across 4 cell lines, only 1404 genes feature Hi-C interactions between the promoter and at least 1 enhancer.
Enrichment of TFBSs
For the enrichment of TFBSs in a particular set of enhancers, we took advantage of the available chromatin immunoprecipitation sequencing (ChIP-seq) TFBS data in GM12878 and K562. In each cell line, we used all the open chromatin regions marked by deoxyribonuclease I (DNase I) hypersensitivity sites (DHSs) from DNase-seq peak data as background, excluding those regions overlapping with promoters, enhancers, and blackout segments. The final control set was defined as 400-bp sequences at the centers of the remaining DNase-seq peaks. For a set of enhancers, the total number of overlaps between them and ChIP-seq peaks of a particular TFBS was summed and normalized by the total length of enhancers to get the density of the TFBS. Similarly, in the control set, the density of the same TFBS was calculated, and the ratio of these two densities was used to represent the enrichment of that TFBS in enhancers.
Density of expression quantitative trait loci variants
The Gene-Tissue Expression (GTEx) eQTLs v7 data were obtained from the GTEx Portal (www.gtexportal.org) for the variant density analysis. For tissue-specific variants, “whole blood” eQTLs were selected for GM12878 and K562 enhancer chains, “breast mammary” eQTLs for HMEC, and “artery aorta” eQTLs for HUVEC. The density of variants was calculated as the number of variants falling into the genomic regions occupied by enhancers from a particular class over the total number of enhancers in that class.
The authors are grateful to Dorothy L. Buchhagen for the critical comments and suggestions.
The review history is available as Additional file 2.
IO and RS conceived and designed the study. WS established the computational framework and analyzed the data. WS, RS, and IO wrote the manuscript. All authors read and approved the final manuscript.
This work has been supported by the Intramural Research Program of the National Institutes of Health, National Library of Medicine. RS was supported by the Israel Science Foundation (grant no. 715/18).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 14.Novo CL, Javierre BM, Cairns J, Segonds-Pichon A, Wingett SW, Freire-Pritchett P, Furlan-Magaril M, Schoenfelder S, Fraser P, Rugg-Gunn PJ. Long-range enhancer interactions are prevalent in mouse embryonic stem cells and are reorganized upon pluripotent state transition. Cell Rep. 2018;22:2615–27.CrossRefGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.