VULCAN integrates ChIP-seq with patient-derived co-expression networks to identify GRHL2 as a key co-regulator of ERa at enhancers in breast cancer
VirtUaL ChIP-seq Analysis through Networks (VULCAN) infers regulatory interactions of transcription factors by overlaying networks generated from publicly available tumor expression data onto ChIP-seq data. We apply our method to dissect the regulation of estrogen receptor-alpha activation in breast cancer to identify potential co-regulators of the estrogen receptor’s transcriptional response.
VULCAN analysis of estrogen receptor activation in breast cancer highlights the key components of the estrogen receptor complex alongside a novel interaction with GRHL2. We demonstrate that GRHL2 is recruited to a subset of estrogen receptor binding sites and regulates transcriptional output, as evidenced by changes in estrogen receptor-associated eRNA expression and stronger estrogen receptor binding at active enhancers after GRHL2 knockdown.
Our findings provide new insight into the role of GRHL2 in regulating eRNA transcription as part of estrogen receptor signaling. These results demonstrate VULCAN, available from Bioconductor, as a powerful predictive tool.
KeywordsBreast cancer Network analysis Dynamics ER Master regulator ChIP-seq VULCAN GRHL2 P300 H3K27ac
Breast cancer is the most common form of cancer in women in North America and Europe accounting for 31% of all new cancer cases. In the USA, it is estimated that 41,400 deaths will have occurred from the disease in 2018 . The majority of breast cancers, approximately 70%, are associated with deregulated signaling by the estrogen receptor-alpha (ER), which drives tumor growth. Therefore, in ER-positive (ER+) tumors, ER is the primary therapeutic target. During activation, ER recruits several cofactors to form an active complex on the chromatin. FOXA1 is of particular interest as the protein shares nearly 50% of its genomic binding sites with ER and has been shown to operate as a pioneer factor before ER activation [2, 3]. It is through FOXA1 and other cofactors (e.g., SRC-1) [4, 5] that ER is able to recruit RNA polymerase II at the gene promoter sites by way of adaptor proteins in order to initiate transcription . Combinatorial treatments targeting ER cofactors present a significant opportunity in breast cancer therapy for increasing patient survival. In particular, the pioneer factor FOXA1  has been identified as a novel therapeutic target for the treatment of breast cancer, while the EZH2-ERα-GREB1 transcriptional axis has been shown to play a key role in therapeutic resistance .
ChIP-seq enables the identification of potential site-specific interactions at common binding sites between transcription factors and their cofactors; however, to fully characterize all potential cofactors of a single project on this scale is laborious and expensive. To follow up all potential cofactors identified by a chromatin-wide proteomics method, e.g., RIME  or ChIP-MS , would take hundreds of individual ChIP-seq experiments. Studies like ENCODE  have gone a long way to provide resources to meet these challenges; however, the inherent scale of the problem means public studies can only offer data for a subset of TF in a limited number of models. A single lab to undertake this level of experimentation is unfeasible and, in cases where suitable antibodies for the ChIP do not exist, impossible.
To enable discoveries beyond collections like ENCODE, we are proposing a computational framework to integrate patient data in the prediction of functional protein-protein interactions. By applying machine learning methods, we are able to surpass the limitation of current predictive tools that exist to support the interpretation of data. Previous methods provide information in the context of predefined biological pathways and established gene sets [12, 13] or through motif analysis , while our method is built on data specific to the disease being studied. Further, standard gene set enrichment analysis has inherent limitations because it was not designed for reconstructing gene networks, whereas one of the advantages of VULCAN is that it down weights genes shared by multiple TFs.
Through the application of VULCAN to the activation of the ER in breast cancer, we were able to identify multiple previously characterized cofactors of the ER along with GRHL2 as a potential co-repressor of the ER. We then demonstrated experimentally that GRHL2 is able to modulate the expression of eRNA at ER bound enhancers, and the removal of the P300 inhibitory alpha helix results in suppression of the inhibitory effect on eRNA production.
VULCAN integrates ChIP-seq data (Fig. 1, step 1) with co-expression networks (Fig. 1, step 2) to predict cofactor activity (Fig. 1, step 3). The initial ChIP-seq data is converted into genomic regions, and if multiple conditions are supplied, the changes in the transcription factor affinity are calculated. In parallel, master regulator analysis of tumor transcriptional data is used to provide tissue-specific information on the regulation of genes by TFs within the tumor type. The integration of these two data types provides context-specific results and differentiates VULCAN from the existing methods which make use of predefined unweighted gene sets or motif analysis. VULCAN additionally makes use of the key functionality of the VIPER algorithm  that assigns edge-specific scores like mode of action and likelihood to the reconstructed network.
In the following, we first benchmark VULCAN’s performance in a comprehensive comparison to alternative approaches. We then apply it to our data on temporal ER binding, which identifies GRHL2 as a novel ER cofactor, and we explore its function.
Comparison of VULCAN to existing methods
Mutual information networks outperform partial correlation networks
We generated a mutual information network with ARCANe alongside several partial correlation networks at different thresholds all from the TCGA breast cancer data. To ensure the robustness of our method, we tested the overlap of every partial correlation network with the mutual information network using the Jaccard index (JI) criterion (Additional file 1: Figure S1). Finally, we showed how the Jaccard index between partial correlation networks and the ARACNe network is always significantly higher than expected by selecting random network edges (Additional file 1: Figure S2). For further analysis, we selected the mutual information network generated by ARACNe as this method outperformed the partial correlation networks at all thresholds.
GSEA is the optimum method for VULCAN’s target enrichment analysis
VULCAN applies gene set enrichment analysis  to identify enrichment of our mutual information network derived regulons in differential ChIP-seq data. To validate our method, we compared the results of VULCAN when applied to our ER binding data against three independent methods previously applied to benchmark VIPER . First, we implemented a Fisher p value integration step. This test lacks stringency and results in nearly all regulons as significantly enriched (Additional file 1: Figure S3). Second, we implemented a fraction of targets method, defining for every TF the fraction of their targets that are also differentially bound. This alternative to VULCAN ignores the MI strength of interaction and the individual strengths of differential bindings, reducing the resolving power of the algorithm (Additional file 1: Figure S4). Finally, we compared to Fisher’s exact method, which assesses the overlap between networks and significant differential binding. This method is too stringent (as observed in the original VIPER paper) ; and even without p value correction, there are no significant results, even at low stringency, demonstrating the low sensitivity of using Fisher’s exact method (Additional file 1: Figure S5). In summary, VULCAN GSEA implementation outperformed all three alternative methods we tested (t test based; fraction of targets method; and Fisher’s exact method) in our dataset and was therefore applied to all downstream analysis of ChIP-seq data.
VULCAN outperforms enrichment analysis tools (GREAT, ISMARA, and ChIP-Enrich)
To further validate our method, we compared the output of our GSEA analysis with different versions of promoter-enrichment approaches implemented by GREAT , ISMARA , and ChIP-Enrich . The VULCAN analysis shows a significant overlap in terms of detected pathways with the GREAT method (Additional file 1: Figure S6). ChIP-Enrich identifies enrichment of a number of TFs also predicted by VULCAN, but it fails to identify ESR1 as the top transcription factor affected by our experiment (Additional file 1: Figure S7). ISMARA succeeds at identifying ESR1 using a motif-based analysis but does not identify other candidate binding TFs (Additional file 1: Figure S8). In summary, VULCAN outperforms both ISMARA and ChIP-Enrich, and significantly overlaps with GREAT, but provides additional value through inference of TF factor activity.
Temporal analysis of ER DNA binding profiles after activation by E2
We performed a gene set enrichment analysis (GSEA)  and an associated rank enrichment analysis (aREA)  using the differential binding at gene regulatory regions with time 0 as reference. Individual differential binding signatures for GSEA were calculated using a negative binomial test implemented by DiffBind . The collective contribution of differentially bound sites highlights several ER-related pathways in both the GSEA and aREA analyses [19, 20, 21] (Additional file 1: Figure S14). The strongest upregulated GSEA pathway in both time points (Additional file 1: Tables S1 and S2) was derived from RNA-seq in an MCF7 study using estradiol treatment , confirming the reproducibility of our dataset.
VULCAN analysis of ER activation
VULCAN identifies coactivators and co-repressors of ER
We leveraged the information contained in mutual information networks to establish TF networks enriched in the differential binding patterns induced by estradiol. From our analysis of ER binding, we established four classes of modulation: early coactivators, early co-repressors, delayed coactivators, and transient coactivators (Fig. 3).
Using VULCAN, we defined TF network activity of occupied regulatory regions (Fig. 3a) according to the binding of ER within their promoter and enhancer regions (limited to 10 kb upstream of the transcription starting site to ensure gene specificity). We define early coactivators as those TFs whose network is upregulated at both 45 and 90 min (Fig. 3b); these genes include AR, SP1, and CITED1. TFs with opposite behavior (namely TFs whose negative/repressed targets in the ARACNe model are occupied by ER), or “early co-repressors,” include GLI4, MYCN, and GRHL2 (Fig. 3c). Some TFs appear to have their targets transiently bound at 45 min but then unoccupied at 90 min, and therefore, we dubbed them “transient coactivators” (Fig. 3d). We further defined TFs active at 90 min but not at 45 min as “delayed coactivators,” noting these cofactors could be the transient if the response is not completed by 90 min. While this category exists, and notably contains both ESR1 and the known ESR1 interactor GATA3, it is just below the significance threshold at 45 min (Fig. 3e).
We repeated our TF network activity analysis of ER activation (Fig. 3a–e) on an independent dataset from TCGA and found similar results to those established from the METABRIC-derived network (Additional file 1: Figures S14, S15, S16, S17, S18, and S19).
VULCAN results are specific to the tissue used for network modeling
Regulatory networks can be tissue specific due to a variety of biological reasons, such as chromatin status, cofactor availability, and lineage-dependent transcriptional rewiring . We tested whether our VULCAN results can be affected by the choice of the ARACNe-inferred regulatory network. In order to do so, we required a gene expression dataset large enough for robust mutual information inference (> 100 samples), based on the same library preparation and sequencing protocols as the breast cancer TCGA dataset used in our analysis (to remove the possibility of technical differences), but ultimately derived from a tissue as distant as possible from breast cancer (BRCA) on which network models on this study are derived. For this purpose, we computed ARACNe regulatory models on the TCGA dataset for acute myeloid leukemia (AML), a liquid tumor histologically very different from BRCA. This AML-derived network shows globally weaker VULCAN enrichment scores than the BRCA-derived network and a weak positive correlation with the results obtained through breast cancer regulatory models (Additional file 1: Figure S20). The positive correlation suggests that regulatory networks inferred in breast cancer are tissue specific and can only in part be recapitulated by a leukemia-inferred network.
VULCAN is able to predict protein-protein interactions in both patient-derived xenografts (PDX) and prostate cancer
VULCAN outperforms classical motif analysis
Finally, we compared VULCAN to a classical motif analysis by exploiting the MsigDB C3 collection v6.1  of gene sets, which contain canonical TF-specific binding motifs in their promoters. Our analysis shows the correlation of VULCAN results for two transcription factors (e.g., between GATA3 and ESR1, Additional file 1: Figure S23) can be relatively high but not significantly overlapping in terms of target genes containing the same canonical motifs (Additional file 1: Figure S24). We could prove that this non-relationship is general as it extends to the majority of TF-TF pairs that were present in the MsigDB database (Additional file 1: Figure S25).
Optimization of VULCAN parameters
The network generation algorithm uses established methods to optimize the parameters for the RNA-seq input (e.g., ARACNE-AP calculates the edge significance based on data-specific permutation test). By default, VULCAN can calculate key settings from the provided ChIP-seq data (e.g., DNA fragment length). Additionally, parameter choice is tunable at the wish of the user. The distance from promoter transcription starting site (TSS) can be tuned to the specific organism investigated by the ChIP-seq experiment. In this manuscript, we used 1000 nt for Homo sapiens, but it can be lowered to 100 nt for bacterial chromosomes, to assign peaks that act as representative for the gene.
GRHL2 is a novel ER cofactor
In our analysis of ER dynamics, the GRHL2 transcription factor was consistently identified as a key player, using both the METABRIC and TGCA networks. GRHL2 is a transcription factor that is important for maintaining epithelial lineage specificity in multiple tissues [28, 29]. It has previously been predicted to exist in ER-associated enhancer protein complexes , but its function in the ER signaling axis is unknown. Therefore, we set out to experimentally validate GRHL2 as an ER cofactor.
There is only a weak, positive correlation between ESR1 and GRHL2 expression in the TCGA and METABRIC breast cancer datasets (Additional file 1: Figure S26). Furthermore, GRHL2 does not change significantly in different PAM50 subtypes, although it is overexpressed in malignant tissue. The low correlation between GRHL2 expression and subtype implies that the protein is controlled by mechanisms such as phosphorylation , subcellular localization, or on-chromatin protein-protein interactions.
qPLEX-RIME detects a significant increase in the ER-GRHL2 interaction on activation
We undertook a complementary, unbiased, experimental approach combining RIME  with TMT , called qPLEX-RIME , to identify interactors of ER within the ER-chromatin complex. We generated ER qPLEX-RIME data from MCF7 cells treated with estradiol at both 45 and 90 min and compared this to the VULCAN dataset (Additional file 1: Figure S27). We found known ESR1 interactors with both methods, namely HDAC1, NCOA3, GATA3, and RARA. These interactors have positive enrichment according to VULCAN , implying the TF’s regulon is over-represented within the differentially bound genes. Importantly, qPLEX-RIME identified a significant increase in the protein-protein interaction between ER and GRHL2 in estrogenic conditions. As GRHL2 has a negative enrichment score in VULCAN, this implies either the protein is recruited by ER to sites that are significantly depleted for GRHL2’s regulon or that GRHL2 is established as having a negative correlation to the genes regulated at these sites, i.e., the protein is a co-repressor of the ER.
To establish if the recruitment of GRHL2 was primarily related to a transcriptional function or the previously described interaction with MLL3, we overlapped our GRHL2 data with that of published H3K4me1/3  and P300  cistromes. While H3K4me occupancy was consistent between conditions, we found P300 binding to be enriched at the E2-responsive GRHL2 sites.
A more detailed analysis of the GRHL2 overlap with P300 sites showed the greatest co-occupancy of GRHL2/P300 sites was when both TFs were stimulated by E2 (Fig. 6d). Moreover, the overlap of GRHL2 peaks with ER ChIA-PET data [ENCSR000BZZ] showed that the GRHL2-responsive sites were enriched at enhancers over promoters (Fig. 6e). These findings suggested that the GRHL2-ER complex is involved in transcription at ER enhancer sites.
Validation of the ER-GRHL2 interaction by qPLEX-RIME and co-IP
qPLEX-RIME  analysis of GRHL2 in both the estrogen-free and estrogenic conditions showed high levels of transcription-related protein interactors including HDAC1 (p value = 6.4 × 10−9), TIF1A (p value = 6.4 × 10−9), PRMT (p value = 6.4 × 10−9), and GTF3C2 (p value = 4.6 × 10−9). p values given for estrogen-free and estrogenic conditions were comparable. The only protein differentially bound to GRHL2 in estrogen-free vs estrogenic conditions was the ER.
We further validated this interaction by co-IP. Our analysis robustly found that GRHL2 and ER interact in both MCF7 and T47D cells (Fig. 7). We further validated the antibody by siRNA knockdown and saw the disappearance of the GRHL2 band at 24 h.
GRHL2 constrains ER binding and activity
We investigated the transcription of enhancer RNAs at these sites using publicly available GRO-seq data  [GSE43836] (Fig. 6f). At E2-responsive sites, eRNA transcription was strongly increased by E2 stimulation; by contrast, eRNA transcription was largely independent of E2 stimulation when the entire GRHL2 cistrome was considered. Analysis of a second GRO-seq dataset, GSE45822, corroborated these results (Additional file 1: Figure S28).
A conserved alpha helix between residues 425 and 437 of GRHL2 has previously been shown to inhibit P300 . We therefore overexpressed GRHL2 Δ425–437, a previously demonstrated non-p300-inhibitory mutant [43, 44], in three ER-positive breast cancer cell lines (MCF7, T47D, and ZR75) and compared levels of eRNA to those recorded for both an empty vector control and for the overexpression of the wild-type protein (Fig. 8b). The results of the wild-type study were concordant to those of our previous analysis (Fig. 8a), suggesting in general that overexpression GRHL2 leads to the inhibition of eRNA production at certain ER sites. Importantly, in all cases, the removal of aa 425–437 from GRHL2 led to a reduction in the inhibitory effect caused by overexpression of the wild-type protein and was found as significant in five out of nine cases test (p < 0.05, t test, single-tail, paired).
VirtUaL ChIP-seq Analysis through Networks
VULCAN is valuable for the discovery of transcription factors acting as co-regulators within chromatin-bound complexes that would otherwise remain hidden. The challenge of highlighting the cofactors from a ChIP-seq experiment lays in the infeasibility of reliable proteomic characterization of DNA-bound complexes at specific regions. On the other hand, while RNA-seq is arguably the most efficient technique to obtain genome-wide quantitative measurements, any transcriptomic approach cannot provide a full picture of cellular responses for stimuli that are provided on a shorter timescale than mRNA synthesis speed, such as the estradiol administration described in our study. VULCAN, by combining RNA-seq-derived networks and ChIP-seq cistrome data, aims at overcoming the limitations of both. Most notably, our method can work in scenarios where candidate cofactors do not have a well-characterized binding site or do not even bind DNA directly.
Through comparative analysis, we have robustly shown that VULCAN is able to outperform other readily available methods for the prediction of on-chromatin interactions of transcription factors. VULCAN achieves this through the integration of ChIP-seq and tumor transcriptional data. The inherent limitation of our method therefore is that tumor transcriptional data must be available in sufficient quantity to build the underlying network for the analysis, whereas tools based on predefined networks have no such limitation. In the majority of cases, this is not a challenge as projects like the TCGA provide transcriptome-wide data for a range of cancers. It is therefore only in the cases of rarer disease types (such as neuroendocrine tumor) and orphan tissues that this limitation will be problematic as these are poorly represented in public data. Even so, in these cases where specific networks cannot be generated, pan-tissue regulatory networks are currently being developed to overcome this limitation and these could be adapted for VULCAN .
By developing VULCAN, we have been able to rediscover the known cofactors of the estradiol-responsive ER complex and predict and experimentally validate a novel protein-protein interaction.
GRHL2 has a key role in regulating EMT
In the 4T1 tumor model, GRHL2 was found to be significantly downregulated in cells that had undergone EMT . The same study showed that knockdown of GRHL2 in MCF10A—an ER-negative cell line—led to the loss of epithelial morphology. Overall, this suggested that the GRHL2 transcription factor plays an essential role in maintaining the epithelial phenotype of breast cells. Similar results were observed with the MDA-MB-231 model, where expression of GRHL2 resulted in the reversal of EMT . This result has been recapitulated in hepatocytes, where GRHL2 was found to suppress EMT by inhibiting P300 . The ability to suppress EMT has also been noted in prostate cancer, another cancer driven by a steroid hormone receptor (AR), and the genes regulated by GRHL2 are linked to disease progression .
GRHL2, a novel co-repressor of ER eRNA production
These earlier data combined with the link between GRHL2 expression and patient survival indicate a significant role for GRHL2 in the progression of breast cancer. However, its role in the ER signaling axis has, until now, been unknown. Here, we show that GRHL2 performs its activity at a subset of ER enhancers. Overexpression of GRHL2 resulted in a significant decrease in eRNA production at the TFF1 and XBP1 enhancers, and in agreement with previous studies that correlate eRNA transcription with gene expression [47, 48, 49], we found the measured eRNA decrease was concurrent with a significant downregulation in the expression of the corresponding gene.
These results are consistent with previous findings that GRHL2 inhibits P300  and, while the ER complex results in the activation of eRNA transcription at these sites, that GRHL2 plays a role in fine-tuning or modulating this process.
GRHL2 role in the ER signaling axis is independent to its role in tethering MLL3
In breast cancer, GRHL2 has previously been shown to directly interact with FOXA1, which may contribute to the tethering of the histone methyltransferase MLL3 and, consequently, epigenetic marks at GRHL2/FOXA1 binding sites . Our analysis, however, showed no particular enrichment for H3K4me1/3 marks at E2-responsive GRHL2 sites compared to other GRHL2 binding sites, and our proteomic analysis of interactors showed a strong association with proteins related to transcription. We proposed that these ER-responsive sites are related to the role of GRHL2 in a transcriptional process independent of its interaction with MLL3. This was supported by evidence of a significant overlap with binding of the coactivator P300, transcriptional proteins detected by qPLEX-RIME analysis of GR, and a pronounced increase in eRNA transcription at E2-responsive GRHL2 sites.
ER is bound more strongly at active enhancers (H3K27ac) that are altered by siGRHL2
Knockdown of GRHL2 led to a genome-wide remodeling of H3K27ac marks, found at active enhancers, confirming a role of GRHL2 in partially regulating these sites. Detailed inspection of the data showed a significant increase of these marks around the XBP1 and GREB1 genes, supporting our hypothesis that GRHL2 has a partial inhibitory role within the ER regulon. The result was further supported by finding enrichment of ER binding events at H3K27ac marks altered by GRHL2 knockdown (Fig. 9c, right panel). The more complex effect on H3K12ac, when compared to the effects on eRNA production, is likely a result of the diversity of roles that GRHL2 holds within the cell, leading to a host of downstream effects in the regulation of chromatin recruitment of key factors such as MLL3, ER, and FOXA1.
Deletion of the P300 inhibitory α-helix from GRHL2 reduces the protein’s ability to repress the production of eRNA at ER bound enhancer sites
Finally, to clarify if inhibition of P300 was occurring, we generated and overexpressed GRHL2 lacking the inhibitory alpha helix between amino acids 425 and 437. In all cases, GRHL2 Δ425–437 had reduced an inhibitory effect compared to overexpression of the wild type, confirming that GRHL2 primarily plays a repressive role at these sites.
VULCAN is built on state-of-the-art network analysis tools previously applied to RNA-seq data. By adapting network-based strategies to ChIP-seq data, we have been able to reveal novel information regarding the regulation of breast cancer in a model system.
We have demonstrated that the VULCAN algorithm can be applied generally to ChIP-seq for the identification of new key regulator interactions. Our method provides a novel approach to investigate chromatin occupancy of cofactors that are too transient or for which no reliable antibody is available for direct ChIP-seq analysis.
Further, because of our use of clinical data, VULCAN results are both more likely to be relevant and are specific to the disease type studied, as demonstrated in the loss of signal when using a control co-expression network generated from an alternative disease type.
Given the central role of the ER in breast cancer development and GRHL2’s own ability to regulate EMT, the discovery that ER recruits GRHL2 leading to the altered eRNA production is an important step in enhancing our understanding of breast cancer and tumorigenesis.
An implementation of VULCAN in R is available on Bioconductor.org [https://bioconductor.org/packages/release/bioc/html/vulcan.html], and the scripts to replicate our analysis are available as Rmarkdown files. Unless otherwise specified, all p values were Bonferroni corrected.
MCF7 cells were obtained from the CRUK Cambridge Institute collection, authenticated by STR genotyping and confirmed free of mycoplasma. All cells were maintained at 37 °C, 5% CO2. For each individual ChIP pulldown, we cultured 8 × 107 MCF7 cells (ATCC) across four 15-cm-diameter plates in DMEM with 10% FBS, glutamine and penicillin/streptomycin (Glibco). Five days before the experiment, the cells were washed with phosphate-buffered saline (PBS) and the media were replaced with clear DMEM supplemented with charcoal-treated serum. The media was refreshed every 24 h, which halted the growth of the cells and ensured that majority ER within the cell was not active. On day 5, the cells were treated with estradiol (100 nM). At the appropriate time point, the cells were washed with ice-cold PBS twice and then fixed by incubating with 10 mL per plate of 1% formaldehyde in unsupplemented clear media for 10 min. The reaction was stopped by the addition of 1.5 mL of 2.5 M glycine, and the plates were washed twice with ice-cold PBS. Each plate was then scraped in 1 mL of PBS with protease inhibitors (PI) into a 1.5-mL microcentrifuge tube. The cells were centrifuged at 8000 rpm for 3 min at 4 °C and the supernatant removed. The process was repeated for a second wash in 1 mL PBS+PI and the PBS removed before storing at − 80 °C.
Frozen samples were processed using established ChIP protocols  to obtain DNA fragments of ~ 300 bp in length. The libraries were prepared from the purified DNA using a ThruPLEX DNA-seq kit (Rubicon Genomics) and sequenced on the Illumina HiSeq Platform. Sequencing data is available from Gene Expression Omnibus, accession GSE109820 and GSE123475.
Differential binding analysis
Sequencing data was aligned using BWA  to the human genome (hg19). Reads from within the DAC Blacklisted Regions was removed before peak calling with MACS 2.1  on default parameters. The aligned reads and associated peak files were then analyzed using DiffBind  to identify significant changes in ER binding.
Gene set enrichment analysis
Gene set enrichment analysis (GSEA) was performed as described by Subramanian et al.  using the curated pathway collection (C2) from MSIGDB v 5.0 with 1000 set permutations for each pathway investigated, followed by Benjamini-Hochberg p value correction.
We reconstructed a regulatory gene network using ARACNe-AP as described by Alvarez . RNA-seq breast cancer data was downloaded from TCGA in January 2015 and VST-Normalized as described by Anders and Huber . The ARACNe transcriptional regulation network was imported into R using the VIPER BioConductor package, and it was interrogated using the differential binding profiles from our ChIP-seq experiment as signatures, 45 min vs control and 90 min vs control. The peak-to-promoter assignment was performed using a 10-kb window with respect to the transcription starting site (TSS) of every gene on the hg19 human genome. The algorithm msVIPER (multi-sample Virtual Inference of Protein activity by Enriched Regulon analysis) was then applied, leveraging the full set of 8 replicates per group, with 1000 signature permutations and default parameters.
TF binding overlap
Publicly available data was downloaded as described in the source publication [3, 30, 34, 35], and overlap was calculated with bedtools (v2.25.0). Presented data was normalized as a percentage of GRHL2 sites.
Nagarajan et al. 
Nagarajan et al. 
Li et al. 
Li et al. 
Peña et al. 
Peña et al. 
ERα (F10) antibody, Santa Cruz (sc-8002), was cleaned using Amicon 10K Buffer Exchange Column (EMD, Cat # UFC501096) to remove the sodium azide. 2.5 μg ERα (F10) antibody rotated overnight at 4 °C with 100 μL Dynabeads Protein A, Invitrogen (10001D).
Nuclear lysate was harvested via cell lysis (20 mM Tris-HCl, 20 mM NaCl, 0.2 mM EDTA) followed by nuclear lysis (20 mM Tris-HCl, 20 mM NaCl, 0.2 mM EDTA 1% IGEPAL). The uclear lysate was then incubated overnight at 4 °C with the Dynabeads Protein A. Elution via 10 min incubation at 70 °C with 1× NuPAGE LDS Sample Buffer, Invitrogen (NP0007), and 1× NuPAGE Sample Reducing Agent, Invitrogen (NP0004), and subjected to western blotting.
Knockdown of GRHL2
Knockdown of GRHL2 was undertaken using ON-TARGETplus SMARTpool Human GRHL2, Dharmacon (#L-014515-02-0050) and Lipofectamine® RNAiMAX Reagent Protocol (Thermo) according to the manufacturer’s protocol. Control samples were prepared following the same method using ON-TARGETplus Control pool Non-targeting pool, Dharmacon (#D-001810-10-50) in place of siGRHL2.
We would like to acknowledge the support of the University of Cambridge, Cancer Research UK, and Hutchison Whampoa Limited. We are grateful for the support of Eva Papachristou in undertaking the qPLEX-RIME analysis of our samples in advance of her initial publication describing the method. We would like to acknowledge the contribution from the CRUK Genomics, Proteomics, and Bioinformatics core facilities in supporting this work.
This work was funded by CRUK core grant [grant numbers C14303/A17197, A19274] and Breast Cancer Now Award [grant number 2012NovPR042] to FM and supported by The Alan Turing Institute under the EPSRC grant EP/N510129/129/1 as a Turing Fellowship to ANH.
Availability of data and materials
All sequence data are publicly available at the GEO, accession numbers GSE109820  and GSE123475 . Data from previously published datasets were also acquired from GEO with accession numbers GSE110824 , GSE39880 , GSE45822 , and GSE43836 .
Code for data analysis is provided as a Rmarkdown document, and supporting data is available from https://github.com/andrewholding/VULCANSupplementary . For convenience, we have provided VULCAN as a BioConductor package https://bioconductor.org/packages/release/bioc/html/vulcan.html  along with a supporting data package https://bioconductor.org/packages/release/data/experiment/html/vulcandata.html .
ANH, FMG, LAS, and FM conceived and designed the experiments. ANH undertook the ER experimentation. ANH and FMG undertook the analysis of ER data. ANH and FMG designed the VULCAN. FMG implemented the VULCAN analysis. AEC, ANH, and AD undertook the GRHL2 experimentation. SN undertook the eRNA quantitation. ANH, AEC, and LAS undertook the analysis of GRHL2 data. ANH, FMG, AEC, LAS, and FM wrote the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 17.Holding AN, Cullen AE, Markowetz F. Genome-wide estrogen receptor-α activation is sustained, not cyclical. Elife. 2018;7:e40854.Google Scholar
- 31.Helzer KT, et al. The phosphorylated estrogen receptor cistrome identifies a subset of active enhancers enriched for direct ER-DNA binding and the transcription factor GRHL2. Mol Cell Biol. 2018;39(3):e00417-18.Google Scholar
- 39.Ribieras S, Tomasetto C, Rio MC. The pS2/TFF1 trefoil factor, from basic research to clinical applications. Biochim et Biophys Acta. 1998;1378(1):F61–77.Google Scholar
- 52.Zhang Y, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):137.Google Scholar
- 57.Anders S, Huber W, S A, W H. Differential expression analysis for sequence count data. Genome Biol. 2010;11:106.Google Scholar
- 61.Holding AN, Markowetz F. Early dynamics of ERa and GRHL2 binding on stimulation with estradiol. Gene Expression Omnibus. 2018; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
- 62.Holding AN, Markowetz F. Knockdown of GRHL2 affects active enhancers (H3K27ac sites) enriched for ER binding. Gene Expression Omnibus. 2018; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
- 63.Bruna A, Markowetz F, Caldas C, Holding AN. Knockdown of GRHL2 affects active enhancers (H3K27ac sites) enriched for ER binding. Gene Expression Omnibus. 2018; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
- 64.Sahu B, Laakso M, Pihlajamaa P, Ovaska K, Sinielnikov I, Hautaniemi S, Jänne OA. FoxA1 specifies unique androgen and glucocorticoid receptor binding events in prostate cancer cells. Gene Expression Omnibus. 2013; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
- 65.Li W, Ma Q, Rosenfeld MG. Functional importance of eRNAs for estrogen-dependent gene transcriptional activation. Gene Expression Omnibus. 2013; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
- 66.Hah N, Murakami S, Nagari A, Danko CG, Kraus WL. Enhancer transcripts mark active estrogen receptor binding sites using GRO-seq. Gene Expression Omnibus. 2013; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
- 67.Holding AN. VULCAN supplementary, (2013), GitHub repository. 2018. https://github.com/andrewholding/VULCANSupplementary. https://doi.org/10.5281/zenodo.2633145
- 68.Giorgi FM, Holding AN, Markowetz FM. VirtUaL ChIP-Seq data Analysis using Networks. Bioconductor. 2018. https://doi.org/10.18129/B9.bioc.vulcan.
- 69.Giorgi FM, Holding AN, Markowetz FM. VirtUaL ChIP-Seq data Analysis using Networks dataset. Bioconductor. 2018. https://doi.org/10.18129/B9.bioc.vulcandata.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.