Integrative analysis of vascular endothelial cell genomic features identifies AIDA as a coronary artery disease candidate gene
Genome-wide association studies (GWAS) have identified hundreds of loci associated with coronary artery disease (CAD) and blood pressure (BP) or hypertension. Many of these loci are not linked to traditional risk factors, nor do they include obvious candidate genes, complicating their functional characterization. We hypothesize that many GWAS loci associated with vascular diseases modulate endothelial functions. Endothelial cells play critical roles in regulating vascular homeostasis, including selective barrier formation, inflammation, hemostasis, and vascular tone, and endothelial dysfunction is a hallmark of atherosclerosis and hypertension. To test this hypothesis, we generate an integrated map of gene expression, open chromatin regions, and 3D interactions in resting and TNFα-treated human endothelial cells.
We show that genetic variants associated with CAD and BP are enriched in open chromatin regions identified in endothelial cells. We identify physical loops by Hi-C and link open chromatin peaks that include CAD or BP SNPs with the promoters of genes expressed in endothelial cells. This analysis highlights 991 combinations of open chromatin regions and gene promoters that map to 38 CAD and 92 BP GWAS loci. We validate one CAD locus by engineering a deletion of the TNFα-sensitive regulatory element using CRISPR/Cas9 and measuring the effect on the expression of the novel CAD candidate gene AIDA.
Our data support an important role played by genetic variants acting in the vascular endothelium to modulate inter-individual risk in CAD and hypertension.
Keywords: Vascular endothelium; Endothelial dysfunction; Coronary artery disease; Blood pressure; Hypertension; Genome-wide association study; Hi-C; AIDA; CRISPR/Cas9
ATACseq: Assay for transposase-accessible chromatin with high-throughput sequencing
BMI: Body mass index
CAD: Coronary artery disease
ChIPseq: Chromatin immunoprecipitation with high-throughput sequencing
Differentially opened and/or closed
eQTL: Expression quantitative trait locus
FDR: False discovery rate
GWAS: Genome-wide association study
H3K27ac: Acetylation of lysine-27 on histone 3
HCAEC: Human coronary artery endothelial cells
Hi-C: Chromatin conformation capture with high-throughput sequencing
HUVEC: Human umbilical vein endothelial cells
RNAseq: Ribonucleic acid sequencing
SNP: Single nucleotide polymorphism
TAD: Topologically associated domain
teloHAEC: Immortalized human aortic endothelial cells
TNFα: Tumor necrosis factor-α
TSS: Transcription start site
Genetic discoveries in humans have the potential to unravel novel pathophysiological mechanisms and to pinpoint promising drug targets. However, to meet our expectations, these discoveries ought to be supported by mechanistic studies to decipher how genetic variation modulates disease risk. For genome-wide association study (GWAS) discoveries, the design of such functional experiments is particularly challenging as the vast majority of the associated variants are non-coding. Furthermore, we often do not know in which organ(s) or cell type(s) the variants act. Methods have been developed to quantify the enrichment of GWAS variants within regulatory elements identified by transcriptomic or epigenomic profiling of human samples [2, 3, 4]. Although powerful, such methods remain probabilistic and further experiments are required to test their predictions. As a consequence, only a few association signals have been resolved at the molecular level [5, 6, 7].
GWAS have identified hundreds of variants associated with coronary artery disease (CAD) [8, 9, 10] and blood pressure (BP) or hypertension [11, 12]. Many of these association signals implicate excellent candidate genes and independently confirm some of the biology previously known to influence these diseases, such as the role that blood lipid levels play in CAD risk or the importance of smooth muscle contraction in controlling BP. But for many loci, we do not know how they might contribute to the development of these diseases, either because there are no obvious candidate genes nearby or because the variants are not associated with known risk factors. For instance, for CAD, it is estimated that nearly half of the ~ 140 loci identified by GWAS do not associate with the traditional risk factors (e.g., blood lipids, type 2 diabetes, blood pressure).
Annotation of GWAS discoveries for CAD and BP has revealed an enrichment of associated variants near genes implicated in endothelial functions [9, 12]. Vascular endothelial cells form the inner layer of blood vessels and play a critical role in the etiology of CAD and hypertension. Indeed, healthy endothelial cells form a selective barrier between the blood and the intima for many macromolecules, respond to hemodynamic changes, control the vascular tone, and regulate platelet functions, inflammatory responses, and smooth muscle cell growth and migration. Despite their pathophysiological importance and the noted overlap with GWAS findings, endothelial cells have not been studied extensively to provide further insights into genotype-phenotype associations for CAD and BP/hypertension. Here, we profiled the transcriptome, epigenome, and 3D chromosome conformation of vascular endothelial cells and integrated these results with CAD- and BP-associated genetic variants. Because the effect of genetic variation can be specific to certain pathological states, we characterized not only resting endothelial cells, but also cells activated with the inflammatory cytokine tumor necrosis factor-α (TNFα). Finally, we used our datasets to generate mechanistic hypotheses and tested one such prediction at a CAD locus using the CRISPR/Cas9 genome editing system.
Transcriptomic and epigenomic changes in endothelial cells upon activation
To correlate changes in gene expression with chromatin activity, we also profiled open chromatin regions by Assay for Transposase-Accessible Chromatin using sequencing (ATACseq) in teloHAEC treated or not with TNFα for 4 or 24 h. By combining data from these different time points, we identified 95,491 ATACseq peaks, including 3138 peaks (3.3%) that are differentially opened or closed (FDR < 0.1% and |LFC| > 0.3) upon TNFα stimulation (Fig. 1c for the comparison of NT vs. 4 h TNFα treatment, Additional file 1 for all other comparisons, and Additional file 4 for the complete list of differentially opened or closed ATACseq peaks). Although results in Fig. 1c seem to indicate that most ATACseq peaks open upon TNFα treatment, a density analysis of these data points shows that most ATACseq peak LFC values are centered at 0 (Additional file 5). As for the transcriptional response, the magnitude of open chromatin regions defined by ATACseq was highly concordant between teloHAEC and HCAEC (Fig. 1d). We employed an in silico footprinting method to determine which transcription factor binding motifs are over-represented within differentially opened teloHAEC ATACseq peaks following TNFα treatment (Additional file 6). Many of these transcription factors are involved in inflammatory responses (e.g., JUN, FOS, NFKB1/2) (Additional file 7). To further characterize our ATACseq open chromatin dataset, we generated histone H3 lysine 27 acetylation (H3K27ac) data in NT and TNFα-treated teloHAEC using chromatin immunoprecipitation followed by sequencing (ChIPseq). H3K27ac marks highlight regions of active transcription and are found at enhancers and promoters. Within each condition (NT or with TNFα), we found that 70–74% of the ATACseq peaks intersected with H3K27ac peaks.
3D chromosomal architecture in endothelial cells
One outstanding challenge in gaining biological insights from GWAS discoveries is to connect variants located in non-coding regulatory elements with their target genes. When cells or tissues from many human donors are profiled, it is possible to use the covariance between open chromatin regions and expression levels of nearby genes to infer that connection. As an alternative solution to link genes and regulatory elements in the context of endothelial dysfunction, we generated genomic contact maps by Hi-C using untreated and TNFα-stimulated (4 h) teloHAEC. For each condition, the contact matrices were highly concordant across biological replicates (Pearson’s correlation r > 0.95 for the contact matrices at 10-kb resolution across all replicates), allowing us to combine datasets to increase the signal-to-noise ratios of our analyses.
Given the central role that TADs play in the regulation of gene expression, we next asked where ENCODE enhancers predicted by histone marks are located within TADs. In contrast to TSSs, we found that enhancers defined in HUVEC by ENCODE were more uniformly distributed, with a slight enrichment in the middle of teloHAEC TADs as opposed to the boundaries (Fig. 4a and Additional file 10). Finally, we mapped CAD- and BP-associated SNPs into TADs and compared their physical distance from the closest TAD boundary with the distance of non-associated matched SNPs. Because of the relatively small number of CAD and BP sentinel SNPs (175 and 357 variants, respectively), the distributions of their position relative to the TAD boundaries were uneven (Fig. 4b, c and Additional file 10). For both CAD and BP, associated SNPs tended to be closer to the nearest TAD boundary than matched SNPs (median distance 75 kb for associated SNPs vs. 103 kb for matched SNPs, empirical P values ≤ 0.04), although a larger number of sentinel variants would be needed to provide a definitive answer to this question.
Linking GWAS SNPs and regulatory elements with genes
We used the Hi-C contact matrices to call loops between regulatory elements that contain CAD- or BP-associated variants and the promoters of genes expressed in teloHAEC. To further refine this list, we applied several criteria: we considered 3D loops supported by ≥ 20 Hi-C reads, we excluded genes that are not expressed or expressed at low levels (bottom 10th percentile) in teloHAEC, and we prioritized open chromatin regions that contain CAD or BP SNPs that are expression quantitative trait loci (eQTL) for the linked genes in the GTEx dataset (P value < 0.001). After filtering, this analysis identified 991 combinations of open chromatin regions and genes linked by physical 3D interactions and eQTL results (Additional files 11, 12, and 13). These combinations map to 38 CAD and 92 BP GWAS loci. The average physical distance between these regulatory elements and gene promoters is 154 ± 158 kb (Additional file 11).
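As an illustration only, the three filtering criteria described above can be sketched in Python. The record fields, function names, and the simple percentile rule are hypothetical and are not taken from the pipeline documented in Additional file 11; the thresholds are those stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One open chromatin region / gene pair (hypothetical record layout)."""
    peak_id: str       # ATACseq peak carrying a CAD or BP SNP
    gene: str          # gene whose promoter loops to the peak
    hic_reads: int     # Hi-C reads supporting the loop
    expression: float  # expression of the gene in teloHAEC
    eqtl_p: float      # GTEx eQTL P value linking a SNP in the peak to the gene


def percentile_cutoff(values, fraction):
    """Return the value marking the lower `fraction` of `values` (simple rank rule)."""
    ordered = sorted(values)
    index = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]


def filter_candidates(candidates, all_expression,
                      min_reads=20, low_fraction=0.10, max_eqtl_p=1e-3):
    """Keep pairs with a well-supported loop, an expressed gene, and eQTL support."""
    cutoff = percentile_cutoff(all_expression, low_fraction)
    return [
        c for c in candidates
        if c.hic_reads >= min_reads                       # loop supported by >= 20 reads
        and c.expression > 0 and c.expression >= cutoff   # not in the bottom 10%
        and c.eqtl_p < max_eqtl_p                         # eQTL P value < 0.001
    ]
```

Each criterion is an independent boolean test, so the order in which they are applied does not change the resulting list.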
GWAS have identified hundreds of variants robustly associated with CAD and BP/hypertension. Despite recent efforts, the causal variants, genes, and tissues/cell types remain largely unknown at these loci. In this study, we tested the hypothesis that some of these genetic associations are mediated through the activity of DNA sequence variants that control gene expression upon vascular endothelial cell activation. We profiled the transcriptome (RNAseq) and open-chromatin genome (ATACseq) of resting and TNFα-activated immortalized human aortic endothelial cells (teloHAEC). We focused on these immortalized cells in order to develop a system amenable to efficient genome editing experiments, an essential component of any GWAS follow-up program. We confirmed the RNAseq and ATACseq results from teloHAEC in primary human coronary artery endothelial cells. Furthermore, we generated and characterized genome-wide chromosome conformation Hi-C contact matrices from NT and TNFα-treated teloHAEC cells to physically link regulatory elements and expressed genes. By integrating our results with publicly available epigenomic datasets from ENCODE, eQTL results from GTEx, and GWAS discoveries for CAD and BP, we created a dynamic regulatory map of vascular endothelial cells. Through this map, we identified CAD and BP variants that overlap with open chromatin regions which themselves physically interact with often distant gene promoters in a specific cellular inflammation/non-inflammation context (Additional file 11).
To support our results, we tested one prediction by deleting a TNFα-induced ATACseq open chromatin region in teloHAEC using CRISPR/Cas9. In heterozygous clones that carry this ~ 1 kb deletion, the expression of AIDA induced by TNFα treatment was strongly hindered (Fig. 5b, c). This is a promising result given that AIDA is differentially expressed in teloHAEC following TNFα treatment (NT vs. 4 h, LFC = 0.49, FDR = 5.2 × 10^−19; Additional file 2) and the AIDA promoter interacts with the ATACseq peak as determined by Hi-C (Additional file 11). This locus, defined by the sentinel GWAS variant rs67180937, is associated with CAD and includes 33 other variants in strong LD (r^2 > 0.8 in European populations from the 1000 Genomes Project). Our deletion, however, only encompasses one of these 34 SNPs, rs17163363, which is an eQTL for AIDA in GTEx (P = 1.4 × 10^−6). rs17163363 does not overlap perfectly with transcription factor binding motifs, although it is located 14 and 23 base pairs away, respectively, from NKX2-5 and MEF2A binding sites. MEF2 transcription factors have previously been implicated in CAD.
Despite several attempts, we failed to identify teloHAEC clones that are homozygous for the ATACseq peak deletion at the AIDA locus. This might indicate that baseline expression levels of AIDA, MIA3, and/or potentially other genes controlled by this regulatory element are essential for teloHAEC cell survival. An extension of this observation is that complete bi-allelic deletion of regulatory elements by CRISPR/Cas9, an approach now routinely attempted to functionally characterize GWAS loci, will often fail or generate negative results that are difficult to interpret. This highlights the importance of developing efficient and high-throughput protocols that combine genome editing and homology-directed repair to precisely replace candidate functional alleles in human cells. Although rs17163363 is the only variant in LD with the CAD sentinel variant rs67180937 within the CRISPR/Cas9 deletion generated at the AIDA locus, we cannot conclude that it is causal, as other unknown variants in the deleted region could mediate the effect on AIDA expression. To address the potential causal role of rs17163363 in CAD, we propose that an allele replacement experiment, potentially mediated by CRISPR/Cas9 homology-directed repair, is needed.
Our results implicate AIDA in an inflammatory response that promotes atherosclerosis and CAD. Axin interaction partner and dorsalization antagonist, or AIDA, was first identified in a yeast-two-hybrid screen for interaction with the scaffold protein Axin. AIDA homodimerizes but can also physically interact with NFκB inhibitor-α (NFKBIA) and TNFα-induced protein 3 (TNFAIP3), two genes that are highly over-expressed in teloHAEC following TNFα treatment (Additional file 2). In zebrafish, aida over-expression in embryos inhibits the dorsalizing activity of Axin by interfering with the activation of the c-Jun N-terminal kinase (JNK). JNK are multifunctional kinases that are activated by stresses and cytokines, including TNFα, and that can control several cellular stress responses such as apoptosis. In endothelial cells, JNK is also activated in response to pro-inflammatory stimuli. Although it remains speculative, our data lead us to hypothesize that endothelial cell dysfunction mediated by the antagonizing effect of AIDA on JNK contributes to inter-individual variation in CAD risk in humans.
We anticipate that our integrated map of vascular endothelial cell transcriptomic, epigenomic, and 3D conformation datasets, when combined with statistical fine-mapping of GWAS loci, will provide sufficient resolution to pinpoint causal variants and genes implicated in CAD and BP/hypertension. This map will allow further investigation into the roles that endothelial cell dysfunction plays in modulating the risk of developing these important chronic diseases. We illustrated our strategy by characterizing a TNFα-responsive regulatory element that controls the expression of the novel CAD candidate gene AIDA. Encouragingly, a recent report identified another CAD-associated regulatory variant of PLPP3 that resides within a vascular endothelial enhancer activated by shear stress, suggesting that many CAD- and BP-associated variants may influence vascular endothelial phenotypes. Finally, our results underscore the critical importance of characterizing both resting and activated cells and lead us to propose a context-dependent, TNFα-induced dysregulation of endothelial AIDA expression as a novel candidate mechanism for CAD.
Immortalized human aortic endothelial cells (teloHAEC) (ATCC, CRL-4052) were grown in vascular cell basal media (VCBM) (ATCC, PCS-100-030) supplemented with endothelial cell growth kit-VEGF (ATCC, PCS-100-041) and 200 U/mL penicillin and 200 μg/mL of streptomycin (ThermoFisher, 15140122). Primary human coronary artery endothelial cells (HCAEC) from a single male donor (ATCC, CC-2585) were grown in EGM-2MV (Lonza, CC-3202) supplemented with 200 U/mL penicillin and 200 μg/mL of streptomycin. TeloHAEC and HCAEC were maintained under a 5% CO2 atmosphere at 37 °C and subcultured to 90% and 70–85% confluency, respectively. Both cell lines were used below three passages after thawing for all experiments.
Endothelial dysfunction induction
Endothelial cells were treated with concentrations ranging from 0.1 to 10 ng/mL of TNFα (PeproTech, 300-01A) prepared in culture media for 4 h and 24 h periods. Treatment with 10 ng/mL induced the most substantial endothelial dysfunction-related alterations in both teloHAEC and HCAEC without significantly altering cell proliferation and viability. Two independent biological replicates of 10 ng/mL TNFα treatments, for 4 h only (Hi-C) or for 4 and 24 h (RNAseq, ATACseq, ChIPseq), were used for each cell line for data generation unless stated otherwise. Non-treated (NT) cells grown in parallel were used as control.
RNA extraction and quantitative PCR
TeloHAEC cells were seeded at 2 × 10^5 cells per well in 6-well plates, grown for 3 days (refreshed media at day 2) until reaching 95–100% confluency and subjected to TNFα treatment as described above. To ensure the reliability and reproducibility of the results, RNA extraction, cDNA synthesis, and qPCR experiments were conducted in accordance with the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines. Total RNA was extracted using RNeasy Plus Mini kit (Qiagen) and analyzed with an RNA 6000 Nano kit (Agilent Technologies) to assess its concentration and integrity on an Agilent 2100 Bioanalyzer. Also, no contamination was found within RNA extracts as assessed by spectrophotometry using Take3 Micro-Volume plates (Biotek) or BioDrop μLite with absorbance ratio of 260/280 nm in a range of 2.0–2.15 for all samples. cDNAs were then generated by reverse transcription from 1 μg of total RNA (with RNA integrity number of 10 for all samples) using 1 U of MultiScribe Reverse Transcriptase, 100 mM dNTPs, 20 U of RNase inhibitor and 1× Random Primers (Applied Biosystems, 4374966) in a 20 μL volume reaction. The reverse transcription reaction was carried out in three steps: 10 min at 25 °C, 120 min at 37 °C, and 5 min at 85 °C. qPCR reactions were set up with 1.25 μL of cDNA (1/50 dilution based on the dynamic range of a previously generated standard curve for all target genes), 5 μL of Platinum SYBR Green qPCR SuperMix-UDG (ThermoFisher, 11733046), and 3.75 μL of primer pair mix at 0.8 μM each. The qPCR reaction for each gene was performed in triplicate and carried out in a CFX384 Touch Real-Time PCR Detection System (Bio-Rad, 1855485) with the following thermal profile: 2 min at 50 °C, 15 min at 95 °C and a three-step cycle of 10 s at 95 °C, 15 s at 55 °C, and 15 s at 72 °C repeated 40 times. Following the amplification process, a melting curve analysis was performed to ensure the specificity of the amplified products.
Also, resulting amplification products from previous qPCR standard curve experiments were run on 1% agarose gel and purified prior to Sanger sequencing in order to validate amplification of the desired target. To confirm the absence of undesired contamination, qPCR reactions with no-template controls for each gene were carried out simultaneously, with no fluorescence detected. Cq values corresponding to the number of cycles to reach quantification threshold were determined with the CFX Manager 3.1 (Bio-Rad) software for all genes. Relative expression levels for the axin interactor, dorsalization associated (AIDA) gene were calculated by the ΔΔCT method, normalized to the three reference genes glyceraldehyde 3-phosphate dehydrogenase (GAPDH), hypoxanthine phosphoribosyltransferase 1 (HPRT1), and TATA-binding protein (TBP). Based on geNorm principles for accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes, a mean M value always below 0.35 was generated from the GAPDH, HPRT1, and TBP genes for all qPCR experiments. All primers were obtained from IDT Technologies. The primer sequences are listed in Additional file 15.
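For readers unfamiliar with the ΔΔCT calculation, the following minimal Python sketch shows one common way to compute it with several reference genes. It assumes 100% amplification efficiency (a doubling per cycle) and averages the reference Cq values arithmetically, which is equivalent to taking the geometric mean of the reference quantities. The function names are hypothetical and this is not the CFX Manager implementation.

```python
from statistics import mean


def delta_ct(target_cq, reference_cqs):
    """Cq of the target gene minus the mean Cq of the reference genes."""
    return target_cq - mean(reference_cqs)


def relative_expression(treated_target, treated_refs, control_target, control_refs):
    """Fold change of the target gene in treated vs. control cells (2^-ddCt).

    Assumes perfect amplification efficiency; averaging Cq values across the
    reference genes corresponds to a geometric mean of their quantities.
    """
    ddct = delta_ct(treated_target, treated_refs) - delta_ct(control_target, control_refs)
    return 2.0 ** (-ddct)
```

For example, if the target gene reaches threshold one cycle earlier in treated cells while the reference genes are unchanged, the function returns a two-fold increase.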
RNAseq and differential gene expression analysis
Stranded cDNA libraries prepared from quality-controlled RNA (see above) were sequenced using Illumina 100-bp paired-end sequencing on a HiSeq 4000 platform, generating 50–60 million reads per condition per biological replicate. Reads were mapped to hg19 using hisat2 (http://ccb.jhu.edu/software/hisat2/index.shtml). Samtools was used to sort the reads and convert to the BAM format. Transcripts were first identified for each sample, and then pooled together using stringtie (http://ccb.jhu.edu/software/stringtie/index.shtml). Transcript abundance was estimated by stringtie, and a fragments per kilobase of transcript per million (FPKM) count table was generated. Differential analysis of gene expression was performed using DESeq2. All possible comparisons for NT, TNFα 4 h, and 24 h treatments were performed using the analysis of deviance function with default parameters. Genes with a false discovery rate (FDR, Benjamini & Hochberg correction) < 0.1%, and log10 fold-change > 0.3 or < − 0.3 in any of the 3 possible comparisons (NT vs. 4 h; NT vs. 24 h; 4 h vs. 24 h) were considered differentially expressed. Corresponding biological replicate outputs were merged using UCSC BigWig and BigBed tools for visualization purposes in the WashU Epigenome Browser.
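The thresholding step can be illustrated with a short, self-contained Python sketch of the Benjamini & Hochberg adjustment followed by the FDR and fold-change cut-offs quoted above. This is a didactic re-implementation with hypothetical function names; in the analysis itself the adjusted values were produced by DESeq2.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(fdr, lfc, max_fdr=0.001, min_abs_lfc=0.3):
    """Apply the FDR < 0.1% and |log fold-change| > 0.3 thresholds."""
    return fdr < max_fdr and abs(lfc) > min_abs_lfc
```

A feature is retained only when both conditions hold, so a highly significant but very small change is not called differential.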
Assay for transposase-accessible chromatin with high throughput sequencing (ATACseq)
TeloHAEC and HCAEC cells were seeded at 2 × 10^5 cells per well in 6-well plates, grown for 3 days (refreshed media at day 2) until reaching 95–100% confluency and subjected to TNFα treatment as described above. Adherent cells were detached using Trypsin-EDTA (ATCC, PSC-999-003) and subsequently neutralized by Trypsin Neutralizing Solution (ATCC, PSC-999-004). Following endothelial cell activation, ATACseq libraries were prepared as previously described with the following specifications and modifications: 5 × 10^4 cells were spun down at 500 g for 5 min at 4 °C. Whole cell pellets were subjected to a first round of cell membrane lysis using 50 μL of ice-cold hypotonic buffer (0.1% Sodium citrate tribasic dihydrate (Sigma-Aldrich, C8532); 0.1% Triton X-100 (Sigma-Aldrich, X100)) and incubating on ice for 30 min. The hypotonic buffer was removed by centrifugation at 500 g for 5 min at 4 °C, and we subsequently discarded the supernatant. Crude nuclei lysates were prepared by resuspending cells in lysis buffer (10 mM Tris-HCl pH 7.4 (Fisher Scientific, BP-153-1); 10 mM NaCl (Fisher Scientific, BP-358-212); 3 mM MgCl2 (Sigma-Aldrich, M8266); 0.1% Igepal CA-630 (Sigma-Aldrich, I8896)) and incubating for 30 min on ice. Following the removal of lysis buffer by centrifugation at 500 g for 5 min at 4 °C, transposase reaction of open chromatin was achieved by resuspending free nuclei in tagmentation mix (22.5 μL Tagment DNA Buffer; 2.5 μL Tagment DNA enzyme; 25 μL H2O) (Illumina, FC-121-1030) and incubating at 37 °C for 30 min. Purification of DNA was performed with MinElute (Qiagen, 28004) according to the manufacturer’s protocol. Barcoding and amplification were performed using Nextera Index Kit (Illumina, FC-121-1011) as previously described with the following thermal profile: 30 s at 98 °C and a three-step cycle of 10 s at 98 °C, 30 s at 63 °C, and 1 min at 72 °C repeated 12 times followed by 5 min at 72 °C.
Amplified ATACseq libraries were purified using GeneRead Size Selection Kit (Qiagen, 180514) according to the manufacturer’s protocol. Quality and quantity of final ATACseq libraries were assessed with the High Sensitivity DNA kit (Agilent, 5067-4626) run on an Agilent 2100 Bioanalyzer. ATACseq libraries were sequenced using Illumina 125-bp paired-end sequencing on a HiSeq2500 platform, generating between 38 and 43 million reads per condition per biological replicate.
ATAC library reads were processed through the ATACseq pipeline (https://github.com/kundajelab/atac_dnase_pipelines). Adapters were removed using Cutadapt. Reads were then mapped to hg19 using Bowtie2. Peak calling from BAM files was performed using MACS2. To create a “masterBED” peak file across conditions, peak files generated for each condition were merged using the merge function from BEDTools. Mean scores from bedGraphs for each individual biological replicate were assigned to masterBED peak files using intersect (default parameters) and merge (-o mean) and used as input for differential analysis using DESeq2. All comparisons for NT, TNFα 4 h, and 24 h treatments were performed using the analysis of deviance function with default parameters in DESeq2. ATACseq peaks with a false discovery rate (FDR, Benjamini & Hochberg correction) < 0.1%, and log10 fold-change > 0.3 or < − 0.3 in any of the 3 possible comparisons (NT vs. 4 h; NT vs. 24 h; 4 h vs. 24 h) were considered differentially opened or closed. Corresponding biological replicate bedGraph outputs from MACS were merged using UCSC BigWig and BigBed tools for visualization purposes in the WashU Epigenome Browser. For in silico footprinting, we used CENTIPEDE with default parameters. For the enrichment analyses of CAD, BP and BMI SNPs in open chromatin regions, we retrieved sentinel variants from published large-scale GWAS [8, 11, 21]. We identified proxy variants in linkage disequilibrium (r^2 > 0.8) using populations of European ancestry from the 1000 Genomes Project.
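The idea behind the “masterBED” file, collapsing peaks called in different conditions into one non-redundant set of intervals, can be sketched as follows. This is a simplified pure-Python stand-in for the interval-merging behaviour (overlapping or directly adjacent intervals are fused), not the BEDTools implementation; the function name and tuple layout are assumptions made for illustration.

```python
def merge_peaks(peaks):
    """Merge overlapping or book-ended intervals.

    `peaks` is an iterable of (chrom, start, end) tuples in BED-style
    half-open coordinates, pooled from all conditions. Returns a sorted
    list of merged (chrom, start, end) tuples.
    """
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval on the same chromosome.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Per-replicate signal can then be summarized over each merged interval, so that every condition is quantified on the same set of regions before differential analysis.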
Chromatin immunoprecipitation of H3K27 acetylation (H3K27ac) combined with high throughput sequencing (ChIPseq)
TeloHAEC were seeded at 1.4 × 10^5 cells per 100 mm plate (1 plate per condition, 3 independent biological replicates), grown to 90–100% confluency (refreshed media every 2–3 days) and subjected to TNFα treatment as described above. Cells were washed with HBSS (Gibco, 14170161) and fixed in paraformaldehyde (PFA) 1% (Fisher Scientific, 15710) for 10 min at room temperature (RT). PFA was quenched in 134 mM glycine for 5 min at RT. Fixed cells were washed with ice-cold PBS and collected with a cell scraper in ice-cold PBS. Cells were pelleted by centrifugation, washed in ice-cold PBS, and pelleted again before snap freezing in liquid nitrogen. Fixed cells were subjected to lysis in 5 mM PIPES-pH 8.5, 85 mM KCl, 1% (v/v) IGEPAL CA-630, 50 mM NaF, 1 mM PMSF, 1 mM phenylarsine oxide, 5 mM sodium orthovanadate and protease inhibitor cocktail (Sigma, 04693159001). Nuclei were then lysed in 50 mM Tris-HCl pH 8.0, 10 mM EDTA, 1% (w/v) SDS, 50 mM NaF, 1 mM PMSF, 1 mM phenylarsine oxide, 5 mM sodium orthovanadate and protease inhibitor cocktail. Chromatin immunoprecipitation was performed as previously described using 3.7 μg of H3K27ac antibody (Diagenode; C15410196) per sample containing ~ 500,000 cells. ChIPseq libraries were sequenced using Illumina 100-bp paired-end read sequencing on a NovaSeq 6000 instrument for approximately 150 million reads per sample. H3K27ac ChIPseq library raw reads were filtered for quality (phred33 ≥ 30) and length (n ≥ 32), and adapter sequences were removed using Trimmomatic. Filtered reads were aligned to hg19 using BWA and peaks subsequently called using MACS2 with non-IP input DNA as control. Corresponding bedGraph outputs of biological replicates and input controls from MACS were merged using UCSC BigWig and BigBed tools for visualization purposes in the WashU Epigenome Browser.
In situ Hi-C library preparation and analysis
TeloHAEC were seeded at 1.4 × 10^5 cells per 100 mm plate (4 plates per condition), grown to 90–100% confluency (refreshed media every 2–3 days) and subjected to the 4 h TNFα treatment as described above. In situ Hi-C libraries were prepared as previously described with the following specifications and modifications: approximately 8 × 10^6 cells per sample were crosslinked, pelleted and washed in ice-cold PBS prior to lysis and chromatin digestion with DpnII. Reverse crosslinking was performed in two subsequent 16 and 2 h incubations with 500 μg of proteinase K prepared at 10 mg/mL in 5 mM Tris-HCl pH 7.5, 50% glycerol, 1 mM CaCl2 for each step. DNA purification was performed using 15 mL MaXtract High Density tubes (Qiagen, 129065). Pre-NGS Hi-C DNA was quantified and quality-controlled with a DNA 7500 kit (Agilent, 5067-1506) run on an Agilent 2100 Bioanalyzer. Prior to next-generation DNA sequencing (NGS), DNA extractions for quality control of chromatin integrity and digestion efficacy were performed with the following procedure: 1 volume of Phenol:Chloroform:Isoamyl Alcohol (25:24:1 v/v) (Invitrogen, 15593031) was added to lysate, vortexed and transferred to pre-spun Phase Lock Gel (VWR, 10847-800) and centrifuged for 5 min at 16,000 g. The aqueous phase was kept, concentrated by speed-vacuum and subjected to 0.8% agarose gel electrophoresis. Quality-control 3C-PCR of pre-NGS Hi-C libraries was performed in the ENr313 region using 800 ng of template DNA, PfuUltra II Fusion HotStart DNA Polymerase (Agilent, 600672), 400 nM ENr313_DpnII_Anchor1 primer #1, 400 nM ENr313_DpnII_Anchor1_Near primer #2 and 250 μM dNTPs with the following thermal profile: 2 min at 95 °C and a three-step cycle of 30 s at 95 °C, 30 s at 60 °C and 30 s at 72 °C repeated 35 times followed by 8 min at 72 °C. For NGS preparation, between 20 and 40 μg of purified Hi-C DNA was used as starting material for all subsequent steps.
Sonication to 200–300 bp fragments was carried out in an S2 Focused-ultrasonicator with no alterations to the suggested parameters. Biotin pulldown was performed with 200 μg of Dynabeads MyOne Streptavidin C1 (Invitrogen, 65001) per sample. Production PCR was carried out with 9 cycles to obtain sufficient quantity for NGS while limiting PCR duplicates. Quality and quantity of final Hi-C libraries were assessed with the High Sensitivity DNA kit (Agilent, 5067-4626) run on an Agilent 2100 Bioanalyzer. Final Hi-C libraries were sequenced using Illumina 100-bp paired-end sequencing on a NovaSeq 6000, generating between 0.72 and 0.88 billion reads per condition per biological replicate.
Hi-C reads were processed using the Juicer pipeline. Hi-C libraries for all biological replicates had reads with the following quality measures: less than 10% below MAPQ threshold of 30 (average of 9.15%), more than 62% intra-chromosomal interactions (average of 68.5%) and less than 26% of inter-chromosomal interactions (average of 20.3%). Correlation between biological replicates was assessed (Pearson’s r, 10 kb resolution > 0.95; 50 kb resolution > 0.97; 100 kb resolution > 0.98) before merging to increase statistical power. Contact maps were normalized with Knight-Ruiz (KR) matrix balancing before all downstream analyses.
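As a schematic of the replicate concordance check, the sketch below computes Pearson's r between two binned contact matrices. It assumes small dense symmetric matrices held as lists of lists and compares the upper triangle only, so that each bin pair is counted once; whether the original analysis restricted the comparison in this way is not stated, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix):
    """Flatten the upper triangle (diagonal included) of a square matrix."""
    size = len(matrix)
    return [matrix[i][j] for i in range(size) for j in range(i, size)]


def replicate_correlation(matrix_a, matrix_b):
    """Pearson's r between two contact matrices binned at the same resolution."""
    return pearson_r(upper_triangle(matrix_a), upper_triangle(matrix_b))
```

Real Hi-C matrices at 10 kb resolution are large and sparse, so a production analysis would operate on sparse representations rather than nested lists.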
A/B compartments calling and analysis
Per chromosome principal component analysis (PCA) was performed by calling the eigenvector function from the Juicer pipeline using 50 kb resolution matrices with KR normalization. Using the R packages TxDb and Sushi, PC1 values were aligned to gene density in 50 kb windows. If needed, the sign of PC1 was adjusted to correlate positive PC1 values with gene-rich regions and negative PC1 with gene-poor regions. Contiguous bins of positive and negative PC1 were labeled as A and B compartments, respectively. Switching from A-to-B and B-to-A compartments upon TNFα treatment was retrieved from the differences in A/B compartments called between NT and TNFα-treated cells. Genes, ATACseq peaks, BP and CAD SNPs mapping to switching compartments were identified using map and merge functions from BEDTools with default parameters.
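The sign adjustment and compartment labeling described above can be summarized in a few lines of Python. This is a minimal sketch with hypothetical function names, assuming PC1 and gene density are given as equal-length per-bin lists for one chromosome; bins with a PC1 of exactly zero are arbitrarily assigned to B here.

```python
def orient_pc1(pc1, gene_density):
    """Flip PC1 if it is negatively associated with gene density."""
    n = len(pc1)
    mean_pc1 = sum(pc1) / n
    mean_density = sum(gene_density) / n
    covariance = sum((p - mean_pc1) * (g - mean_density)
                     for p, g in zip(pc1, gene_density))
    return [-p for p in pc1] if covariance < 0 else list(pc1)


def call_compartments(pc1, bin_size=50_000):
    """Group contiguous bins with the same PC1 sign into (start, end, label)."""
    segments = []
    for i, value in enumerate(pc1):
        label = "A" if value > 0 else "B"  # positive PC1 = gene-rich = A
        end = (i + 1) * bin_size
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], end, label)  # extend current run
        else:
            segments.append((i * bin_size, end, label))
    return segments
```

Compartment switches between conditions can then be found by comparing the labels assigned to the same bins in NT and TNFα-treated cells.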
Topologically associated domains (TADs) calling and analysis
TAD calling was performed on teloHAEC (NT and 4 h TNFα). KR-normalized sparse matrices at 10 kb resolution were extracted from .hic files by calling the dump function from the Juicer pipeline. TADs were called using the Crane insulation score algorithm (Git version eecc2c9) with the default parameters (insulation delta span = 200 kb, insulation square size = 500 kb, insulation mode = “mean”, boundary margin of error = 3, noise threshold = 0.1). TADs that overlapped centromeres, as well as regions at either end of each chromosome, were excluded from analyses. To determine whether a TAD boundary overlapped a feature (e.g., SNPs, ChIPseq, TSS, enhancer, promoter), we added a 10-kb outward buffer to the boundary coordinates. To determine whether TADs were stable or changed following TNFα treatment, we added a 20-kb outward buffer to the boundary coordinates. The physical distance between CAD and BP sentinel SNPs and the closest TAD boundary was compared with the corresponding distance for control SNPs matched on minor allele frequency, gene density, gene proximity, and the number of LD proxies using SNPsnap default parameters. To derive empirical P values, we computed the median distance to the closest TAD boundary for each of 100 sets of matched SNPs and compared these medians to the median distance of the CAD or BP SNPs.
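The empirical P value calculation can be illustrated with a short Python sketch. The text does not state the direction of the test or whether a pseudocount was used, so the one-sided test (GWAS SNPs closer to boundaries than matched SNPs) and the +1 correction below are assumptions, as are the function names, the single-chromosome coordinates, and the toy positions.

```python
import bisect
import statistics


def distance_to_nearest(position, sorted_boundaries):
    """Distance (bp) from a position to the closest boundary.

    sorted_boundaries must be a non-empty, ascending list on one chromosome.
    """
    i = bisect.bisect_left(sorted_boundaries, position)
    candidates = sorted_boundaries[max(i - 1, 0):i + 1]
    return min(abs(position - b) for b in candidates)


def median_distance(positions, sorted_boundaries):
    """Median distance to the closest boundary over a set of SNP positions."""
    return statistics.median(
        [distance_to_nearest(p, sorted_boundaries) for p in positions]
    )


def empirical_p(observed_positions, matched_sets, boundaries):
    """One-sided empirical P value that observed SNPs lie closer to boundaries.

    Assumption: P = (number of matched sets with median <= observed + 1)
                    / (number of matched sets + 1).
    """
    sorted_boundaries = sorted(boundaries)
    observed = median_distance(observed_positions, sorted_boundaries)
    null = [median_distance(s, sorted_boundaries) for s in matched_sets]
    n_as_extreme = sum(1 for m in null if m <= observed)
    return observed, (n_as_extreme + 1) / (len(null) + 1)


# Hypothetical boundaries, sentinel SNPs, and 100 matched control sets.
boundaries = [100_000, 500_000, 900_000]
gwas_snps = [105_000, 495_000, 890_000]
matched_sets = [
    [300_000 + 1_000 * k, 700_000 - 1_000 * k, 250_000] for k in range(100)
]

observed_median, p_value = empirical_p(gwas_snps, matched_sets, boundaries)
print(observed_median, p_value)
```

With 100 matched sets, the smallest P value this scheme can return is 1/101, which is why the number of matched sets bounds the resolution of the test.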
Loop calling between regulatory regions and promoters
Hi-C reads were processed using the HiC-Pro pipeline (https://github.com/nservant/HiC-Pro). hichipper (https://github.com/aryeelab/hichipper) was used to call loops between promoters and ATACseq peaks that harbor CAD or BP GWAS SNPs. Gene promoters’ coordinates were downloaded from the EPDnew database (https://epd.vital-it.ch/human/human_database.php). The detailed steps used to integrate and combine the GWAS, RNAseq, ATACseq, Hi-C, and GTEx data are provided in Additional file 11.
CRISPR/Cas9 genome editing
Pairs of guide RNAs (sgRNAs) were designed for each targeted genomic deletion and cloned into the pHKO9 vector under control of the same U6 promoter (Additional file 16). HEK 293T cells were seeded at 5 × 10⁵ cells/well in 6-well plates for 24 h. Lentiviruses were produced by co-transfecting the envelope and packaging plasmids pMD2G and psPAX2, respectively, with the dual sgRNA-expressing pHKO9 vector in HEK 293T cells using Lipofectamine 2000 (ThermoFisher, 11668027) for 4 h, after which cells were switched to virus-producing media containing 10 μg/mL of BSA. Viral supernatant was harvested 48 h and 72 h following transfection and filtered through 0.45 μm filters. TeloHAEC cells stably expressing an active Cas9 protein were seeded at 2 × 10⁵ cells/well in 6-well plates and later infected with the virus preparation in media containing 0.7 μg/mL of polybrene (Sigma, H9268). Selection with 200 μg/mL of G418 (Fisher, MT30234CR) was started 48 h post-infection. Antibiotic selective pressure was maintained for 5–6 days or until non-infected cells were dead. Sub-populations of 50 cells were derived and screened via PCR using primers surrounding the expected deletion (out-out PCR) (Additional file 16). Clonal cell lines were then derived from a PCR-positive deletion sub-population. Another round of out-out PCR was performed on these selected clonal cell lines, and PCR products were purified and cloned into the pDrive cloning vector system (Qiagen, 231122) or into the pUC19 vector using the In-Fusion HD cloning system (Takara, 638909). Genotypes of all possible alleles were confirmed by gel electrophoresis (Additional file 14) and Sanger sequencing. Selected genome-engineered clones were then seeded at 9 × 10⁴ cells per well in 12-well plates, grown to 90–100% confluency (media refreshed every 2–3 days), and subjected to a 4-h TNFα treatment as described above. Total RNA was then extracted, quantified, quality controlled, and reverse transcribed as described above.
qPCR was performed for the target gene anchored at the receiving end of the chromatin loop with 2 different primer pairs capturing exons in either 5′ or 3′ of AIDA.
We thank the members of our laboratories for comments and Albena Pramatarova for scientific discussions and technical assistance.
The review history is available at Additional file 17.
SL, V-ACF, and GL conceived and designed the experiments. SL, V-ACF, SMdB, FL, MB, M-MS, RD, TK, and KSL performed the experiments. TP and GL secured funding. All authors analyzed the results. SL, SMdB, KSL, and GL wrote the manuscript with contributions from all authors. All authors read and approved the final manuscript.
This work was funded by the Canadian Institutes of Health Research (MOP #136979), the Heart and Stroke Foundation of Canada (Grant #G-18-0021604), the Canada Research Chair Program, and the Montreal Heart Institute Foundation (to GL). This work was also supported by the NIH Common Fund Program, grant U01CA200147, as a Transformative Collaborative Project Award (TCPA) TCPA-2017-PASTINEN / CIHR NTC-154083.
Ethics approval and consent to participate
Not applicable. Experiments were performed with commercially available cell lines.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
- 40. Buenrostro JD, Wu B, Chang HY, Greenleaf WJ. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr Protoc Mol Biol. 2015;109:21 9 1–9.
- 50. Lalonde S, Codina-Fauteux V-A, Méric de Bellefon S, Leblanc F, Beaudoin M, Simon M-M, et al. Integrative analysis of vascular endothelial cell genomic features identifies AIDA as a coronary artery disease candidate gene. NCBI Gene Expression Omnibus (GEO). Accession GSE126200. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE126200.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.