Background

In the heart, the ventricular conduction system propagates the electrical impulses that coordinate ventricular chamber contraction. The QRS interval on an electrocardiogram (ECG) is used clinically to quantify duration of ventricular depolarization in the heart. Prolonged QRS duration is an independent predictor of mortality in both the general population [1,2,3,4] and in patients with cardiac disease [5,6,7,8,9,10].

QRS interval duration is a quantitative trait influenced by multiple genetic and environmental factors and is known to be influenced by both age and gender [11, 12]. The heritability of QRS duration is estimated to be 35–55% from twin and family studies [13,14,15,16].

We previously performed a genome-wide association meta-analysis in 40,407 individuals and identified 22 genetic loci associated with QRS duration [17]. The QRS-associated loci highlighted novel biological processes such as kinase inhibitors, but also pointed to genes with established roles in ventricular conduction such as sodium channels, transcription factors, and calcium-handling proteins. However, the common risk variants identified in genome-wide association studies (GWAS) reside overwhelmingly in regulatory regions, making inference of the underlying causative genes difficult. Furthermore, as with most complex traits, the variants discovered to date explain only a small proportion of the total heritability (the “missing heritability” paradigm), suggesting additional variants are yet to be identified. In fact, the role of rare and low frequency variants, which cannot currently be detected using standard genome-wide single nucleotide polymorphism (SNP) chip arrays, have not been fully investigated. Here we used the Illumina HumanExome BeadChip to focus on rare (MAF < 1%), low frequency (MAF = 1–5%), and common (MAF ≥ 5%) putative functional coding variation associated with changes in ventricular depolarization.

Results and Discussion

We combined genotype data from 77,898 participants of European ancestry and 7695 of African descent participating in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Exome-Chip EKG consortium (Additional file 1: Table S1). A total of 228,164 polymorphic markers on the exome-chip array passed quality control and were used as a basis for our analyses. Through single variant analysis in the combined European and African datasets, we identified 34 variants across 28 loci associated with QRS duration that passed the exome-chip-wide significance threshold (P < 6.17 × 10−8 for single variants [Table 1, Additional file 2: Figure S1]). Eight of the identified loci were novel and five of these were driven by low frequency (MAF < 5%) and common (MAF ≥ 5%) non-synonymous coding variation. We confirmed 20 of the 29 previously identified QRS duration loci [14, 17,18,19], the remaining loci were not covered by the Exome-Chip and/or did not pass quality control (QC) (Additional file 1: Table S2). As might be anticipated when combining two ancestries in association analyses, we detected heterogeneity of effects for one variant (Cochran’s heterogeneity P < 1.47 × 10−3, a Bonferroni corrected P value of α=0.05/34 variants), Additional file 1: Table S2). We did not observe evidence for inflation of test statistics for any of the analyses (λGC = 1.049, European and African ancestries, combined, Additional file 2: Figure S2, individual ancestry results, Additional file 2: Figures S3–S6). We next sought to replicate the 34 lead variants of our 28 loci in a replication meta-analysis of 111,874 individuals from the UK Biobank [20] and deCODE genetics [21] cohorts. In the replication meta-analysis, 30 lead variants for 25 loci replicated (P ≤ 1.47 × 10−3 = 0.05/34 variants), seven of which were novel, ten of which are known (Additional file 1: Table S2). The remaining four variants that did not replicate in UK Biobank encompass two previously established loci (one in locus SCN5A/SCN10A for which the other five variants replicated) and two novel loci (SENP2, IGF1R). This is likely due to differences in phenotype acquisition methods (UK Biobank having exercise ECGs measured), though effect size directions between discovery and replication remained consistent and P values of non-replicating variants were all below nominal significance (P < 0.05).

Table 1 Lead SNPs for 28 loci identified for QRS duration in a combined European and African American ancestry meta-analysis

Sex-specific associations with QRS duration

Sex differences in QRS duration are well established (men have significantly longer QRS durations than women [22, 23]), and might be attributable to differential effects of genetic variation in men and women. Therefore, we performed sex-stratified association analyses (Additional file 1: Table S3, Additional file 2: Figures S7 and S8). We included only those studies that had both male and female participants to mitigate potential bias due to contributions from single-sex cohorts. In total, up to 31,702 men and 39,907 women were included from both European and African ancestry studies. We found suggestive evidence for a sex-specific locus that was not identified in the combined analysis. The non-synonymous variant rs17265513 (p.Asn310Ser) in ZHX3 (zinc fingers and homeoboxes 3) showed a significant association only in men (Pmale = 4.89 × 10−8, β(SE) = − 0.52(0.09)), whereas no effect was observed for women (Pfemale = 0.86, β(SE) = − 0.01(0.08)); however, there was no significant difference consistent with an interaction with sex (P = 2.3 × 10−5). Additionally, no further evidence was observed in the replication analyses alone (Pmale = 7.95 × 10−4, β(SE) = − 0.30(0.09), Nmales = 50,457), (Pfemale = 3.55 × 10−2, β(SE) = − 0.17(0.08), Nfemales = 61,417).

Association of coding and non-coding variants with QRS duration

Among the eight newly identified loci in the sex-combined analysis, five had lead variants that were non-synonymous: CCDC141 (Coiled-Coil Domain Containing 141); KLHL38 (Kelch Like Family Member 38); DLEC1 (Deleted in Lung and Esophageal Cancer 1); NACA (Nascent Polypeptide-Associated Complex Alpha subunit); and SENP2 (SUMO1/Sentrin/SMT3 Specific Protease 2). Suggestive evidence for association of the same non-synonymous variant in CCDC141 (rs17362588; P = 4.75 × 10−7) and an intronic variant in KLHL38 (rs11991744; P = 1.25 × 10−7) with QRS duration was shown in two earlier GWAS [24, 25]. DLEC1 has recently been suggested to have a possible role as a tumor suppressor [26], and while specific roles for KLHL38 and CCDC141 (a centrosome associated protein) have not yet been elucidated, they show the highest expression in skeletal and/or cardiac tissue, respectively, among the tissues examined in the Genotype-Tissue Expression (GTEx) Portal database (http://www.gtexportal.org) [27]. Two of the novel loci, NACA and SENP2, have established roles in cardiac development and dysfunction. NACA produces the isoform skNAC (skeletal NACA) and acts as a skeletal muscle- and heart-specific transcription factor and is critical for ventricular cardiomyocyte expansion [28]. Cardiac-specific knockdown of skNAC in a Drosophila Hand4.2-Gal4 driver cell-line results in severe cardiac defects [19]. Cardiac-specific overexpression of SENP2, a SUMO-specific protease, leads to congenital heart defects and cardiac dysfunction [29].

In the sex-stratified analysis, the association with ZHX3 (Zinc Fingers and Homeoboxes 3) was also driven by an amino acid changing variant. ZHX3 encodes a transcriptional repressor whose functions are largely unknown. However, the sex-specific association might be explained by hormonal changes that have previously been hypothesized to explain a variety of sex-specific differences observed in ECG measures and conduction disorders [30, 31]. A sex-specific association of ZHX3 has also been previously shown for total cholesterol levels (the effect is only significant in men) [32].

We further identified an intronic variant in the IGF1R (Insulin Like Growth Factor 1 Receptor) locus and two intergenic variants: rs4549631 at locus 6q22.32 and rs961253 at locus 20p12.3. Interestingly, when queried against results from the GTEx project portal [27] for blood and eight tissues (including adipose [subcutaneous], artery [aorta, coronary, tibial], heart [atrium, appendage, left ventricle], lung, muscle [skeletal], nerve [tibial], skin [sun exposed], and thyroid), the lead intronic variant in IGF1R (rs4966020; MAF EA/AA 0.36/0.63) is a left ventricle tissue-specific cis-eQTL (P = 2.4 × 10−7). The variant is also in strong linkage disequilibrium with the strongest cis-eQTL for this tissue (rs4966021, P = 5 × 10−8). IGF1R promotes physiological hypertrophy but protects against cardiac fibrosis [33]; the signaling pathways induced by its binding partner, IGF1, regulate contractility, metabolism, hypertrophy, autophagy, senescence, and apoptosis in the heart [34]. The nearest genes for the two intergenic variants are PRELID1P1 (PRELI Domain Containing 1 Pseudogene 1 [locus 6q22.32]) and CASC20 (Cancer Susceptibility Candidate 20 [non-protein-coding]; locus 20p12.3)—the former a pseudogene and the latter a non-protein-coding gene, both with currently uncharacterized function.

Rare ADAMTS6 variants are associated with QRS duration

By collapsing rare variants in genes as functional units and jointly testing these for association, substantial statistical power-gains can be achieved [35]. We, therefore, performed gene-based analyses using both the Sequence Kernel Association Test (SKAT) (Additional file 1: Table S4) and burden test (T1) (Additional file 1: Table S5), because these tests have optimal power under different scenarios. Analyses were restricted to variants with MAF < 1% in a total of 16,085 genes. One gene-based significant association (P < 5.18 × 10−7) was identified in ADAMTS6 (A Disintegrin-Like And Metalloproteinase with Thrombospondin Type 1 Motif 6; PSKAT = 8.18 × 10−8, Table 2), when including only variants classified as damaging (see “Methods”). Four additional genes showed suggestive evidence of association (P < 1 × 10−4) (Table 2).

Table 2 Gene-based test association results (for genes with variants classified as damaging)

The ADAMTS6 gene-based signal is driven by two rare non-synonymous variants: rs61736454 (p.Ser90Leu) and rs114007286 (p.Arg603Trp), which have allele frequencies of 0.0018 and 0.0021, respectively (Additional file 1: Table S6). Notably, a look-up in the independent deCODE QRS duration analysis showed that rs61736454 was highly significant, however not exome-wide ([P = 2.65 × 10−7, β(SE) = 3.01(0.58)], MAF = 0.002, N = 59,903), and was extremely well imputed (info score = 0.995). Importantly, after meta-analysis with discovery exome summary statistics, the signal reached exome-wide significance ([P = 8.96 × 10−13, β(SE) = 2.75(0.38)], N = 145,496), underscoring the robustness of our initial discovery signal driver. Data for rs114007286 were not available. ADAMTS6 is a highly constrained gene, with a probability of loss of function intolerance score of 1.0 (pLI = 1.0) (Exome Aggregation Consortium [ExAC], Cambridge, MA, USA; http://exac.broadinstitute.org/). The p.Ser90Leu variant lies within the ADAMTS6 propeptide, which is predicted to be important for initiation of folding, because the homologous ADAMTS9 propeptide is an intramolecular chaperone essential for its secretion [36]. The second variant, p.Arg603Trp, is located in the N-terminal-most TSR domain (TSR1) of ADAMTS6. This domain is the target of protein-O-fucosylation, which is a QC signal that prevents secretion of ADAMTS proteins that are improperly folded [37].

ADAMTS6 is necessary for cardiac development and expression of gap junction protein Cx43

ADAMTS6 belongs to a family of metalloproteases that mediates extracellular proteolytic processing of extracellular matrix (ECM) components and other secreted molecules. ADAMTS6 is closely related to ADAMTS10, which interacts with and accelerates assembly of fibrillin-1, mutations in which cause Marfan syndrome [38]. This suggests that ADAMTS6 could regulate cardiac ECM. While no specific ADAMTS6 substrates have been unequivocally identified, it was reported to regulate focal adhesions, epithelial cell–cell interactions, and microfibril assembly in cultured cells [39]. We show by RNA in situ hybridization that Adamts6 is expressed in the atrioventricular and septal cushions and myocardium of the embryonic heart, with expression persisting into adult ventricular, trabecular, and septal myocardium (Fig. 1ad).

Fig. 1
figure 1

Adamts6 cardiac expression, sequence conservation, and cardiac anomalies in Adamts6-deficient mice. ad Adamts6 (red punctate signal) is expressed in the outflow tract (a, blue arrowhead), heart valves (ayellow arrowhead), atria (agreen arrowhead), and ventricular myocardium (aorange arrowhead, b-d). e, f Diagram of the two Adamts6 mutant alleles recovered: Met1Ile and Ser149Arg. The sequence alignment shows conservation of the Ser149 residue in ADAMTS6 across species. gl Congenital heart defects observed in Adamts6 Ser149Arg (Adamts6m/m) mutant embryos. A WT mouse heart with normal atrial, ventricular, and outflow tract anatomy (g), an intact atrioventricular septum (d), and normal ventricular myocardium (i). Homozygous Adamts6 Ser149Arg mutants (Adamts6m/m) exhibit a spectrum of congenital heart defects, such as a double outlet right ventricle (j, in which the aorta and pulmonary artery both arise from the right ventricle; see Additional file 3: Video S1) or an atrioventricular septal defect (AVSD) (k, in which the atrial and ventricular septa fail to form). Thickening of the ventricular wall is commonly observed, indicating ventricular hypertrophy (l). These mutant hearts (jl) are shown at embryonic day (E)16.5 but their development is delayed, giving an appearance similar to WT hearts at E14.5 (as shown in (gi)). Ao aorta, AVSD atrioventricular septal defect, LA left atrium, LV left ventricle, Pa pulmonary artery, RA right atrium, RV right ventricle. Scale bar: (a) 500 μm; (bd) 50 μm; (gl) 1 mm

Mice with recessive Adamts6 mutations were recovered in a forward genetic screen [40] (Fig. 1e and f). One mutation (p.Met1Ile) affects the start codon and is predicted null. The second mutation (p.Ser149Arg) lies in the propeptide. Both mutations cause prenatal/neonatal lethality with identical congenital heart defect phenotypes (Additional file 1: Table S7), comprising double outlet right ventricle (Fig. 1j, Additional file 3: Video S1), atrioventricular septal defect (Fig. 1k), and ventricular hypertrophy (Fig. 1j and l).

Ventricular conduction relies on cardiomyocyte coupling through gap junctions, with connexin 43 (Cx43) being the predominant myocardial gap junction protein in the human and mouse myocardium. Gja1 (encoding Cx43) knockout mice exhibit slow conduction, QRS prolongation, and increased susceptibility to ventricular arrhythmias [41,42,43], consistent with its role in mediating electrical coupling required for efficient propagation of ventricular depolarization. While Adamts6 heterozygous (Adamts6m/+) adult mice are viable and without structural heart defects (Additional file 2: Figure S9), their ventricular myocardium shows reduced Cx43 staining (Fig. 2a and b). Western blot shows reduction of Cx43 protein in the adult Adamts6m/+ myocardium (Fig. 2c and d). Interestingly, parallel quantitative real-time polymerase chain reaction (qRT-PCR) shows unchanged Gja1 messenger RNA (mRNA) expression (Fig. 2e), suggesting post-transcriptional regulation. Analysis of embryonic day 14.5 homozygote Adamts6m/m mutants shows Cx43 is completely absent in the ventricular myocardium (Fig. 2a and b). Thus, whereas Adamts6m/m mice have severe structural heart defects and Cx43 deficiency, Adamts6m/+ hemizygosity leads to reduction in Cx43 expression in the ventricles without defects in cardiac morphogenesis. Together these findings suggest the QRS prolongation in individuals with rare pathogenic ADAMTS6 variants could arise from impaired myocardial connectivity due to Cx43 reduction.

Fig. 2
figure 2

Reduction of Cx43 intercalated disk gap junction staining in Adamts6-deficient mice. a, b Cx43 staining (green) (a) is reduced throughout ventricular myocardium in embryonic day (E) 14.5 Adamts6m/m embryos and 6-week and 12-month Adamts6m/+ mice and quantified in (b). DAPI (blue) was used to visualize cell nuclei. c, d Representative western blot (c) and quantification (d) shows reduced Cx43 in three pairs of 6-week Adamts6m/+ and WT myocardium controls. Gapdh was used as a loading control. e No change in Gja1 RNA level in 6-week and 12-month Adamts6m/+ myocardium as compared to control. Scale bar: 50 μm. *P ≤ 0.01. E embryonic, W weeks, M months

Rare ADAMTS6 coding variants lead to impaired ADAMTS6 secretion

To determine the functional consequences of the two predicted pathogenic human ADAMTS6 coding variants from the exome-chip analysis (p.Ser90Leu and p.Arg603Trp), myc-tagged ADAMTS6 constructs with the variants introduced by site-directed mutagenesis were expressed in HEK293F cells. Western blotting was used to compare the levels of mutant and wild type (WT) myc-tagged ADAMTS6 in the transfected cell lysates and medium. As positive and negative controls, respectively, we transfected the known pathogenic murine variant (p.Ser149Arg) and two rare non-synonymous human ADAMTS6 variants predicted to be benign (p.Ser210Leu and p.Met752Val). Western blotting confirmed that the Adamts6 p.Ser149Arg variant was not secreted (Fig. 3a). The predicted human pathogenic variants show much reduced secretion compared to the WT and benign variants (Fig. 3bd). Significantly, the molecular masses of the secreted p.Ser90Leu and p.Arg603Trp variants observed in cell lysate are comparable to that of the WT protein, indicating normal glycosylation and propeptide excision, which are essential for ADAMTS zymogen conversion to their mature forms [44]. These results suggest that heterozygous individuals have a reduction of secreted ADAMTS6 to 50% of normal, implying reduced proteolytic activity. The resulting disruption of proteolytic remodeling could potentially affect cell–cell and cell–matrix interactions essential for efficient Cx43 gap junction assembly. However, the rs61736454 (p.Ser90Leu) and rs114007286 (p.Arg603Trp) variants were associated with longer and shorter QRS duration, respectively. The reduced secretion observed was more profound for the rs61736454 variant compared to rs114007286, and the assay does not predict what impact a small amount of secreted protein may have, nor how it interacts in the presence of other modifier genes/variants carried by the same individual. Additionally, the two variants might affect overall protein function and interaction with binding partners in different ways.

Fig. 3
figure 3

A mouse Adamts6 ENU mutant and predicted damaging ADAMTS6 variants have impaired secretion. a, b Representative western blots using anti-Myc antibody show a major molecular species of 150 kDa in HEK293F cell lysates, corresponding to the ADAMTS6 zymogen (Z). In contrast, the culture medium of cells transfected with WT ADAMTS6 shows a 130 kDa species, corresponding to mature (M, i.e. furin-processed) ADAMTS6. a The p.Ser149Arg murine variant is not secreted into the culture medium. b The predicted damaging human variants, p.Ser90Leu and p.Arg603Trp, have reduced secretion, whereas the predicted benign variants, p.Ser210Leu and p.Met752Val, are secreted normally. Lysate and medium of HEK293F cells transfected with an empty vector (EV) lack immunoreactivity. The membrane was subsequently re-blotted using an anti-GAPDH monoclonal antibody to demonstrate comparable sample loading. c, d Densitometry of ADAMTS6 signal in lysates (c) and medium (d) shows reduced secretion of p.Ser90Leu and p.Arg603Trp variants and normal secretion of p.Ser210Leu and p.Met752Val into the medium, relative to the WT control (*P ≤ 0.01 for n = 3 transfections of each vector)

Conclusions

In a meta-analysis of data from 77,898 participants of European ancestry and 7695 of African descent in our discovery cohort participating in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Exome-Chip ECG consortium, we identified 28 loci associated with QRS duration. With the addition of 111,874 individuals of European ancestry from the UK Biobank and deCODE cohorts, all 34 variants across the 28 loci passed the exome-chip-wide significance threshold, indicating our results are robust. Furthermore, effect size directions between discovery and replication remained consistent and P values of non-replicating variants in the replication analysis alone were all below nominal significance (P < 0.05). Novel loci include genes involved in cardiac development and dysfunction, some of which are highly expressed in skeletal and/or cardiac tissue. To establish further evidence for these novel loci and mechanisms underlying each association, future functional experiments are essential.

The present study also highlights the efficacy of large-scale population-based exome-chip analysis for discovery of non-synonymous coding variants with significant functional effects. In gene-based tests, we identified an association between ventricular depolarization and rare non-synonymous variants in ADAMTS6, a gene not previously implicated in cardiac conduction. We chose to focus on this novel locus and seek functional validation as the association was driven by multiple rare coding variants that were predicted to be damaging by in silico tools. The coding variants driving the association in the population study and the mutations identified in the mouse forward genetic screen all impair ADAMTS6 secretion, indicating reduction/loss of function. Significantly, although heterozygosity of the variants in mice is not associated with structural heart defects, we detected reduction of Cx43 gap junctions in the ventricular myocardium. Homozygous Adamts6 mutants show complete loss of Cx43 gap junctions as well as structural heart defects, implying a dosage effect. Together, these findings indicate that ADAMTS6 has a novel role in regulating gap junction-mediated ventricular depolarization, with quantitative reduction in ADAMTS6 causing cardiac conduction perturbation. While our study focuses on cardiac conduction, the findings support the potential broad utility of large-scale exome-chip analysis for interrogating coding variants associated with other physiological or clinical parameters.

Methods

Discovery association analyses

Study cohorts

All participating studies formed the CHARGE EKG exome-chip consortium, including those belonging to the CHARGE consortium and external studies to investigate the role of functional variation in electrocardiographic traits. Twenty-two cohorts participated in the QRS duration analysis effort representing a maximum total sample size of 85,593 samples, consisting of 77,898 participants of European ancestry (91%) and 7695 of African descent. Individual study details and characteristics are summarized in Additional file 1: Table S1.

Phenotype measurements

We analyzed QRS duration measured in milliseconds. In each study, individuals were excluded from the analyses if these had a QRS duration of > 120 ms, atrial fibrillation (AF) on baseline electrocardiogram, a history of myocardial infarction or heart failure, had Wolff–Parkinson–White syndrome (WPW), a pacemaker, or used Class I and class III blocking medications (those medications with prefix C01B* according to the Anatomical Therapeutic Chemical (ATC) Classification System, http://www.whocc.no/atcddd/) [45]. For cohorts that were disease case-control studies, we included only the control subjects in our analyses irrespective of the nature of the case disease.

Genotyping and quality control

Each participating study performed genotyping using the Illumina HumanExome BeadChip / HumanCoreExome platforms. Owing to the difficulty of accurately detecting and assign genotype calls for rare variants (MAF < 1%), an initial core set of CHARGE cohorts, comprising approximately 62,000 samples, assembled intensity data into a single project for a joint improved calling. The quality of the joint calling was assessed through investigating the concordance of genotypes in samples having both exome-chip and exome-sequence data, described extensively elsewhere [46, 47]. Using the curated clustering files from the CHARGE central calling effort, several cohorts within our study re-called their genotypes. The remainder of participating studies used either Gencall [48] or zCall [49], or a combination of both. Full details concerning the genotyping and quality control for each cohort are summarized in Additional file 1: Table S1. Individual studies performed sample-level genotype QC filtering for call rate, removing autosomal heterozygosity outliers, gender mismatches, duplicates as established by identity by descent (IBD) analysis, and removed ethnic outliers as determined by multidimensional scaling. Poorly called variants were typically removed by filtering for Hardy-Weinberg equilibrium test P value (pHWE), call rate, and filtering removing poorly clustering variants. Each study aligned their data reference strand to the Illumina forward strand using a central SNP allele reference and annotation file (SNP info file) [46] for the Illumina Exome Chip. Variants were all mapped to GRCh37/hg19. Only variants present within the SNP info file were initially considered for analyses, 247,871 in total. Next, we filtered out 9252 variants that failed QC in the joint calling effort, as well as 6591 variants with inconsistent reference alleles across studies (a total of 11,392 unique SNPs), and considered furthermore only autosomal and chromosome X variants, and only those that were polymorphic in our study, leaving an initial set of 228,164 variants for analysis. For our single variant analyses, we only included variants with MAF > 0.012% (equal to a minor allele count [MAC] of 10), 162,199 in total.

Statistical methods

All association analyses were carried out using the R-package seqMeta [50]. Each study ran the “prepScores” function and adjusted their analyses for age, gender, body mass index (BMI), height, principal components, and study-specific covariates when appropriate (details in Additional file 1: Table S1). The output of this function is an R “list” object (“a prepScores object”), stored in an .RData file, where each element corresponds to a gene, and contains the scores and MAFs for variants, as well as a matrix of the covariance between the scores at all pairs of SNPs within a gene. All studies performed both gender combined and separated analyses, in addition to separation by ancestry. Using the prepScores objects from each study, we performed meta-analyses using the “singlesnpMeta()” for single variant meta-analyses, and the “burdenMeta” and “skatMeta()” functions of SeqMeta. Coefficients and standard errors from seqMeta can be interpreted as a “one-step” approximation to the maximum likelihood estimates. Ancestry groups were analyzed both separate and combined at the meta-analysis level.

For single variant meta-analyses, we included all variants with a MAC ≥ 10 in order to have well-calibrated type I error rates [51]. Statistical significance was defined using Bonferroni corrections. For single variants, maximally 162,199 variants were included in five separate analyses after filtering for MAC: European and African ancestry separated and combined (n = 3); and sex-stratified analyses (n = 2), resulting in a Bonferroni corrected P value of α=0.05 / 162,199 variants / 5 analyses = 6.17 × 10−8.

Suggestive sexually dimorphic associations were identified by performing sex-stratified meta-analyses, totaling 39,907 women and 31,702 men, including only from cohorts that had both male and female samples. Variants were deemed to be suggestive sex-specific when reaching below a P value threshold of exome-wide significance (P < 6.17 × 10−8) in one sex and above nominal significance in the other (P > 0.05).

For gene-based tests, also performed using seqMeta using the “prepScores” objects from individual cohorts, we assigned variants to genes by annotating all variants on the Exome Chip using ANNOVAR [52] following RefSeq [53] gene definitions mapped to human genome build 37 (hg19). In the collapsed variant tests, we included only variants with MAF < 1% and included only genes for which two or more variants were present (n = 16,085). We performed both SKAT [54] and T1 burden [55] tests, for three different functional sets of variants limited to the following: (I) all variants; (II) missense, nonsense, splice, and indel variants; (III) “damaging”: the same variants as in group II, except for missense only including those that are predicted to be damaging by at least two out of four functional prediction algorithms (Polyphen2 [56], SIFT [57], Mutation Taster [58], and LRT [59]). For the gene-based tests, we used a Bonferroni corrected P value significance threshold of α=0.05 / 16,085 genes / 2 different tests / 3 functional variant classes = 5.18 × 10−7.

We define a physically independent locus as the genomic region that contains variants within 250 kb on either side of LD-independent lead SNPs (exome-wide significant variants with r2 < 0.1), where LD calculations were based on European ancestry. Following this definition, in certain cases LD-independent lead variants are present in overlapping regions, complicating the definition and reporting of associated genetic loci and harbored genes. Therefore, we annealed loci if LD-independent exome-wide significant variants were < 250 kb from each other. Where lead SNPs from previous analyses were not contained in these regions, we considered these as novel. LD calculations were performed on the Illumina Exome Chip genotype data from the TwinsUK cohort [60] (n = 1194), using PLINK 1.9 [61].

Replication association analyses

Study cohort: UK biobank (UKB)

UK Biobank (www.ukbiobank.ac.uk) is a prospective study of 500,000 volunteers, comprising relatively even numbers of men and women aged 40–69 years old at recruitment, with extensive baseline, and follow-up clinical, biochemical, genetic, and outcome measures. Approximately 95,000 individuals were recruited for a Cardio test using a stationary bicycle in conjunction with a four-lead electrocardiograph device at the initial assessment (2006–2008) and ~ 20,000 individuals performed the test again (the first repeat assessment: 2011–2013). The Cardio test, thereafter known as the exercise test, started with 15 s of rest (pre-test), followed by 6 min of exercise (cycling) with an increasing workload, and a 1-min recovery period without exercise. To improve accuracy, we calculated an average QRS waveform by aligning all QRS complexes present in a window of 15 s from the resting stage. Ectopic beats and artifacts were removed. Then, we calculated the correlation between each individual QRS complex and the average QRS waveform and removed those with a correlation coefficient < 0.8. Finally, we repeated the calculation of the average QRS waveform by only considering those highly correlated individual QRS complexes. The QRS width was measured from the average QRS waveform as the interval between the onset of the Q wave and the end of the S wave. Genotyping was performed by UKB using the Applied Biosystems UK BiLEVE Axiom Array or the UKB AxiomTM Array. Single Nucleotide Variants (SNVs) were imputed centrally by UKB using a merged UK10K sequencing + 1000 Genomes imputation reference panel (https://www.biorxiv.org/content/early/2017/07/20/166298). Following phenotype and genotype QC, a total of 51,971 unrelated individuals of European ancestry remained for analysis. Thirty-four QRS discovery lead variants selected for replication were extracted from UKB imputed files, all being of high quality (Hardy-Weinberg P > 1 × 10−4 and an info score > 0.5) using QCTOOL v2 and the association analysis was performed using SNPTEST v2.5.4 assuming an additive genetic model.

Study cohort: deCODE

ECGs obtained in Landspitali—The National University Hospital of Iceland, Reykjavik, the largest and only tertiary care hospital in Iceland—have been digitally stored since 1998. For this analysis, we used information on mean QRS duration in milliseconds from 151,667 sinus rhythm ECGs from 59,903 individuals. Individuals with permanent pacemakers or history of myocardial infarction, heart failure, atrial fibrillation, or WPW were excluded, as well as ECGs with QRS duration > 120 ms. ECG measurements were adjusted for sex, year of birth, and age at measurement. Due to limited availability of information, height, BMI, or drugs were not accounted for in the analysis. The genotypes in the deCODE study were derived from whole-genome sequencing of 28,075 Icelanders using Illumina standard TruSeq methodology to a mean depth of 35X (SD 8X) with subsequent imputation into 160,000 chip-typed individuals and their close relatives [21]. Selected replication variants from the meta-analysis for association with QRS duration were tested in accounting for relatedness using a mixed effects model as implemented by BOLT-LMM [62] followed by LD score regression [63].

Statistical analysis

We first performed a fixed-effects inverse variance weighted meta-analysis combining the summary statistics data from the UKB and deCODE analyses, followed by a combined analysis of the discovery and replication summary statistics using GWAMA v2.2.2 [64].

Mouse and cell models

Western blot analysis

A plasmid vector for expression of the full-length Adamts6 open reading frame was generated via PCR using Phusion High-Fidelity DNA Polymerase (catalog no. M0530 L; New England Biolabs) and embryonic mouse heart complementary DNA (cDNA) as the template and inserted into PSecTag2B (V900–20; Life Technologies). ADAMTS6 variants p.Ser90Leu and p.Arg603Trp were created in the Adamts6 cDNA using Q5 Site-Directed Mutagenesis Kit (catalog no. E0554S; New England BioLabs). Primer sequences used for cloning and mutagenesis are available upon request. Each plasmid insert was verified by sequencing. Human embryonic kidney (HEK293) cells obtained from ATCC were maintained in medium supplemented with 10% fetal bovine serum and 100 U/mL penicillin and 100 μg/mL streptomycin. The constructs were transfected with Lipofectamine 3000 Transfection Kit (catalog no. L3000; Invitrogen) following manufacturer’s instructions. After 72 h in serum-free medium, cell lysates were collected in lysis buffer (0.1% NP-40, 0.01% sodium dodecyl sulfate, and 0.05% sodium deoxycholate in phosphate buffered saline [PBS], pH 7.4). Extracts were electrophoresed by reducing SDS-PAGE on 10% Tris-Glycine gels. Proteins were electroblotted to Immobilon-FL membranes (catalog no. IPFL00010, EMD Millipore), incubated with primary antibody anti-myc (Hybridoma core facility; 1:1000; Cleveland Clinic), anti-GAPDH (catalog no. MAB374; 1:5000; EMD Millipore), and anti-Cx43 (catalog no. C6219; 1:2000; Sigma-Aldrich), overnight at 4 °C, followed by IRDye secondary antibodies goat anti-mouse or anti-rabbit (926–68,170, 827–08365; 1:10000; LI-COR) for 1 h at room temperature and visualized by Odyssey CLx (LI-COR). Band intensity was measured using ImageJ (NIH, Bethesda, MD, USA).

Statistics

All values are expressed as mean ± SEM. A paired two-tailed Student’s t-test was used to assess statistical significance.

Recovery and phenotyping of Adamts6 mutant mice

Adamts6 mutant mice were recovered from a recessive ethynitrosourea (ENU) mouse mutagenesis screen conducted using non-invasive in utero fetal echocardiography [40]. Mutants detected with congenital heart defects by ultrasound imaging were recovered either as fetuses or at term and further analyzed by necropsy, followed by histopathology for detailed analysis of intracardiac anatomy with three-dimensional reconstructions using episcopic confocal microscopy. From the screen, ten independent Adamts6 mutant lines were recovered, all exhibiting the identical phenotype. Mouse histology, immunostaining and RT-PCR experiments were approved by the Cleveland Clinic Institutional Animal Care and Use Committee (protocol # 2015–1458, IACUC number: 18052990).

Mouse mutation recovery

Mutation recovery was conducted by whole-exome capture using SureSelect Mouse All Exon kit V1, with sequencing carried out using Illumina HiSeq 2000 with minimum 50X average coverage (BGI Americas). Sequence reads were aligned to the C57BL/6 J mouse reference genome (mm9) and analyzed using CLCBio Genomic Workbench and GATK software. All homozygous mutations were genotyped across all mutants recovered in the mutant line and only the Adamts6 mutation was consistently homozygous across all mutants recovered in the line, the pathogenic identifying it as mutation. Of the ten mutant lines, nine were identified to have the same missense mutation (c.C447G: p.S149R), while one mutant line exhibited loss of the start codon (c.G3A: p.M1I) and was confirmed to be null with no Adamts6 transcripts detected with transcript analysis. The Adamts6 missense mutation was subsequently identified as a spontaneous mutation in the C57BL/6 J production colony at the Jackson Laboratory.

Histology and immunofluorescence staining and RNA in situ hybridization

Tissues were fixed in 4% paraformaldehyde in PBS at 4 °C overnight followed by paraffin embedding. Sections of 7 μm were used for hematoxylin and eosin staining, picrosirius red staining, and immunofluorescence for Cx43 (catalog no. C6219; 1:800; Sigma-Aldrich) followed by secondary goat anti-rabbit antibody (catalog no. 111–035-144; 1:2000; Jackson Immunoresearch Laboratories Inc.). Antigen retrieval, i.e. immersion of slides in citrate-EDTA buffer (10 mM/L citric acid, 2 mM/L EDTA, 0.05% v/v Tween-20, pH 6.2) and microwaving for 1.5 min at 50% power four times in a microwave oven with 30-s intervals intervening was used before immunofluorescence. Immunofluorescence was quantified by the ratio of Cx43 signal to DAPI-positive cell nuclei integrated density (ImageJ; National Institutes of Health, n = 3, with three samples of each myocardium). Adamts6 RNA in situ hybridization was performed using RNAScope (Advanced Cell Diagnostics) following the manufacturer’s protocol. Briefly, 7-μm sections were deparaffinized and hybridized to a mouse Adamts6 probe set (catalog no. 428301; Advanced Cell Diagnostics) using a HybEZ™ oven (Advanced Cell Diagnostics) and the RNAScope 2.5 HD Detection Reagent Kit (catalog no. 322360; Advanced Cell Diagnostics).

Quantitative real-time PCR

Total RNA was isolated using TRIzol (catalog no. 15596018, Invitrogen) and 1 μg of RNA was reverse-transcribed into cDNA with SuperScript III Cells Direct cDNA synthesis system (catalog no. 46–6321, Invitrogen). qPCR was performed with Bullseye EvaGreen qPCR MasterMix (catalog no. BEQPCR-S; MIDSCI) using an Applied Biosystems 7500 instrument. The experiments were performed with three independent samples and confirmed reproducibility. Gapdh was used as a control for mRNA quantity. The ∆∆Ct method was used to calculate relative mRNA expression levels of target genes. Primer sequences are as follows: Gapdh: 5’ TGGAGAAACCTGCCAAGTATGA 3′ and 5’ CTGTTGAAGTCGCAGGAGACA 3′; Gja1: 5’ CCTGCTGAGAACCTACATCATC 3′ and 5’CGCCCTTGAAGAAGACATAGAA 3′.

Web resources

Databases

Genotype-Tissue Expression (GTEx) Portal database: http://www.gtexportal.org

Software

seqMeta: http://cran.r-project.org/web/packages/seqMeta/

EasyStrata: https://cran.r-project.org/web/packages/EasyStrata/

PLINK 1.9: https://www.cog-genomics.org/plink

SNPTEST v2.5.4: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html

GWAMA v.2.2.2: https://www.geenivaramu.ee/en/tools/gwama