Introduction

Several genome-wide association studies (GWAS) have been undertaken to investigate the genetic basis of rheumatoid arthritis (RA) (1–7). These studies were recently combined into a transcontinental metaanalysis (8), which most comprehensively evaluates the influence of common single nucleotide variants on RA susceptibility in populations with European and Asian ancestry. On the basis of imputation into the 1000 Genomes data set, this metaanalysis has generated association statistics for nearly 10 million single nucleotide polymorphisms (SNPs) across the human genome. The implicated genes were then further tested for their enrichment in molecular pathways. However, for many loci, it remains unclear which SNPs constitute the actual causal variants. In the present report, we examine this question by combining the RA GWAS metaanalysis results with a set of enhancer regions for 71 primary cell types, which was recently published by the FANTOM consortium (9). These enhancer regions were detected by their noncoding RNA expression signature, that is, their localized expression of noncoding capped bidirectional short transcripts. Given that susceptibility variants typically exert their effect through modifying gene expression (10), this large enhancer set can be highly valuable in the search for causal variants in current GWAS results. Moreover, using this set of enhancer regions to interpret RA GWAS results can point to cell types in which gene activity may be altered through RA susceptibility variants and that may therefore have important roles in RA pathogenesis.

Materials and Methods

Transcontinental RA GWAS summary statistics were downloaded from the RIKEN genome center (https://doi.org/plaza.umin.ac.jp/∼yokada/datasource/software.htm; August 2014). Enhancer regions that are present across 71 human primary cell types were retrieved from the FANTOM website (https://doi.org/enhancer.binf.ku.dk/presets/facet_expressed_enhancers.tgz; August 2014), and enhancer regions were defined by the genomic coordinates provided. Note that these enhancers constitute a superset of the cell type-specific enhancers that are displayed in the genome browser on this website. Out of the 9,739,303 SNPs genotyped or imputed in the RA GWAS, 29,632 (0.3%) fall into any such enhancer. RA-associated SNPs were defined as all SNPs that reached statistical significance (P < 5 × 10⁻⁸) in the transcontinental GWAS analysis or in either of the two population-specific GWAS analyses (European or Asian). Excluding the MHC region, a total of 3,581 SNPs are associated with RA under this criterion (8). We excluded the MHC region because the extensive linkage disequilibrium (LD) and the strong effects of specific coding variants make it difficult to attribute any role to other variants. Importantly, recent analyses indicate that coding region differences within the classical human leukocyte antigen (HLA) loci account for the vast majority of the association signals in the MHC region (12).
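The selection step described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names, the tuple layout of the inputs, and the assumption of non-overlapping, 0-based half-open enhancer intervals are all hypothetical choices made for illustration, and the exclusion of the MHC region is not modeled.

```python
from bisect import bisect_right
from collections import defaultdict

GENOME_WIDE_P = 5e-8  # significance threshold stated in the text


def build_index(enhancers):
    """Group enhancer intervals by chromosome and sort them by start.

    enhancers: iterable of (chrom, start, end); 0-based, half-open,
    assumed non-overlapping (an assumption of this sketch).
    """
    index = defaultdict(list)
    for chrom, start, end in enhancers:
        index[chrom].append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_enhancer(index, chrom, pos):
    """Return True if position pos on chrom lies inside an enhancer."""
    intervals = index.get(chrom, [])
    # Last interval whose start is <= pos; only it can contain pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def associated_enhancer_snps(snps, index, threshold=GENOME_WIDE_P):
    """Select SNPs that are significant in any analysis and lie in an enhancer.

    snps: iterable of (snp_id, chrom, pos, pvalues), where pvalues holds
    the P values from the transcontinental and population-specific analyses.
    """
    return [
        snp_id
        for snp_id, chrom, pos, pvalues in snps
        if min(pvalues) < threshold and in_enhancer(index, chrom, pos)
    ]
```

Taking the minimum P value across the analyses implements the "significant in the transcontinental or in either population-specific analysis" criterion; the binary search keeps the lookup fast for millions of SNPs.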

SNP genotypes for estimation of LD were downloaded from the 1000 Genomes consortium website (https://doi.org/www.1000genomes.org; August 2014). Gene annotations were downloaded from the UCSC genome browser (https://doi.org/genome.ucsc.edu; August 2014). GWAS metaanalysis results for schizophrenia were used as a negative control and downloaded from the Psychiatric Genomics Consortium (https://doi.org/www.med.unc.edu/pgc/downloads; August 2014). Haplotypes were estimated with the program snphap as obtained from https://doi.org/www-gene.cimr.cam.ac.uk/staff/clayton/software (August 2014). Pairwise LD was estimated with the program Haploview as obtained from https://doi.org/www.broadinstitute.org/scientific-community/science/programs/medical-and-population-genetics/haploview/downloads (August 2014).
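For orientation, the pairwise LD measure r² used throughout this report can be computed directly when phased haplotypes are available. The sketch below is in Python and is only an illustration of the standard definition, r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA pB; it is not the estimation procedure implemented in Haploview or snphap, and the function name and 0/1 allele coding are assumptions.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) pairs, one per phased chromosome,
    with alleles coded 0 or 1 (hypothetical input format).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    return d * d / denom
```

Two SNPs whose alleles always co-occur give r² = 1, whereas alleles that combine independently give r² = 0.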

To test the enrichment of significant SNPs in enhancer regions across the different cell types, we randomly permuted the assignment of cell types to enhancers 10,000 times. Thus, each cell type was assigned a random set of enhancer regions, with its number of enhancers kept constant. We then counted the number of permutations for which an equal or greater number of significant SNPs was observed in the random enhancer set than in the actual enhancer set of a cell type. If multiple SNPs are located in enhancers at the same locus, each such SNP may increase the chance for an influence of genetic variation on enhancer activity, and it is therefore counted separately. This strategy compares the enrichment of RA-associated SNPs in different cell types relative to each other while accounting for the differences in the number of enhancers. The number of enhancer regions may differ across cell types because of biological factors or technical issues. Note that we do not intend to show here that RA-associated SNPs are generally enriched in FANTOM enhancers compared with the rest of the genome, which would require a different testing strategy. Thus, the null hypothesis represented by this simulation is that RA-associated SNPs are distributed randomly across the enhancers from different cell types. The statistical analysis was performed with the R software package (https://doi.org/www.r-project.org; August 2014).
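The permutation scheme described above can be sketched as follows. The original analysis was carried out in R; this Python version is an illustrative reimplementation under stated assumptions, not the authors' code. The function name, the input layout (one significant-SNP count per enhancer, and a list of enhancer indices per cell type), and the use of the plain proportion as the empirical P value are assumptions.

```python
import random


def permutation_pvalues(snp_counts, cell_type_enhancers, n_perm=10_000, seed=0):
    """Empirical enrichment P value per cell type.

    snp_counts: list with the number of significant SNPs in each enhancer.
    cell_type_enhancers: dict mapping cell type -> list of enhancer indices.
    Shuffling the per-enhancer counts is equivalent to permuting which
    enhancers are assigned to each cell type, while the number of
    enhancers per cell type stays constant.
    """
    rng = random.Random(seed)
    observed = {
        ct: sum(snp_counts[i] for i in idx)
        for ct, idx in cell_type_enhancers.items()
    }
    hits = dict.fromkeys(cell_type_enhancers, 0)
    shuffled = list(snp_counts)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for ct, idx in cell_type_enhancers.items():
            # Count permutations with an equal or greater number of SNPs.
            if sum(shuffled[i] for i in idx) >= observed[ct]:
                hits[ct] += 1
    return {ct: hits[ct] / n_perm for ct in hits}
```

With 10,000 permutations, the smallest nonzero P value this sketch can return is 0.0001. A common alternative adds one to both the numerator and the denominator so that the estimate is never exactly zero; whether such a correction was applied in the original analysis is not stated.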

Results

The FANTOM data set provides a list of 29,202 enhancer regions across 71 primary cell types, which jointly cover 9.1 Mb (∼0.3% of the genome). To find disease-causing enhancer variants, we retrieved all enhancer SNPs that are associated with RA (P < 5 × 10⁻⁸) in the transcontinental GWAS analysis or in either of the two population-specific GWAS analyses (European or Asian). In total, we found 50 RA-associated SNPs that fall into enhancer regions, which are distributed over 21 different gene loci.

We then evaluated whether RA-associated SNPs are overrepresented in the enhancer regions of any specific cell types compared with other cell types. Notably, RA-associated SNPs are enriched in enhancer regions that are preferentially active in immunological cells (Table 1, Supplementary Table S1). In particular, there is an enrichment of RA-associated SNPs in T-cell enhancers (16.9 SNPs/Mb, P < 0.001) and NK-cell enhancers (14.9 SNPs/Mb, P < 0.001). RA-associated SNPs also tend to be enriched in B-cell enhancers (12.6 SNPs/Mb, P = 0.008), neutrophil enhancers (10.6 SNPs/Mb, P = 0.013), basophil cell enhancers (8.1 SNPs/Mb, P = 0.018), dendritic cell enhancers (7.7 SNPs/Mb, P = 0.036), intestinal epithelial enhancers (14.7 SNPs/Mb, P = 0.039) and mast cell enhancers (8.4 SNPs/Mb, P = 0.044). This enrichment in immunological cell types supports the notion that some of these SNPs may be causal variants that modify cell-specific expression regulation of nearby genes. In total, there were 33 SNPs located in T-cell enhancer regions, and out of these, 22 SNPs were also located in NK-cell enhancer regions. Thus, an overlapping set of variants accounts for the enrichment of RA-associated SNPs in T-cell and NK-cell enhancer regions.
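The two summary quantities reported above, the density of RA-associated SNPs per megabase of enhancer sequence and the overlap of SNP sets between cell types, can be sketched in Python as follows. The function names and input formats are hypothetical, and the sketch assumes that the density is the SNP count divided by the total length of a cell type's enhancer regions, which is our reading of the SNPs/Mb unit rather than a stated definition.

```python
def snps_per_mb(n_snps, enhancers):
    """SNP density per megabase of enhancer sequence.

    enhancers: iterable of (start, end) intervals, half-open and
    assumed non-overlapping, for a single cell type.
    """
    covered_bp = sum(end - start for start, end in enhancers)
    if covered_bp == 0:
        raise ValueError("enhancer set covers no sequence")
    return n_snps / (covered_bp / 1_000_000)


def shared_snps(snps_by_cell_type, cell_type_a, cell_type_b):
    """SNPs that lie in enhancers of both cell types."""
    return set(snps_by_cell_type[cell_type_a]) & set(snps_by_cell_type[cell_type_b])
```

Applied to the T-cell and NK-cell SNP lists, the second function would return the set of variants found in enhancers of both cell types.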

Table 1 Cell types whose enhancer regions are enriched (P < 0.05) for RA-associated SNPs.

As a negative control, we next used the set of 56 non-MHC SNPs that fall into any FANTOM enhancer region and that display a P value < 5 × 10⁻⁶ in a recent GWAS metaanalysis of schizophrenia (11). None of the enhancer regions of any immune cell types are enriched for these schizophrenia SNPs (P > 0.25 for all cell types listed in Table 1). Given that schizophrenia is not a classical autoimmune disease such as RA, no such enrichment of potential risk variants in immune cell enhancers would be expected.

We next looked at which RA-associated SNPs account for the observed enrichment of RA-associated SNPs in immune cell enhancer regions (Table 2). At 9 loci (ANKRD55, MTF1, LBH, CD28, NFKBIE, BLK, RAD51B, RASGRP1, IRF8), we found exactly one RA-associated SNP falling into an enhancer. At ANKRD55, the published RA lead SNP (rs7731626) itself is located in an enhancer, which is specific to dendritic cells. At the remaining 8 loci with only one RA SNP in enhancers, a tight correlation (as measured by r²) exists between the observed enhancer SNP and the respective lead SNP from the GWAS metaanalysis (8). Therefore, all these enhancer SNPs can be viewed as strong candidates to act as causal variants that contribute to the reported genetic association signals at these gene loci.

Table 2 RA-associated SNPs that are located in cell type-specific FANTOM enhancer regions.

At the remaining 12 gene loci for which RA-associated SNPs fall into FANTOM enhancers (C4orf52, TNFAIP3, CCR6, TRAF1/C5, IL2RA, PRKCQ, GATA3, CXCR5, ETS1, LOC145837, MED1, PTPN2), at least two RA-associated SNPs are located in enhancer regions. At most loci, these enhancer SNPs are in tight LD with the published RA lead SNP (Table 2). Of particular interest might be the enhancer haplotypes near the genes PRKCQ and GATA3, which are both close to the telomere of the short arm of chromosome 10 (Figure 1). At PRKCQ, two closely neighboring RA-associated SNPs in tight LD fall into the same enhancer region. Thus, these two SNPs may act together to alter the activity of this enhancer, which is active in T cells, NK cells, dendritic cells, neutrophils, and monocytes. A similar pattern can be observed at the GATA3 locus, where r² is close to 1 across three RA-associated SNPs. These three SNPs in close physical proximity fall into the same enhancer, which is specific for T cells and NK cells. Again, enhancer activity may be altered by variation at multiple nucleotide sites. Thus, at PRKCQ and GATA3, the described multimarker haplotypes are particularly good candidates to act as functional variants that may explain the genetic associations at these loci. Importantly, we did not see any long-range LD across the two associated haplotypes at PRKCQ and GATA3, which are spaced 1.5 Mb apart.

Figure 1

Haplotypes of RA-associated SNPs that are in near-perfect LD and fall into enhancer regions at PRKCQ and GATA3. Enhancer regions are displayed as black bars and genes are displayed as blue bars.

Discussion

Our analysis combines the recently published RA GWAS metaanalysis (8) with a recently published set of enhancer regions in the human genome (9). We observe the strongest enrichment of RA-associated SNPs in enhancer regions with T-cell specificity. This finding confirms earlier results that found preferential expression in T cells for genes near RA-associated SNPs (13). In addition, our analysis indicates a strong enrichment of RA-associated SNPs in NK-cell enhancers, suggesting that the possible role of NK cells in RA pathogenesis may deserve more attention (14).

Our study differs from many earlier pathway analyses of GWAS in that we do not attempt to link implicated genes to pathways, as is typically done in the network-assisted prioritization of GWAS results (15). Instead, we test the enrichment of RA-associated SNPs in different sets of enhancers that are active in particular cell types. We thereby connect genetic findings to tissue biology without considering the role of genes and gene functions. This approach will likely be useful for understanding genetic variants involved in other diseases too. The approach has become possible through the availability of the FANTOM data set and is in principle applicable to any phenotype with a sufficiently large number of GWAS hits. In our analysis of RA, this approach allows us to point out SNPs at several loci that are likely to exert causal effects instead of being in LD with some unknown causal variant. Interestingly, there are several loci where multiple enhancer SNPs occur in tight LD. These haplotypes may have a greater chance of affecting the activity of enhancers by altering more than one nucleotide site. This observation is consistent with the proposed “multiple enhancer variant hypothesis” of complex disease associations (16). Two notable examples are the loci PRKCQ and GATA3, where multiple variants in nearly perfect LD could account for the GWAS association signals. Interestingly, a genetic interaction between GATA3 (rs2275806) and PRKCQ (rs947474) has been reported (17), with one interacting SNP being part of our PRKCQ enhancer haplotype and the other interacting SNP being in tight LD with the SNPs from our GATA3 enhancer haplotype (r² = 0.9 in Europeans and r² = 0.98 in Asians). Altered expression of GATA3 and PRKCQ in T cells and NK cells could be caused by these haplotypes, which genetically interact to promote the development of RA (17). GATA3 mediates PRKCQ-induced gene expression (18), and GATA3 is a master regulator of Th2-cell-specific gene expression (19). It is therefore conceivable that expression levels of GATA3 could influence the RA phenotype through modifying T-cell activity. The expectation would be that genetic variants that influence PRKCQ expression in cis would influence GATA3 expression in trans while interacting with other variants that influence GATA3 expression in cis.

Conclusion

Detailed functional studies are now needed to test this hypothesis and find out how these and other enhancer variants alter cellular function and cause RA susceptibility. Our analysis indicates potentially functional enhancer variants for 21 RA loci, but it does not make any prediction about causal variants at the remaining RA GWAS loci. Given that coding SNPs play a role only in a minority of GWAS loci, the search for causal variants may require even more comprehensive functional annotations as well as the analysis of rare variants, insertion/deletion variants and copy number variants.

Disclosure

After completion of this work, J Freudenberg became a full-time employee of Regeneron Genetics. This did not have any influence on the results in this study or their presentation.