Introduction

Diffuse gliomas are the most common malignant primary brain tumour in adults, with around 26,000 newly diagnosed cases each year in Europe [9]. Diffuse gliomas have traditionally been classified into oligodendroglial and astrocytic tumours and graded II–IV; the most common form, glioblastoma (GBM; grade IV glioma), typically has a median survival of only 15 months [2].

Despite glioma being an especially devastating malignancy, little is known about its aetiology. Aside from exposure to ionising radiation, which accounts for very few cases, no environmental or lifestyle factor has been unambiguously linked to risk [2]. Recent genome-wide association studies (GWAS) have, however, advanced our understanding of glioma genetics by identifying single-nucleotide polymorphisms (SNPs) at multiple independent loci influencing risk [22, 25, 35, 44, 49, 51, 63]. While understanding the functional basis of these risk loci offers the prospect of insight into the development of glioma, few have been deciphered. Notable exceptions are the 17p13.1 locus, where the risk SNP rs78378222 disrupts TP53 polyadenylation [51], and the 5p15.33 locus, where the risk SNP rs10069690 creates a splice-donor site leading to an alternative TERT splice isoform lacking telomerase activity [24].

Since the aetiological basis of glioma subtypes is likely to reflect different developmental pathways, it is perhaps not surprising that subtype-specific associations have been shown for GBM (5p15.33, 7p11.2, 9p21.3, 11q14.1, 16p13.33, 16q12.1, 20q13.33 and 22q13.1) and for non-GBM glioma (1q44, 2q33.3, 3p14.1, 8q24.21, 10q25.2, 11q21, 11q23.2, 11q23.3, 12q21.2, 14q12 and 15q24.2) [35]. Recent large-scale sequencing projects have identified IDH mutation, TERT promoter mutation and 1p/19q co-deletion as cancer drivers in glioma. These findings have improved the subtyping of glioma [5, 12, 26, 27] and have been incorporated into the revised 2016 WHO classification of glial tumours [32]. Since these mutations are early events in glioma development, any relationship between risk SNPs and molecular profile should provide insight into glial oncogenesis. Evidence for such subtype specificity is already provided by the association of the 8q24.21 (rs55705857) risk variant with IDH-mutated, 1p/19q co-deleted glioma [13]. Additionally, associations have been proposed between risk SNPs at 5p15.33, 9p21.3 and 20q13.33 and IDH wild-type glioma [10], and between 17p13.1 and IDH-mutated, TERT promoter-mutated glioma without 1p/19q co-deletion [12].

To gain a more comprehensive understanding of the relationship between the 25 glioma risk loci and tumour subtype, we analysed three patient series totalling 2648 cases. Since the functional basis of GWAS cancer risk loci generally appears to be primarily through regulatory effects [53], we also analysed Hi-C and gene expression data to gain insight into the likely target genes of glioma risk SNPs.

Materials and methods

Data sources

We analysed data from three non-overlapping case series: TCGA, French GWAS and French sequencing. Details of these datasets are provided below and summarised in Table 1.

Table 1 Overview of TCGA, French GWAS and French seq series and mutation status of tumours

TCGA

Raw genotyping files (.CEL) for the Affymetrix Genome-wide version 6 array were downloaded for germline (i.e. normal blood) glioma samples from The Cancer Genome Atlas (TCGA, dbGaP study accession: phs000178.v1.p1). Controls were from publicly accessible genotype data generated by the Wellcome Trust Case–Control Consortium 2 (WTCCC2) analysis of 2699 individuals from the 1958 British birth cohort (1958-BC) [41]. Genotypes were generated using Affymetrix Power Tools Release 1.20.5 with the Birdseed (v2) calling algorithm (https://www.affymetrix.com/support/developer/powertools/changelog/index.html) and PennCNV [59]. After quality control (Supplementary Figs. 1, 2, Supplementary Table 1) there were 521 TCGA glioma cases and 2648 controls (Table 1). Glioma tumour molecular data (IDH mutation, 1p/19q co-deletion, TERT promoter mutation) were obtained from Ceccarelli et al. [6]. Further data (EGFR amplification/activating mutations, CDKN2A deletion) were obtained from the cBioPortal for Cancer Genomics [15]. After adjustment for principal components there was minimal evidence of test statistic inflation (λ = 1.01; Supplementary Fig. 2).

French GWAS

The French GWAS [25, 44] comprised 1423 patients with newly diagnosed grade II–IV diffuse glioma attending the Service de Neurologie Mazarin, Groupe Hospitalier Pitié-Salpêtrière Paris. The controls (n = 1190) were ascertained from the SU.VI.MAX (SUpplementation en VItamines et Mineraux AntioXydants) study of 12,735 healthy subjects (women aged 35–60 years; men aged 45–60 years) [19]. Tumours from patients were snap-frozen in liquid nitrogen and DNA was extracted using the QIAamp DNA Mini Kit according to the manufacturer’s instructions (Qiagen, Venlo, The Netherlands). Tumour DNA was analysed for large-scale copy number variation by comparative genomic hybridisation (CGH) array as previously described [16, 21]. For tumours not analysed by CGH array, 1p/19q co-deletion status was assigned using PCR microsatellite analysis, and EGFR amplification and CDKN2A (p16-INK4a) homozygous deletion by quantitative PCR. IDH1, IDH2 and TERT promoter mutation status was assigned by sequencing [26, 45].

French sequencing

Eight hundred and fifteen patients with newly diagnosed grade II–IV diffuse glioma were ascertained through the Service de Neurologie Mazarin, Groupe Hospitalier Pitié-Salpêtrière Paris. Genotypes for the 25 risk SNPs were obtained by universal-tailed amplicon sequencing on the MiSeq platform (Illumina Inc.). Genotypes were called using GATK (Genome Analysis Toolkit, version 3.6-0-g89b7209). Duplicated samples and individuals with a low call rate (< 90%) were excluded (n = 111). Molecular profiling of tumour samples was carried out as for the French GWAS.

Unrelated French controls were obtained from the 3C Study (Group 2003) [17], a population-based, prospective study of the relationship between vascular factors and dementia carried out in Bordeaux, Montpellier and Dijon. Controls were genotyped using Illumina Human 610-Quad BeadChips. Untyped genotypes were imputed with IMPUTE2 using 1000 Genomes multi-ethnic data (1000 G phase 1 integrated variant set release v3) as the reference panel. SNP genotypes were retained if call rates were > 98%, the Hardy–Weinberg equilibrium (HWE) P value was > 1 × 10−6 and the minor allele frequency (MAF) was > 1%. After quality control, 704 cases and 5527 controls were available for analysis (Table 1).
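The SNP-level filters above can be expressed as a short function. The sketch below is illustrative only: it assumes per-SNP genotype counts are already available, the function names are hypothetical, and Hardy–Weinberg equilibrium is assessed with a simple 1-df chi-square test, whereas the original quality control may have used an exact test.

```python
import math


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium from genotype counts.

    Assumption: a simple chi-square test; the original QC may have used an exact test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
                  min_call_rate: float = 0.98, min_hwe_p: float = 1e-6,
                  min_maf: float = 0.01) -> bool:
    """Apply the call-rate, MAF and HWE filters to one SNP's genotype counts."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate > min_call_rate
            and maf > min_maf
            and hwe_chisq_p(n_aa, n_ab, n_bb) > min_hwe_p)
```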

Statistical analysis

Tests of association between SNPs and glioma molecular subgroups were performed using SNPTEST v2.5 [33] under an additive frequentist model. Where appropriate, principal components generated using common SNPs were included in the analysis to limit the effects of cryptic population stratification, which might otherwise inflate test statistics. Eigenvectors for the TCGA study were inferred using smartpca (part of EIGENSOFT v2.4) [40] after merging cases and controls with phase II HapMap samples [25].
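For readers unfamiliar with the additive model, it corresponds to a logistic regression of case status on the number of risk alleles, so that exp(β) is the per-allele odds ratio. The following is a minimal, dependency-free sketch of that model fitted by Newton–Raphson; it is not SNPTEST's implementation, it omits the principal-component covariates for brevity, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def _score_and_information(b0: float, b1: float, dosage: List[float],
                           status: List[int]) -> Tuple[float, ...]:
    """Score vector and observed information for logit P(case) = b0 + b1 * dosage."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(dosage, status):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def additive_logistic(dosage: List[float], status: List[int],
                      n_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit the additive (per-allele) model by Newton-Raphson.

    dosage: risk-allele count (0, 1, 2) or expected dosage; status: 1 = case, 0 = control.
    Returns (beta, se), where exp(beta) is the per-allele odds ratio.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0, g1, h00, h01, h11 = _score_and_information(b0, b1, dosage, status)
        det = h00 * h11 - h01 * h01
        # Newton step = inverse(information) @ score, with the 2x2 inverse written out.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Standard error of b1 from the inverse information at the final estimate.
    _, _, h00, h01, h11 = _score_and_information(b0, b1, dosage, status)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b1, se_b1
```

When only dosages 0 and 1 are present, the fitted β equals the log odds ratio of the corresponding 2 × 2 table, which makes a convenient sanity check.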

To ensure reliability when case numbers per subgroup were low, imputed genotypes in the TCGA and French GWAS studies were thresholded at a probability > 0.9 (i.e. -method threshold in SNPTEST). For the French sequencing study we used -method expected, as we were comparing genotypes from directly sequenced cases against imputed controls. To check the validity of this approach, we compared control allele frequencies with those of European samples from the 1000 Genomes Project.
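Conceptually, the two SNPTEST options differ in how genotype probabilities become a predictor: thresholding keeps only confident hard calls, whereas the expected method uses the probability-weighted allele count. The sketch below illustrates that distinction, assuming probabilities are ordered (AA, AB, BB); the helper names are hypothetical and this is not SNPTEST code.

```python
from typing import Optional, Sequence


def threshold_call(probs: Sequence[float], call_threshold: float = 0.9) -> Optional[int]:
    """Hard-call a genotype (0, 1 or 2 copies of allele B) from (P(AA), P(AB), P(BB)).

    Returns None (treated as missing) when no genotype probability exceeds the threshold.
    """
    best = max(range(3), key=probs.__getitem__)
    return best if probs[best] > call_threshold else None


def expected_dosage(probs: Sequence[float]) -> float:
    """Expected count of allele B: 0 * P(AA) + 1 * P(AB) + 2 * P(BB)."""
    return probs[1] + 2.0 * probs[2]
```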

Meta-analyses were performed using the fixed-effects inverse-variance method based on the β estimates and standard errors from each study using META v1.6 [30]. Cochran’s Q statistic was used to test for heterogeneity [20].
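The fixed-effects inverse-variance pooling and Cochran's Q can be written in a few lines. In this sketch (hypothetical function name, standard library only) each study is weighted by 1/SE², the pooled SE is the square root of the inverse summed weight, and Q is the weighted sum of squared deviations from the pooled β. The heterogeneity P value is returned only for 1 or 2 degrees of freedom, where the chi-square tail has a closed form; META v1.6 itself is not reproduced here.

```python
import math
from typing import Dict, List, Optional


def fixed_effects_meta(betas: List[float], ses: List[float]) -> Dict[str, Optional[float]]:
    """Inverse-variance fixed-effects meta-analysis with Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    # Cochran's Q: weighted squared deviations of study estimates from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # Closed-form chi-square upper tails exist for 1 and 2 df (2 df = three studies);
    # other df would need the regularised incomplete gamma function.
    if df == 1:
        p_het: Optional[float] = math.erfc(math.sqrt(q / 2))
    elif df == 2:
        p_het = math.exp(-q / 2)
    else:
        p_het = None
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "Q": q, "p_het": p_het, "I2": i2}
```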

Risk allele number and age at diagnosis

For imputed SNPs a genotype probability threshold > 0.9 was used. The distributions of age at diagnosis and survival among cases carrying different numbers of risk alleles across the 25 SNPs were assessed within each molecular subgroup. Trend lines were estimated by linear regression in R and plotted using the ggplot2 package [62]. Association between risk allele number and age was assessed using Pearson correlation.
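As an illustration of how risk allele number and its correlation with age might be computed, the sketch below sums hard-called risk alleles across SNPs using the > 0.9 probability threshold and computes a Pearson correlation coefficient. It assumes genotype probabilities are already oriented to the risk allele; treating a case with any uncalled SNP as missing is a hypothetical choice, not necessarily the rule used here, and the P value for r is omitted.

```python
import math
from typing import List, Optional, Sequence


def risk_allele_count(per_snp_probs: Sequence[Sequence[float]],
                      call_threshold: float = 0.9) -> Optional[int]:
    """Total risk alleles carried across SNPs.

    Each entry is (P(0), P(1), P(2)) copies of the risk allele. Returns None when any
    SNP fails the probability threshold (hypothetical missing-data rule).
    """
    total = 0
    for probs in per_snp_probs:
        best = max(range(3), key=probs.__getitem__)
        if probs[best] <= call_threshold:
            return None
        total += best
    return total


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```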

Survival analysis

Survival curves for censored data were estimated with the Kaplan–Meier method using the survfit function of the survival package in R. Log-rank tests were used to compare curves between groups. Power to demonstrate a relationship between group and overall survival was estimated using sample size formulae for comparative binomial trials. The Cox proportional-hazards regression model was used to investigate the association between survival and age, grade, molecular group and number of risk alleles. Individuals who died within a month of surgery were excluded. Date of surgery was used as a proxy for the date of diagnosis.
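A minimal sketch of the Kaplan–Meier product-limit estimator and the two-group log-rank test is given below to make the calculations explicit. It is a standard-library illustration with hypothetical function names, not the R survival package; event indicators are assumed to be 1 for death and 0 for censoring, and the one-month exclusion is assumed to have been applied beforehand.

```python
import math
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate; returns (time, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations (deaths and censorings) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # censored subjects leave the risk set after time t
    return curve


def log_rank(t1: List[float], e1: List[int],
             t2: List[float], e2: List[int]) -> Tuple[float, float]:
    """Two-group log-rank chi-square statistic (1 df) and its P value."""
    event_times = sorted({t for t, e in zip(t1 + t2, e1 + e2) if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n1 = sum(1 for x in t1 if x >= t)
        n2 = sum(1 for x in t2 if x >= t)
        d1 = sum(1 for x, e in zip(t1, e1) if x == t and e)
        d2 = sum(1 for x, e in zip(t2, e2) if x == t and e)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```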

Expression quantitative trait locus analysis

We searched for expression quantitative trait loci (eQTLs) in 10 brain regions using the V6p GTEx [31] portal (https://gtexportal.org/home/) as well as in whole blood using the blood eQTL browser [61] (https://molgenis58.target.rug.nl/bloodeqtlbrowser/).

Hi-C analysis

We searched for significant contacts between glioma risk SNPs and nearby genes using the HUGIn browser [34], which is based on the analysis of Schmitt et al. [48]. We restricted the analysis to Hi-C data generated on H1 embryonic stem cells and neuronal progenitor cells, as originally described by Dixon et al. [11]. Plotted topologically associating domain (TAD) boundaries were obtained using the insulation score method [8] at 40-kb bin resolution. At each locus we searched for significant interactions between the bin overlapping the glioma risk SNP and all other bins within 1 Mb (i.e. “virtual 4C”).
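The “virtual 4C” search window can be made concrete as follows. Assuming 40-kb bins aligned to position 0 and a window of 1 Mb either side of the SNP-containing bin, the sketch enumerates candidate bins and keeps those whose contact P value falls below a threshold. The contact P values and the alpha threshold are hypothetical inputs; the significance model used by HUGIn is not reproduced.

```python
from typing import Dict, List, Tuple

BIN_SIZE = 40_000   # 40-kb Hi-C bins, as in the text
WINDOW = 1_000_000  # 1 Mb either side of the SNP-containing bin (a multiple of BIN_SIZE)


def virtual_4c_bins(snp_pos: int, chrom_length: int,
                    bin_size: int = BIN_SIZE, window: int = WINDOW) -> List[int]:
    """Start coordinates of all bins within `window` of the bin containing the SNP."""
    snp_bin = snp_pos // bin_size * bin_size
    first = max(0, snp_bin - window)
    last = min(chrom_length - 1, snp_bin + window)
    return [b for b in range(first, last + 1, bin_size) if b != snp_bin]


def significant_contacts(snp_pos: int, chrom_length: int,
                         contact_pvalues: Dict[int, float],
                         alpha: float) -> List[Tuple[int, float]]:
    """Keep window bins whose (hypothetical) contact P value with the SNP bin is < alpha."""
    return [(b, contact_pvalues[b])
            for b in virtual_4c_bins(snp_pos, chrom_length)
            if b in contact_pvalues and contact_pvalues[b] < alpha]
```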

Gene set enrichment analysis

Gene set enrichment analysis (GSEA) was carried out using GSEA version 3.0 with gene sets from the Molecular Signatures Database (MSigDB) v6.0 [36, 52], restricted to the C2 canonical pathway sets (n = 1329). Default settings were used, except that restrictions on gene set size were removed. RSEM-normalised mRNASeq expression data for 20,501 genes in 676 glioma cases from TCGA were downloaded from the Broad Institute TCGA GDAC (http://gdac.broadinstitute.org/). These cases were assigned molecular groupings using sample information from Supplementary Table 1 of Ceccarelli et al. [6].
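For orientation, the core GSEA statistic is a weighted running-sum enrichment score. The sketch below computes it for one gene set with the default weight exponent p = 1, assuming genes are ranked by a precomputed metric (a hypothetical input). It omits the phenotype permutations, normalisation and FDR estimation performed by the GSEA software.

```python
from typing import Dict, Set


def enrichment_score(ranked: Dict[str, float], gene_set: Set[str], p: float = 1.0) -> float:
    """Running-sum enrichment score for one gene set (weighted, exponent p)."""
    genes = sorted(ranked, key=lambda g: ranked[g], reverse=True)  # top-ranked first
    hits = [g for g in genes if g in gene_set]
    if not hits or len(hits) == len(genes):
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    hit_norm = sum(abs(ranked[g]) ** p for g in hits)
    miss_step = 1.0 / (len(genes) - len(hits))
    running = best = 0.0
    for g in genes:
        if g in gene_set:
            running += abs(ranked[g]) ** p / hit_norm  # step up, weighted by the metric
        else:
            running -= miss_step  # uniform step down for genes outside the set
        if abs(running) > abs(best):
            best = running  # ES is the maximum deviation from zero
    return best
```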

Results

Descriptive characteristics of datasets

We studied three non-overlapping glioma case–control series of Northern European ancestry, totalling 2648 cases and 9365 controls (Table 1). For 1659 of the 2648 cases, information on tumour 1p/19q co-deletion, TERT promoter and IDH mutation status was available (Fig. 1). These data allowed us to define five molecular subgroups of glioma: triple-positive (IDH mutated, 1p/19q co-deletion, TERT promoter mutated); TERT-IDH (IDH mutated, TERT promoter mutated, 1p/19q wild-type); IDH-only (IDH mutated, 1p/19q wild-type, TERT promoter wild-type); TERT-only (TERT promoter mutated, IDH wild-type, 1p/19q wild-type) and triple-negative (IDH wild-type, 1p/19q wild-type, TERT promoter wild-type). As only 29 cases were IDH mutated, 1p/19q co-deleted and TERT promoter wild-type, we restricted subsequent analyses to the five groups above. Table 1 also shows the grouping of 1960 cases according to the WHO 2016 classification of glial tumours into five categories (IDH-mutant astrocytoma, IDH wild-type astrocytoma, 1p/19q co-deleted oligodendroglioma, IDH-mutant GBM and IDH wild-type GBM) (Supplementary Table 2 [Online Resource 1]).
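The subgroup definitions above amount to a lookup on three binary markers. The sketch below encodes them with hypothetical names; combinations outside the five groups (such as IDH-mutated, 1p/19q co-deleted, TERT promoter wild-type tumours) return None and would be excluded.

```python
from typing import Optional

# (IDH mutated, 1p/19q co-deleted, TERT promoter mutated) -> molecular subgroup
SUBGROUPS = {
    (True, True, True): "triple-positive",
    (True, False, True): "TERT-IDH",
    (True, False, False): "IDH-only",
    (False, False, True): "TERT-only",
    (False, False, False): "triple-negative",
}


def molecular_subgroup(idh_mutated: bool, codeleted_1p19q: bool,
                       tert_mutated: bool) -> Optional[str]:
    """Return the subgroup, or None for combinations outside the five groups."""
    return SUBGROUPS.get((idh_mutated, codeleted_1p19q, tert_mutated))
```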

Fig. 1

Molecular classification of diffuse glioma and frequency of each subgroup in the TCGA, French-GWAS and French sequencing case series

SNP selection

We analysed 25 SNPs, which had been reported to show the strongest genome-wide significant association with glioma in our recent meta-analysis of 12,496 cases and 18,190 controls [35] (Table 2). In the current analysis all of the SNPs exhibited a consistent direction of effect with that previously reported, albeit some weakly [Supplementary Fig. 4 (Online Resource 1), Supplementary Table 3 (Online Resource 2)].

Table 2 Overview of glioma risk SNPs at the 25 loci

Relationship between risk SNP and molecular subgroup

First, we examined whether the associations at the 25 risk loci were broadly defined by IDH status. We observed significant associations between IDH-mutated glioma and the 1q44 (rs12076373), 2q33.3 (rs7572263), 3p14.1 (rs11706832), 8q24.21 (rs55705857), 11q21 (rs7107785), 11q23.3 (rs12803321), 14q12 (rs10131032), 15q24.2 (rs77633900) and 17p13.1 (rs78378222) risk SNPs. In addition, we found strong associations between IDH wild-type glioma and 5p15.33 (rs10069690), 7p11.2 (rs75061358), 9p21.3 (rs634537) and 20q13.33 (rs2297440) (Supplementary Fig. 5 [Online Resource 1], Supplementary Table 3 [Online Resource 2]). Notably, many of the recently discovered risk loci previously reported to be associated with non-GBM glioma (1q44, 2q33.3, 3p14.1, 11q21, 14q12, 15q24.2) [35] showed a strong association with IDH-mutant glioma.

We then performed a more detailed stratified analysis based on classifying the glioma tumours into the five molecularly defined groups. We found a strong association between 8q24.21 (rs55705857) and IDH-mutated tumours, in particular triple-positive glioma [P = 1.27 × 10−37, OR = 9.30 (6.61–13.08)], which corresponds to the WHO 2016 oligodendroglioma classification [Supplementary Fig. 6 (Online Resource 1), Supplementary Table 3 (Online Resource 2)]. Furthermore, we confirmed the previously reported associations of 5p15.33 (rs10069690), 9p21.3 (rs634537), 17p13.1 (rs78378222) and 20q13.33 (rs2297440) with TERT-only glioma in each of the three series [12]. Finally, we found suggestive evidence for an association between 22q13.1 (rs2235573) and TERT-only glioma, as well as between triple-positive glioma and 11q21 (rs7107785), 11q23.2 (rs648044) and 12q21.2 (rs1275600) [Fig. 2, Supplementary Table 3 (Online Resource 2)].
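Odds ratios and confidence intervals in this section follow the usual relationship OR = exp(β), with 95% CI exp(β ± 1.96 × SE). The helpers below (hypothetical names, normal approximation) convert in both directions and give a two-sided Wald P value; applied to a reported OR and CI such as 9.30 (6.61–13.08), they recover β and SE only approximately because the published values are rounded.

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def or_ci_from_beta(beta: float, se: float, z: float = Z_975) -> Tuple[float, float, float]:
    """Odds ratio and 95% confidence interval from a log-odds estimate and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_or_ci(odds_ratio: float, lower: float, upper: float,
                       z: float = Z_975) -> Tuple[float, float]:
    """Back-calculate beta and SE, assuming a symmetric CI on the log scale."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided Wald P value under a normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2))
```

For example, `beta_se_from_or_ci(9.30, 6.61, 13.08)` gives approximate β and SE for the triple-positive association at 8q24.21.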

Fig. 2

Association between the 25 risk loci and glioma subgroup. Horizontal red line corresponds to an odds ratio of 1.0

In addition to data on 1p/19q co-deletion, TERT promoter and IDH mutation, information on EGFR amplification and CDKN2A deletion status was available for 1955 of the tumours (Table 1). Using these data we examined associations with EGFR amplification and CDKN2A deletion, focusing on the 7p11.2 (rs75061358 and rs11979158) and 9p21.3 (rs634537) risk SNPs, since these loci map in or near EGFR and CDKN2A, respectively (Supplementary Figs. 7, 8 [Online Resource 1], Supplementary Table 3 [Online Resource 2]). At 7p11.2, the intergenic variant rs75061358, located near EGFR, was associated with EGFR-amplified tumours but not with tumours lacking amplification. A weaker association with EGFR amplification was seen for the second independent signal at the locus, rs11979158, which is intronic within EGFR itself. At 9p21.3, rs634537, which is intronic within CDKN2B-AS1 and near CDKN2A and CDKN2B, was not associated with CDKN2A deletion status. Low-grade gliomas tend to be EGFR wild-type and p16 wild-type and, therefore, as anticipated, many non-GBM risk SNPs were most strongly associated with these tumours, notably 2q33.3 (rs7572263), 3p14.1 (rs11706832), 8q24.21 (rs55705857), 10q25.2 (rs11196067) and 11q23.3 (rs12803321) (Supplementary Figs. 7, 8 [Online Resource 1], Supplementary Table 3 [Online Resource 2]).

Polygenic contribution to age at diagnosis and patient survival

Patient survival by molecular subgroup in each of the three series was consistent with previously published reports [5, 12]; specifically, patients with triple-positive tumours had the best prognosis, whilst those with TERT-only tumours had the worst outcome (Supplementary Fig. 3 [Online Resource 1]). We investigated whether an increased burden of glioma risk alleles might be associated with earlier age at diagnosis (indicative of an influence on glioma initiation) or with survival (indicative of an influence on glioma progression). There was a slight, albeit non-significant, trend towards decreasing age at diagnosis with increasing risk allele number in the IDH-only, TERT-only and triple-positive subgroups, but with decreasing risk allele number in TERT-IDH and triple-negative tumours (Supplementary Fig. 9 [Online Resource 1]). We found no relationship between age and risk allele number overall or within the individual molecular groups (Supplementary Table 4 [Online Resource 1]). Examining each SNP individually, only rs55705857 at 8q24.21 was nominally associated with age (Supplementary Table 4 [Online Resource 1]).

We used Cox proportional-hazards regression to investigate whether the burden of glioma risk alleles was associated with survival, with each risk SNP coded as 0, 1 or 2 risk alleles. As expected, increasing age and grade were strongly associated with decreased survival, and molecular group (triple-negative, triple-positive, TERT-only, IDH-only and TERT-IDH) was strongly associated with survival. Intriguingly, a higher number of risk alleles was associated with increased survival (Supplementary Table 5 [Online Resource 1]; P < 10−4), with 1q32.1 (rs4252707), 11q23.3 (rs12803321) and 11q21 (rs7107785) each nominally associated with survival, independent of age and molecular subgroup. Within molecular subgroups, a consistent association between risk allele burden and increased survival was seen in triple-positive, triple-negative and TERT-only glioma, but not in IDH-only and TERT-IDH glioma.

Biological inference of risk loci

Since genomic spatial proximity and chromatin looping interactions are fundamental for the regulation of gene expression [42], we interrogated physical interactions at the risk loci in embryonic stem cells and neuronal progenitor cells using Hi-C data. We also sought insight into possible biological mechanisms by performing expression quantitative trait locus (eQTL) analysis using mRNA expression data from 10 brain regions in the GTEx portal.

We identified significant Hi-C contacts from the genomic regions encompassing 14 of the 25 risk loci, implicating a number of presumptive candidate genes; for two of these, candidacy was also supported by eQTL data (Table 3; Supplementary Fig. 10 [Online Resource 1]; Supplementary Table 6 [Online Resource 3]). Notably, there were significant looping interactions between the risk SNP and IDH1/IDH1-AS1 at 2q33.3, EGFR/EGFR-AS1 at 7p11.2, CDKN2A/CDKN2B at 9p21.3, NFASC at 1q32.1 and LRIG1 at 3p14.1. In the 8q24.21 gene desert, Hi-C data revealed a significant interaction between the risk SNP rs55705857 and MYC, as well as lincRNAs in the region such as PCAT1/PCAT2. Additionally, the risk SNP rs12803321 at 11q23.3 was significantly associated with PHLDB1 expression in the brain.

Table 3 Candidate gene basis of glioma risk loci

Pathway analysis

To gain further insight into the biological basis of the subtype associations, we performed GSEA on gene expression data from TCGA (Supplementary Table 7 [Online Resource 4]). While no gene sets were significantly altered (FDR q value < 0.1), the most significant finding was upregulation of PI3K signalling in 1p/19q co-deleted tumours (Supplementary Table 7 [Online Resource 4]).

Discussion

Our findings provide further support for subtype-specific associations at glioma risk loci. Specifically, we confirm the strong relationship between the 8q24.21 (rs55705857) risk variant and triple-positive glioma. Moreover, we substantiate the proposed associations of 5p15.33 (rs10069690) and 20q13.33 (rs2297440) variants with TERT promoter mutation, of 9p21.3 (rs634537) with TERT-only glioma, and of 17p13.1 (rs78378222) with TERT-IDH glioma. Other loci, such as 1q32.1 (rs4252707) and 10q25.2 (rs11196067), appear to have more generic effects.

Although preliminary, and in part speculative, our analysis delineates potential candidate disease mechanisms across the 25 glioma risk loci (Table 3; Fig. 3). First, telomere maintenance is central to cell immortalization [57] and is generally considered to require mutually exclusive mutations in either the TERT promoter or ATRX. The risk alleles at 5p15.33 (TERT) and 10q24.33 (OBFC1) are associated with increased leukocyte telomere length, supporting a relationship between SNP genotype and telomere biology [56, 57, 66]. While dysregulation of the telomere gene RTEL1 has traditionally been assumed to be the functional basis of the 20q13.33 locus, the glioma risk SNP does not map to the locus associated with telomere length [7, 35]. Intriguingly, our analysis instead implicates STMN3 at 20q13.33, whose over-expression promotes growth in GBM cells [68], suggesting an alternative mechanism by which the risk SNP influences glioma development. For the 5p15.33 (TERT) and 10q24.33 (OBFC1) loci, it is unclear whether the effect on glioma risk is mediated solely through telomeres or is pleiotropic and involves multiple factors. For example, rs10069690 at 5p15.33 is strongly associated with TERT-only glioma, yet TERT promoter mutation increases telomerase activity without necessarily affecting telomere length [6]. An intriguing hypothesis to test would therefore be the impact of allele-specific effects of rs10069690 on telomere length in gliomas carrying the TERT promoter mutation.

Fig. 3

Summary of the relationship between glioma risk with molecular subgroup and associated biological pathways. The extent of the evidence supporting each candidate gene (ranging from an established role in glioma to largely speculative) is summarised in Table 3

Second, the EGFR-AKT pathway involves EGFR at 7p11.2, LRIG1 at 3p14.1, PHLDB1 at 11q23.3, AKT3 at 1q44 and JAK1 at 1p31.3. We showed a significant interaction between the risk SNP rs11979158 at 7p11.2 and EGFR, consistent with a cis-regulatory effect on gene expression. Although the 7p11.2 locus has long been suspected to act through EGFR and is strongly associated with classical GBM, emerging evidence suggests that non-GBM SNPs implicate additional components of the EGFR-AKT signalling pathway. At the IDH-only associated locus 3p14.1, LRIG1 is highly expressed in the brain and negatively regulates the epidermal growth factor receptor (EGFR) signalling pathway [18]. Reduced LRIG1 expression is linked to tumour aggressiveness, temozolomide resistance and radio-resistance [60, 65]. Downstream components of EGFR-AKT signalling are implicated at 11q23.3 via PHLDB1, at 1p31.3 via JAK1 and at 1q44 via AKT3. The risk allele of rs12803321 is associated with increased expression of PHLDB1, an insulin-responsive protein that enhances Akt activation [70]. AKT3 at 1q44 is highly expressed in the brain and appears to respond to EGF in a PI3K-dependent manner [38], and GBM cells with amplified AKT3 show enhanced DNA repair and resistance to radiation and temozolomide [54]. The risk allele of rs12752552 at 1p31.3 is associated with increased JAK1 expression in brain tissue. JAK1 can be activated through EGF-induced phosphorylation and may be involved in astrocyte formation [3, 39, 50]. The 3p14.1 and 11q23.3 loci are strongly associated with EGFR amplification-negative gliomas, with a similar, albeit non-significant, trend at 1p31.3 and 1q44; this pattern would be expected if elevated upstream EGFR activation masks their functional effects.

Third, the NAD pathway involves IDH1 at 2q33.3 and NNMT at 11q23.2. At 2q33.3 we detected a significant Hi-C interaction between the glioma risk SNP rs7572263 and IDH1/IDH1-AS1. Overexpression of mutant IDH1 protein has been reported to sensitize glioma cells to radiation [29], providing a possible route to testing the allele-specific effects of this SNP. IDH mutation causes deregulation of NAD signalling [64]. Interestingly, therefore, at 11q23.2, which is strongly associated with IDH-mutated gliomas, the most convincing molecular mechanism is via NNMT, which encodes nicotinamide N-methyltransferase; NNMT is highly expressed in GBM relative to normal brain, causing methionine depletion-mediated DNA hypomethylation and accelerated tumour growth [23, 55].

Fourth, genes with established roles in neural development may be involved. While the risk SNP rs4252707 at 1q32.1 is intronic within MDM4, the strongest evidence for a mechanistic effect was with NFASC. Neurofascin is involved in synapse formation during neural development [1] and therefore represents an attractive functional candidate for the association with glioma. Additionally, at 16p13.3 and 20q13.33, the implicated genes SOX8 and STMN3 are strongly expressed in the brain and are thought to play a role in neural development [47, 68]. At 10q25.2, the implicated gene TCF7L2 modifies beta-catenin signalling and controls oligodendrocyte differentiation [69]. Intriguingly, 10q25.2 has previously been reported as a risk locus for colorectal cancer [58], a tumour driven by Wnt signalling; however, the colorectal cancer risk SNP is not correlated with rs11196067, raising the possibility of tissue-specific regulation across the wider region.

Finally, the p53 pathway is involved at 17p13.1, where the risk SNP rs78378222 affects TP53 3′UTR poly-adenylation processing. In addition, the p53 target GLIPR1 [43] is implicated at 12q21.2. Moreover, 12q21.2 is most strongly associated with Triple-positive glioma, which does not feature TP53 mutation, consistent with wild-type p53 protein being required for the SNP to exert a functional effect.

As with many cancers, the point at which the risk SNPs exert their functional impact on glioma oncogenesis remains to be elucidated, and we did not demonstrate a relationship between increased risk allele number and age at diagnosis. Surprisingly, we found a significant association between increasing risk allele number and improved outcome. This result was consistent across the prognostic molecular groups, suggesting that it is not due to an over-representation of the more favourable prognostic groups among patients with a higher burden of risk alleles. In addition, the distribution of risk allele numbers did not differ across the four groups (P = 0.3, ANOVA). Examining the impact of individual SNPs on survival did not reveal any loci strongly associated with outcome. Collectively, our findings suggest that, independent of other prognostic factors, the greater the number of risk alleles carried, the better the outcome.

In conclusion, we have performed the most comprehensive study to date of the association between molecular subgroup and the 25 recently identified glioma risk loci. As well as confirming previous observations, we show that the majority of risk loci are associated with IDH mutation. Through the integration of Hi-C and eQTL data, we have additionally sought to define candidate target genes underlying the associations. Collectively, our observations highlight pathways critical to glioma susceptibility, notably neural development, NAD metabolism and EGFR-AKT signalling. Intriguingly, we show that the number of risk alleles is consistently associated with better outcome. Functional investigation in tumour and neural progenitor-based systems will be required to elucidate these molecular mechanisms more fully. Notably, IDH-mutant tumours have been shown to reshape 3D chromatin organisation [14], so such studies may reveal new regulatory interactions.

Our current analysis defines glioma subgroups using only three primary markers. Given the extent of the missing heritability for glioma, further expansion of GWAS by international consortia [35] is likely to identify additional risk variants. Additional molecular sub-grouping of glioma from ongoing large-scale tumour sequencing projects is likely to provide further insights into glial oncogenesis and may ultimately suggest targets for novel therapeutic strategies.