Introduction

Leukemia, the most common cancer in children, is characterized by dysregulated proliferation of clonally expanded immature lymphoid or myeloid progenitor cells that have acquired a series of catastrophic alterations within key regulatory genes [1]. Among the major subtypes of childhood leukemia defined by cell lineage, acute lymphoblastic leukemia (ALL) is the most common, comprising nearly 80 % of diagnoses in developed countries [2]. In most childhood leukemia cases, characteristic genetic alterations are observed, including numerical and structural chromosomal changes such as hyperdiploidy (>46 chromosomes) or translocations, as well as more subtle changes in the form of point mutations and gene deletions [3]. Translocations are considered a hallmark genetic event in leukemia; examples include the t(12;21) translocation (TEL-AML1) commonly observed in B-cell lineage ALL and the 11q23/MLL gene rearrangements in infant leukemia [4].

Confirmed clinical and epidemiologic associations, including sex, age, race, exposure to ionizing radiation in utero, postnatal high-dose radiation, chemotherapeutic agents, and several genetic syndromes, explain only a small proportion of childhood ALL cases diagnosed [2]. The early age of onset of childhood ALL suggests that inherited genetic traits may play a role, and recent evidence indicates that these contribute to a substantial proportion of the variation in childhood ALL risk [5, 6]. Such genetic traits can range from rare, highly penetrant predisposing mutations to more common low penetrance genetic polymorphisms. While a considerable excess risk of childhood ALL has been observed among monozygotic twins compared to dizygotic twins of ALL patients, suggesting heritability, recent studies indicate that this excess risk may be due more to intraplacental metastasis than to highly penetrant risk alleles [7]. Thus, for childhood ALL, as for other multi-factorial diseases, inherited risk alleles are likely to be low penetrance susceptibility alleles that interact with environmental factors to modulate disease risk.

In this review, we focus on the current published epidemiological literature that has evaluated the influence of inherited (germline) genetic variation on childhood ALL risk. This literature comprises two types of studies, broadly classified as either candidate gene association studies or genome-wide association studies (GWAS).

Studies utilizing the traditional candidate gene approach are initiated with specific a priori hypotheses based on known biological functions of the genes that have relevance to proposed disease pathology. This type of study more commonly focuses on specific genetic variants, such as single nucleotide polymorphisms (SNPs) and/or insertion/deletions that are known to correlate with expression and/or function of the resultant protein product. In addition, some candidate gene studies have adopted a haplotype tagging approach built on the principle that segments of the genome are arranged into distinct haplotype blocks defined by the level of linkage disequilibrium (LD) exhibited between neighboring genetic markers [8]. Thus, a candidate region can be interrogated without necessarily having previously identified a functional genetic variant, but any positive results from the study would, at most, only localize a region of potential association, requiring replication and additional fine-mapping efforts to identify the causal variant.
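As a hedged illustration of the LD concept that underlies haplotype tagging, the following minimal sketch computes the standard pairwise measures D, |D′| and r² for two biallelic markers from a haplotype frequency and the two allele frequencies. The function name and the example frequencies are hypothetical and are not taken from any study reviewed here.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic markers.

    p_ab : frequency of the haplotype carrying allele A (marker 1)
           together with allele B (marker 2)
    p_a  : frequency of allele A at marker 1
    p_b  : frequency of allele B at marker 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")

    # D is the deviation of the observed haplotype frequency from
    # the frequency expected if the two markers were independent.
    d = p_ab - p_a * p_b

    # The largest |D| attainable given the allele frequencies
    # depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r_squared


# Hypothetical example: a tag SNP that travels with an untyped variant
# on most, but not all, haplotypes.
d, d_prime, r2 = ld_measures(p_ab=0.25, p_a=0.30, p_b=0.30)
print(f"D = {d:.3f}, |D'| = {d_prime:.3f}, r^2 = {r2:.3f}")
```

A tag SNP with a high r² to an untyped variant captures most of the information at that variant, which is why a positive result localizes a region rather than a causal variant.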

The genome-wide approach is similarly based around this concept, but hypothesis testing is completely agnostic in that a large number of variants that represent the genetic diversity of the entire genome, regardless of candidate gene status, are assembled and evaluated in one study [9].

To date, a large number of studies have been published covering a broad range of genes. In this review, we focus our discussion on the most recent publications while referencing previous reviews and meta-analyses [5, 10–12]. Our objective is to provide the reader with an updated account of the current state of evidence regarding the genetic susceptibility to childhood ALL.

Candidate gene association studies

The number of candidate gene studies continues to increase. As reported in a previous review [5], a search of the literature published prior to April 2008 identified 59 articles examining the association between variants of 36 different candidate gene loci and childhood ALL risk (blue bars in Fig. 1). As of October 2012, a systematic search of the PubMed database for all ALL risk candidate gene study articles published since April 2008 identified an additional 76 articles reporting on studies conducted within populations from 24 countries. These studies examined about 450 additional genes not previously evaluated in childhood leukemia, but this seemingly dramatic increase is driven by select studies [13–19] that have made use of high-throughput genotyping platforms based around a haplotype tagging SNP approach. With the exception of a few studies [20–23], all utilized the case–control study design to evaluate the associations. Consistency in results across multiple studies is essential to the process of establishing associations, particularly in observational studies of the type reviewed here. Thus, we focused our discussion around genes for which more than three studies have been published (Tables 1, 2; Fig. 1).

Fig. 1

Plot summarizing the number of reports published for each gene arranged by candidate pathway. Genes having greater than three previous publications were reviewed in this article and are represented by bars exceeding the gray horizontal line. The numbers of reports counted by a previous review as of April 2008 [5] are represented in blue, and the numbers of reports published subsequently as of October 2012 are represented in red. The asterisks (*) are genes with variants that have been evaluated previously with meta-analysis. The “Other” (**) category refers to genes that have been reported by fewer than three previous publications

Table 1 Evidence from candidate gene association studies for gene variants in the folate pathway that have been evaluated with meta-analysis or have greater than three previous publications
Table 2 Evidence from candidate gene association studies for gene variants in the xenobiotic transport/metabolism and DNA repair pathways that have been evaluated with meta-analysis or have greater than three previous publications

Folate metabolism

Folate and its bioactive metabolic substrates are essential to numerous bodily functions, particularly for their role in DNA methylation and synthesis that aid the rapid cell division and growth requirements associated with pregnancy and early infancy [24, 25]. Folate deficiency may contribute to carcinogenesis via hypomethylation of important regulatory genes as well as induction of DNA damage through uracil misincorporation during DNA replication [24]. Variants in more than a dozen genes, including those encoding the methylenetetrahydrofolate reductase (MTHFR), may alter folate metabolism and contribute to the risk of childhood leukemia.

Among candidate gene studies in childhood ALL, MTHFR is the most frequently studied gene, and studies have largely focused on two functional SNPs associated with reduced enzymatic activity, C677T (rs1801133) and A1298C (rs1801131) (Table 1). Reports from three meta-analyses published during this past year showed consistent results, including no evidence of association for A1298C [26–28], but marginally significant reduced risks of childhood ALL for C677T (allele contrast, OR = 0.90, 95 % CI 0.82–1.00) [26, 28]. Using meta-regression, Wang et al. [26] found a potential gender effect associated with C677T and suggested this as a source of between-study heterogeneity in results. Zintzaras et al. [28], despite their marginally significant results, particularly in white children, interpreted their meta-analysis with caution due to weak cumulative meta-analysis findings. This conclusion is in line with studies that were published in 2012 subsequent to these meta-analyses, in which all five reports showed no association with either C677T or A1298C [20, 29–32]. Several reasons for this discrepancy are acknowledged, including a potential sex and race/ethnicity specific association for C677T, and important interactions with dietary folate intake that may influence the associations. Current evidence indicates that the effects of the reduced activity of MTHFR conferred by C677T may have greater impact in situations of serum folate deficiency [33, 34]. However, this has not been confirmed in more recent studies of MTHFR-folate gene–environment interaction [29, 31, 35, 36], and the relationship remains difficult to evaluate due to widespread dietary folate fortification and education efforts beginning in the mid-1990s.
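To make the allele-contrast estimates quoted above concrete, the following minimal sketch shows how an odds ratio and an approximate 95 % confidence interval can be computed from a 2 × 2 table of allele counts using the log (Woolf) method. The function name and the counts are hypothetical; they are not data from any of the cited studies, and the cited meta-analyses may have used different estimators.

```python
import math


def odds_ratio_ci(case_variant, case_reference,
                  control_variant, control_reference, z=1.959964):
    """Odds ratio with a Woolf (log-based) confidence interval.

    Arguments are allele counts from a 2 x 2 table; z = 1.959964
    corresponds to a two-sided 95 % interval.
    """
    counts = (case_variant, case_reference, control_variant, control_reference)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive for the log method")

    log_or = math.log((case_variant * control_reference)
                      / (case_reference * control_variant))
    # Standard error of the log odds ratio.
    se = math.sqrt(sum(1.0 / n for n in counts))
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical allele counts (two alleles per subject).
or_, lower, upper = odds_ratio_ci(300, 700, 330, 670)
print(f"OR = {or_:.2f}, 95 % CI {lower:.2f}-{upper:.2f}")
```

An interval whose upper bound sits at or near 1.00, as for C677T above, is what the text describes as marginally significant: a small change in the data would move the bound across the null value.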

Other key genes in the folate metabolic pathway examined in previous studies include the solute carrier family 19 member 1 (SLC19A1) gene responsible for membrane transport of folate into the cell, methionine synthase (MTR) and methionine synthase reductase (MTRR) genes involved in methionine biosynthesis that influence DNA methylation, and thymidylate synthetase (TYMS), serine hydroxymethyltransferase 1 (SHMT1), and methylenetetrahydrofolate dehydrogenase 1 (MTHFD1) that contribute to DNA synthesis and replication (Table 1). Of these genes, a recent meta-analysis [12] reported statistically significant (P < 0.05) ALL associations with functional SNPs of SLC19A1 (G80A/rs1051266, dominant: OR = 1.37, 95 % CI 1.11–1.69), MTRR (A66G/rs1801394, dominant: OR = 0.73, 95 % CI 0.59–0.91), and SHMT1 (C1420T/rs1979277, CT vs. CC: OR = 0.79, 95 % CI 0.65–0.98), showing minimal evidence of heterogeneity in effect across studies. Subsequent case–control studies of MTRR A66G and SHMT1 C1420T are partly supportive, but the four additional reports on SLC19A1 G80A are inconsistent, with both increased [37, 38] and reduced [39] risk estimates reported. Meta-analyses for MTR A2756G (rs1805087) and the TYMS 28 bp repeat [12], including four studies each, do not support an association. However, a recent UK study [40] comprising a larger number of cases (939 cases and 824 controls) than the meta-analysis reported a strong association between MTR A2756G and ALL risk (GG vs. AA, OR = 1.88, 95 % CI 1.16–3.07), particularly pronounced for MLL-positive leukemia (OR = 4.90, 95 % CI 1.30–18.45). For TYMS, the UK study also found a statistically significant increased risk associated with homozygous carriers of the 1494del6 deletion (rs16430, OR = 1.46, 95 % CI 1.02–2.08), but this is not supported by findings from three other smaller studies [37, 39, 41]. Finally, the five studies reporting results for MTHFD1 G401A (rs1950902) [31, 41] and/or G1958A (rs2236225) [37, 39, 41] provide little evidence of an association with childhood ALL.

Xenobiotic metabolism and transport

In order to exert their effects, potentially harmful chemicals (xenobiotics) must gain entry into target cells via membrane transporters and undergo cellular metabolic processes that alter their activity. The complete metabolism of xenobiotic compounds is divided into two phases, each utilizing different sets of metabolic enzymes. The metabolic activation of the xenobiotic performed by the phase I (bioactivation) enzymes is usually necessary in order for the phase II (detoxification) enzymes to convert this activated intermediate into a detoxified water-soluble compound that is eliminated from the cell. Genetic polymorphisms that disrupt the equilibrium between these two phases may compromise the host’s ability to respond sufficiently to xenobiotics and may potentially increase the host’s susceptibility to developing cancer.

Of the xenobiotic membrane transporters, the ATP-binding cassette sub-family B member 1 (ABCB1 or MDR1) gene has received the most attention in childhood leukemia, particularly the C3435T (rs1045642) and G2677T/A (rs2032582) variants, both of which have been linked to gene function in numerous studies [42, 43] (Table 2). Results of two meta-analyses [12, 44] of the five previous reports are both suggestive of an increased risk of childhood ALL associated with homozygous carriers of C3435T (OR = 1.27, 95 % CI 0.99–1.63), but when considered with two subsequent studies showing non-significant associations [16, 39], interpretation remains inconclusive. Meta-analysis of two studies for G2677T/A showed no association [12].

Several variants within the phase I metabolism cytochrome P450 (CYP) family of genes have been evaluated in childhood ALL. In 2010, a statistically significant increased risk of childhood ALL associated with CYP1A1 T6235C (rs4646903, dominant: OR = 1.36, 95 % CI 1.11–1.66) was reported based on a meta-analysis of 7 studies [12], and more recently has been confirmed in another meta-analysis comprising a total of 12 studies (dominant: OR = 1.31, 95 % CI 1.08–1.59) (Table 2) [45]. A second well-characterized variant in this gene, A4889G (rs1048943) evaluated by a fewer number of studies showed an elevated, but non-significant, risk estimate (homozygous: OR = 1.96, 95 % CI 0.92–4.19) in a meta-analysis of three studies [12]. A subsequent study conducted in a multi-ethnic US population of 258 B-lineage ALL cases and 646 matched controls reported a strong increased risk associated with A4889G observed predominantly in Hispanics (dominant: OR = 2.47, 95 % CI 1.13–5.38) [46]. Elevated risk estimates were reported in recent studies conducted in Brazil [47] and a multi-ethnic California population [16], but not in a Korean study [48]. Evidence for an association with CYP2E1*5B, a restriction fragment length polymorphism at G-1293C/C-1053T (rs3813867/rs2031920), is provided by a meta-analysis of four studies showing a strong increased risk (dominant: OR = 1.99, 95 % CI 1.32–3.00), and one of two additional studies reported results in a consistent direction [16, 49].
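For readers less familiar with how pooled estimates such as those above are obtained, the following minimal sketch illustrates fixed-effect inverse-variance pooling of study-level odds ratios, recovering each standard error from the reported 95 % confidence interval. This is an assumption-laden illustration: the function name and input values are hypothetical, and the cited meta-analyses may have used random-effects or other models.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95 % interval


def pool_fixed_effect(studies):
    """Fixed-effect inverse-variance pooling on the log odds ratio scale.

    studies : iterable of (odds_ratio, ci_lower, ci_upper) tuples,
              where the interval is assumed to be a 95 % CI.
    Returns (pooled_or, pooled_lower, pooled_upper).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, ci_lower, ci_upper in studies:
        # Recover the standard error of log(OR) from the CI width.
        se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * Z_95)
        weight = 1.0 / (se * se)
        weighted_sum += weight * math.log(odds_ratio)
        total_weight += weight

    pooled_log_or = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log_or),
            math.exp(pooled_log_or - Z_95 * pooled_se),
            math.exp(pooled_log_or + Z_95 * pooled_se))


# Hypothetical study-level estimates, not taken from the cited reports.
hypothetical_studies = [(1.50, 1.00, 2.25), (1.20, 0.90, 1.60), (1.40, 0.80, 2.45)]
pooled, lower, upper = pool_fixed_effect(hypothetical_studies)
print(f"pooled OR = {pooled:.2f}, 95 % CI {lower:.2f}-{upper:.2f}")
```

Because each study is weighted by the inverse of its variance, larger studies dominate the pooled estimate, and the pooled interval is narrower than that of any single contributing study.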

Regarding other phase I metabolism genes previously examined, available evidence has been less consistent for the CYP2D6, CYP3A5, and epoxide hydrolase 1 (EPHX1) genes. Meta-analyses of two variants in CYP2D6, G1934A (rs3892097) and del2637 (rs35742686) [12], each based on results from three studies, show weak evidence of an association, and a recent study [50] was likewise inconclusive (Table 2). Statistically significant risk estimates have been reported in both directions for CYP3A5 G6986A (rs776746) [16, 51–53] and EPHX1 T339C (rs1051740) [16, 49, 50, 54, 55], and four studies evaluating EPHX1 A418G (rs2234922) have shown largely non-significant results [49, 50, 54, 55].

Assessment of the accumulating evidence for genetic variation within several phase II metabolic genes suggests an association with the widely studied glutathione S-transferase class mu (GSTM1) gene deletion and the N-acetyltransferase 2 (NAT2) slow acetylator genotype (Table 2). A meta-analysis of 15 studies showed a statistically significant increased risk associated with the GSTM1 deletion (OR = 1.16, 95 % CI 1.04–1.30) [12], and 3 of the 5 subsequent studies reported a statistically significant elevated risk [16, 37, 39, 56, 57]. In contrast, the risk associations for the GSTT1 deletion and two GSTP1 variants, A1578G (rs1695) and C2293T (rs1138272), are not supported by recent meta-analysis results [12] or by studies published subsequently [16, 37, 39, 56, 57].

Recently, an increasing number of studies have evaluated the effect of the slow NAT2 acetylator genotype on childhood leukemia risk (Table 2). The class of slow acetylator alleles (NAT2*5A-5C, *6A, and *7B) is represented by combinations of polymorphic sites (C282T, rs1041983; T341C, rs1801280; C481T, rs1799929; G590A, rs1799930; A803G, rs1208; and G857A, rs1799931). While previous studies have used slightly varying classifications of acetylation status for NAT2, three studies have reported statistically significant increased ALL risks associated with slow acetylator alleles [49, 58, 59], and two studies observed elevated but non-significant risk estimates [16, 53].

Evidence from a meta-analysis of 6 studies indicated a marginally significant association for NAD(P)H dehydrogenase quinone type 1 (NQO1) C609T (rs1800566, dominant: OR = 1.24, 95 % CI 1.02–1.50), but results from a large number of studies published more recently appear inconsistent, with statistically significant results reported in both directions [16, 37, 39, 47, 49, 50, 56, 60] (Table 2). One earlier meta-analysis examining the C609T effect by leukemia subtype reported no association in childhood ALL overall, but an increased risk of MLL translocation positive leukemia [61]. Heterogeneity in results across studies for C609T may be influenced by the distribution of leukemia subtypes characteristic of each study population. The association with a second NQO1 variant, C465T (rs4986998), appears suggestive of an increased risk based on two studies [12], but more studies are needed for confirmation.

DNA repair

Childhood ALL results from chromosomal alterations and somatic mutations that disrupt the normal process by which lymphoid progenitor cells differentiate and senesce [1]. These are the result of unrepaired DNA damage such as double-strand breaks (DSB) [62]. Among the few established risk factors for childhood ALL are exposure to ionizing radiation and certain chemotherapeutic agents [10], which are well-known genotoxic exposures. Repair of DNA damage is critical [63]; thus, alterations in innate DNA repair pathways, including nucleotide excision repair (NER), mismatch repair (MMR), and DSB repair, may play a role in leukemia development.

Two independently conducted meta-analyses have recently reported statistically significant increased risks associated with X-ray repair cross-complementing group 1 (XRCC1) G28152A (rs25487) [64, 65] (Table 2). Subgroup analyses indicated an effect predominantly in Asians in both reports. No associations with childhood ALL were found for the two other widely studied XRCC1 variants: C26304T (rs1799782), which showed heterogeneity in results across six studies [64, 65], and G27466A (rs25489) examined in fewer studies [64]. Meta-analysis of results from two studies [12], together with two additional reports [15, 66], shows weak evidence of an association with variants of the excision repair cross-complementing group 2 (ERCC2 or XPD) gene (G23591A, rs1799793 and A35931C, rs13181).

Immune response

Exposure to common infections and the role of immune-related processes have emerged as strong candidate risk factors for childhood ALL [67]. One prevailing hypothesis (the delayed infection hypothesis) suggests that a delay in exposure to immune-modulating factors (e.g., exposure to infections) early in life leaves the immune network under-modulated, and subsequent exposure to infections may result in an adverse immune response that gives proliferative advantage to pre-leukemia cells [68]. An immune response to a foreign antigen involves a complex cascade of events beginning with the activation of T lymphocytes and accompanied by vast production/secretion of cytokines and recruitment of other immune cells. Genetic variation influencing immunological pathways including innate and adaptive immunity may affect ALL susceptibility.

Due to its highly polymorphic nature and central role in immune response, the human leukocyte antigen (HLA) genes were one of the first loci within the immune response pathway to be examined in childhood leukemia. HLA is polygenic and broadly classified into class I (HLA-A, B, and C) and II (HLA-DR, DQ, and DP) genes which encode cell surface glycoproteins that bind and present processed antigens to T lymphocytes crucial to both cellular and humoral immune response [69]. Adding to the complexity, HLA alleles are defined based on combinations of genetic variants (haplotypes) that reside within the region encoding the antigen binding groove of the HLA molecule. Thus, HLA genes are multi-allelic with up to many hundreds of alleles segregating at a single locus (http://www.ebi.ac.uk/imgt/hla/).

The HLA-DR53 antigen, encoded by the HLA-DRB4 locus, which exists only on haplotypes possessing HLA-DRB1*04, *07 and *09, has been associated with increased risks for the major types of leukemia in adults and children [70], including evidence from two studies of childhood ALL indicating a male-specific increase in HLA-DRB4 alleles in cases compared to controls [70, 71]. More recently, two additional studies showed increased risks associated with HLA-DRB1*04 in an Iranian [72] and European American population [73]. The HLA-DRB1*15 allele was also identified as a risk locus in two studies [74, 75]. A second HLA gene, HLA-DPB1, has also emerged as a potential ALL-associated locus based on results from three studies [76–78].

Summary

Based on single-marker studies, a number of genes appear promising, with the most consistency across studies observed for MTHFR C677T, CYP1A1 T6235C, the GSTM1 deletion, NAT2*5, XRCC1 G28152A and HLA-DRB4 encoding the HLA-DR53 antigen. Potentially emerging associations that would benefit from additional supportive data include MTR A2756G, MTRR A66G, SHMT1 C1420T, and SLC19A1 G80A of the folate pathway, and ABCB1 C3435T, CYP1A1 A4889G, CYP2E1*5B, and NQO1 C609T of the xenobiotic transport/metabolism pathway. Also, a number of studies showing interesting gene–environment and gene–gene interactions have surfaced, but sufficient numbers of comparable reports are not yet available. Since we focus this review on genes with more than three reports, we acknowledge that there are many other isolated reports of relevant gene associations, including haplotype-based associations, not mentioned in this review that show promise as potential disease loci.

Genome-wide association studies

Results from the first GWAS in childhood ALL were reported in 2009 by two studies conducted independently in populations of European ancestry [79, 80] (Table 3). Taking a meta-analytic approach and combining the results for two separate GWAS analyses that comprised a total of 907 ALL cases and 2,398 controls, Papaemmanuil et al. [79] identified genome-wide significant (P threshold < 5 × 10⁻⁷) associations within and/or near the Ikaros family zinc finger 1 (IKZF1, chromosomal region 7p12.21) gene, the AT-rich interactive domain 5B (ARID5B, chromosomal region 10q21.2) gene, and CCAAT/enhancer-binding protein epsilon (CEBPE, chromosomal region 14q11.2). In a concurrent report, Trevino et al. [80] also identified associated regions within ARID5B and IKZF1 in their discovery population of 317 ALL cases and 17,958 controls. They performed a validation in a case-only series of 124 ALL patients and showed that the ARID5B association clearly distinguishes B-cell hyperdiploidy from other subtypes. The IKZF1 association appeared to predominantly affect B-cell ALL in the discovery study, but this was not clearly distinguished in the validation series. These subtype specific effects were also reported in the Papaemmanuil et al. study. Together, the two studies offered the first unequivocal evidence of a role for inherited genetic susceptibility in childhood ALL risk conferred by inter-individual variation within specific genomic regions, indicating a likely role of IKZF1, ARID5B, and CEBPE. Subsequently, independent replication studies have shown consistent associations in populations from Poland [81], Germany [82], the UK [82], Thailand [83] and among US infants [84] for IKZF1 and ARID5B, and additionally in Canada [85] and a US population of African ancestry [86] for ARID5B. Replication of the CEBPE risk variant has been less consistent, with one successful attempt in a German population [82], but because this variant is associated with a more modest risk estimate, lack of statistical power cannot be ruled out [81, 83].

Table 3 Summary of results for previous genome-wide association studies of childhood ALL risk

Based on the initial GWAS from the Papaemmanuil et al. study conducted in the UK, Sherborne et al. [87] pursued a replication attempt for 34 of the top P value-ranked SNPs using multiple independent populations and identified a fourth associated region that localized to the cyclin-dependent kinase inhibitor 2A (CDKN2A, chromosomal region 9p21.3) gene (Table 3). This association remained highly significant in both B- and T-cell lineage ALL which is in contrast to the findings for IKZF1 and ARID5B found to be mostly B-cell subtype specific.

Focused on TEL-AML1-positive childhood ALL, Ellinghaus et al. [88] conducted a GWAS comprising a discovery series of 419 German cases and 474 controls, followed by 2 independent replication series comprising 951 cases and 3,061 controls from Germany, Austria, and Italy (Table 3). In addition to confirming the associations at the three previously reported loci (IKZF1, ARID5B, and CEBPE), they identified four additional associations that localize to tumor protein p63 (TP63, chromosomal region 3q28), protein tyrosine phosphatase receptor type J (PTPRJ, chromosomal region 11p11.2), olfactory receptor family 8 subfamily U member 8 (OR8U8, chromosomal region 11q11), and integrator complex subunit 10 (INTS10, chromosomal region 8p21.3) genes, the latter 2 of which showed heterogeneity between the German/Austrian and Italian replication series.

In a discovery series of 441 French ALL cases and 1,542 controls, Orsi et al. [89] performed a GWAS and included a replication set comprising 390 Australian ALL cases and 1,202 controls (Table 3). Their strongest findings were consistent with the previously identified IKZF1 and ARID5B regions, followed by suggestive results localizing two potentially novel regions near ankyrin repeat domain 44 (ANKRD44, chromosomal region 2q33.1) and solute carrier family 16, member 14 (SLC16A14, chromosomal region 2q36.3). However, these latter two loci did not successfully replicate. Their study also provided additional support for the CDKN2A and CEBPE ALL-associated loci, but TEL-AML1-positive ALL-specific results did not confirm the associations for OR8U8, INTS10, and PTPRJ previously identified [88].

The first Asian childhood ALL GWAS was conducted in a Korean population by Han et al. [90] and comprised 50 cases and 50 controls (Table 3). Even in this relatively small study, associations for ARID5B and CEBPE, but not IKZF1, were detected and indicated a role in ALL risk in this population. Using the false discovery rate method to adjust for multiple testing, they reported potential ALL associations with loci that map to the mannosidase alpha class 2A member 1 (MAN2A1, chromosomal region 5q21), hydroxyacid oxidase 1 (HAO1, chromosomal region 20p12), chromosome 2 open reading frame 3 (C2orf3, chromosomal region 2p12), and erythrocyte membrane protein band 4.1-like 2 (EPB41L2, chromosomal region 6q23) genes. Given the potential heterogeneity in associations across race/ethnicities, these novel loci warrant further consideration for replication in comparable populations.
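As a rough illustration of false discovery rate control, the following sketch implements the Benjamini–Hochberg step-up procedure, one widely used FDR method. Whether Han et al. used this particular procedure is not stated here, so this is an assumption for illustration only; the function name and P values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the input order, marking the
    hypotheses rejected while controlling the FDR at level q.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= k * q / m.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * q / m:
            largest_rank = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected


# Hypothetical P values for four SNPs.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Because the threshold grows with rank, this approach is less stringent than a fixed genome-wide P value cutoff, which may partly explain how a small study can report candidate loci that still require replication.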

Summary

Collectively, based on these five previous GWAS and various follow-up analyses, strong evidence of an association with ALL overall and/or specific ALL subtypes is available for four independent loci implicating a role for the IKZF1, ARID5B, CEBPE, and CDKN2A genes. Strong evidence is also available for another four loci around the TP63, PTPRJ, OR8U8, and INTS10 genes found associated with TEL-AML1-positive ALL, but additional confirmation appears necessary. The IKAROS transcription factor encoded by the IKZF1 gene is a regulator of lymphocyte differentiation. In germline mutant mice exhibiting loss of IKZF1 expression, lymphocyte development is inhibited, leading to an aggressive form of lymphoblastic leukemia [79, 91]. In humans, chromosomal deletions involving IKZF1 are observed in about a third of high-risk B-cell precursor ALL and a large proportion of BCR-ABL1-positive ALL patients [92, 93]. ARID5B has been implicated in embryogenesis and growth retardation, but its putative contribution to leukemogenesis is less well understood. However, animal studies have shown defects in the B lymphoid compartment in Arid5b knockout mice [94], and ARID5B expression has been shown to be up-regulated in acute promyelocytic leukemia [95]. A third gene involved in lymphoid development and/or leukemogenesis, CEBPE, is a known target of translocations in B-cell precursor ALL [96]. CDKN2A is a tumor suppressor gene that encodes p16, a negative regulator of cyclin-dependent kinases, and p14, an activator of p53. Deletion of CDKN2A is among the most common genetic events in childhood B- and T-lineage ALL [87, 97]. The potential role for OR8U8 and INTS10, annotated to SNPs found to be associated with TEL-AML1-positive ALL, is less clear. Similarly, the direct mechanisms for an involvement of TP63 and PTPRJ in leukemogenesis are not well understood, but TP63 has been shown to possess features of a tumor suppressor, and PTPRJ involvement has been described in a number of cancers [88]. In addition, mice with impaired protein tyrosine phosphatase encoded by PTPRJ have been shown to display a partial peripheral B-cell developmental block [98].

Discussion

The strengths and limitations of the candidate gene and genome-wide approaches suggest that they should be viewed as complementary strategies in an integrated effort to identify disease-associated loci. While the candidate gene approach allows the testing of specific hypothesis-driven questions without the stringent P value requirements of genome-wide studies, success is dependent on selecting the correct gene (among thousands) and the markers that accurately capture the genetic variation of the locus of interest. In this respect, the agnostic genome-wide strategy has a major advantage, and any finding that is statistically significant at the genome-wide level will have a very high probability of being a true association in that study population. One limitation of the genome-wide approach that can be addressed by candidate gene studies is the lack of adequate coverage of certain regions of the genome by the currently available genome-wide arrays, including for multi-allelic and non-SNP markers and certain rare variants. An example is in the evaluation of the HLA [73] and killer-cell immunoglobulin-like receptor (KIR) [99] genes. While the hope is that any association with these loci would be captured by tag SNPs within the region [100], their highly multi-allelic nature and a potential involvement of rare alleles may require a focused examination using specialized methods.

Although GWAS are widely recognized as rigorous in their approach to identifying disease loci [101], the stringent P value requirements for defining a significant association, in turn, increase the probability of missing a true association, as recently demonstrated specifically in childhood leukemia. Using previously derived methods [102, 103] applied to an analysis of nearly 250,000 SNP genotypes from the UK GWAS [79], Enciso-Mora et al. [6] estimated that about 24 % of the total variation in childhood B-cell precursor ALL risk is accounted for by common genetic variation, and the previously identified loci (IKZF1, ARID5B, CEBPE, and CDKN2A) explain only 8 % of this total. This study provides evidence for a polygenic mechanism of susceptibility and a rationale for continued investigation of additional susceptibility loci that were likely missed by previous GWAS.

Areas of ongoing and future work

Two main areas to emphasize for ongoing and future efforts are: (1) to explain more of the remaining total variation in childhood ALL risk that is estimated to be due to common genetic variants by identifying additional associated loci, and (2) to characterize the identified regions through fine-mapping and functional studies.

Undetected associations due to insufficient sample sizes, which pertain to both genome-wide and candidate gene studies, are a central issue underscored by both the Enciso-Mora et al. report [6] and previous reviews [5, 104]. Power calculations have shown that even the largest GWAS conducted to date (907 cases and 2,398 controls) had limited ability to detect variants conferring relative risks of 1.4 or below and/or those with allele frequencies of <0.10 [6, 104]. Although a candidate gene study hoping to detect an SNP association with an allele frequency of 0.10 conferring a relative risk of 1.4 (assuming power = 0.80 and alpha = 0.05) would require fewer subjects, the sample size would still be substantial: about 600 cases and 600 controls. Furthermore, successful detection depends on whether the correct region and variants were selected a priori [5]. It is important to note, as well, that studies utilizing small sample sizes are sensitive to even small amounts of bias in genotyping or study design, potentially leading to erroneous results. This is likely a common phenomenon among the previous candidate gene studies reviewed here; about 40 % of studies published within the last 4 years included fewer than 150 cases. These requirements, together with the rarity of childhood leukemia and the importance of independent validation [105], argue for coordinated studies and collaborations. The Childhood Leukemia International Consortium (CLIC, https://ccls.berkeley.edu/clic), which currently includes 22 epidemiological studies of childhood leukemia representing 12 countries, was formed to facilitate collaborative efforts such as those needed for genetic studies.
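The order of magnitude of the candidate gene sample size quoted above can be explored with a standard two-proportion approximation. The sketch below is only one of several possible formulations: it assumes an allelic (per-chromosome) test with equal numbers of cases and controls, treats the relative risk as a per-allele odds ratio, and uses a hypothetical function name. Its output depends on these assumptions and need not reproduce the figure quoted in the text, which may have been derived under a different genetic model or with dedicated power software.

```python
import math
from statistics import NormalDist


def cases_needed_allelic(control_allele_freq, odds_ratio,
                         alpha=0.05, power=0.80):
    """Approximate number of cases (with an equal number of controls)
    for an allelic test comparing allele frequencies between groups.
    """
    p0 = control_allele_freq
    # Allele frequency expected in cases under the assumed odds ratio.
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)

    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)

    p_bar = (p0 + p1) / 2.0
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2

    # Each subject contributes two alleles.
    return math.ceil(alleles_per_group / 2.0)


print(cases_needed_allelic(control_allele_freq=0.10, odds_ratio=1.4))
```

Varying the inputs shows how quickly the requirement grows as the allele frequency or effect size falls, which is the practical argument for pooling studies through consortia.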

Identification of additional genetic loci will undoubtedly come from rigorous scrutiny of gene–environment and gene–gene interactions. Therefore, it is very important that epidemiological studies continue to collect high-quality environmental, lifestyle, and other exposure data along with the biological samples as a source of DNA. Studies based solely on genetic data, with no collection of environmental data, will be unable to shed light on this critical area. However, it should be noted that widespread testing for combined effects of genes and environmental factors, as well as gene–gene interactions, dramatically increases the number of tests performed, thus compounding the multiple testing problem. Statistical methods for performing such large-scale investigations are being developed [106]. In addition, investigators should develop plans for testing and replicating a limited number of interactions a priori.
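The growth in the number of tests can be illustrated with a short sketch. The SNP and exposure counts below are arbitrary illustrative values, not figures from any study, and the Bonferroni correction is used only as the simplest familiar adjustment, not as the method proposed in [106]. The function names are hypothetical.

```python
from math import comb


def interaction_test_counts(n_snps, n_exposures):
    """Count tests for main effects and all pairwise interactions."""
    return {
        "main_effects": n_snps,
        "gene_environment": n_snps * n_exposures,
        "gene_gene": comb(n_snps, 2),  # every unordered SNP pair
    }


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


# Illustrative values only: 100 candidate SNPs and 5 exposures.
counts = interaction_test_counts(n_snps=100, n_exposures=5)
for label, n_tests in counts.items():
    print(f"{label}: {n_tests} tests, threshold {bonferroni_threshold(n_tests):.2e}")
```

With these illustrative inputs, 100 main-effect tests become 500 gene–environment tests and 4,950 gene–gene tests, which is why restricting attention to a limited number of interactions specified a priori is attractive.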

While the initiation of new GWAS, particularly in populations of non-European ancestry, still has tremendous potential to yield discoveries in childhood leukemia, the field of genetic epidemiology for the more common cancers and non-cancer phenotypes is now in the “post-GWAS” era [107]. Unlike candidate gene studies, in which the scientific inquiry commonly began with some evidence of a functional consequence of the candidate locus, the genome-wide approach works in reverse order: fine-mapping of the localized region is needed to identify the causal variant, followed by functional characterization to identify the causal genes. The various activities associated with the follow-up of GWAS-identified susceptibility loci are broadly referred to as “post-GWAS” investigations. Data generated from the ongoing 1000 Genomes Project [108], an international collaborative effort with publicly available resources that aims to provide a deep characterization of human genome sequence variation, can aid in fine-mapping efforts.

Complementary to this, and for interrogating certain intergenic regions and gene deserts potentially not addressed by the 1000 Genomes Project, targeted re-sequencing can be undertaken using next generation sequencing technology [109]. The next challenge, identifying and confirming the genes that are regulated by the variants, can be addressed using various controlled laboratory experiments, but one initial approach is to evaluate the association of each variant with gene expression. This has already been initiated by previous GWAS investigations for some of the associated SNPs [79, 80, 88]. SNP rs4132601, which maps to the IKZF1 gene region, showed significant correlation with IKZF1 mRNA expression levels in Epstein–Barr virus-transformed lymphocytes, strongly implicating IKZF1 as the causal locus tagged by the SNP [79]. Fine-mapping efforts for the ARID5B SNPs (rs10821936 and rs10994982) did not reveal coding variants in LD, but gene expression studies showed a correlation with global gene expression patterns specifically in B-hyperdiploid ALL blast cells [80]. In addition, existing data from large-scale efforts are available that identify expression quantitative trait loci (eQTLs) [110], genetic variants that tag regions of the genome controlling mRNA expression. GENEVAR is one example of a resource that provides a catalog of known eQTLs in select tissue types [111]. When this resource was used to examine the four TEL-AML1-positive ALL-associated SNPs, significant correlations with expression levels of genes within the localized regions were not observed [88]. Necessary considerations in the interpretation of these data include that eQTLs may be tissue-specific, can vary over time, and are likely to differ between tumor and normal tissue.
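The basic idea of evaluating a SNP's association with gene expression can be sketched as a simple regression of expression level on genotype dosage (0, 1, or 2 copies of the variant allele). The values below are synthetic toy numbers invented purely for illustration and do not represent any real SNP or gene; the function name `eqtl_slope_and_r` is hypothetical, and real eQTL analyses typically adjust for covariates and use dedicated software.

```python
from math import sqrt


def eqtl_slope_and_r(dosages, expression):
    """Least-squares slope and Pearson correlation of expression on dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx          # change in expression per variant allele
    r = sxy / sqrt(sxx * syy)  # strength of the linear relationship
    return slope, r


# Synthetic toy data (not real measurements): six hypothetical samples.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]

slope, r = eqtl_slope_and_r(toy_dosages, toy_expression)
print(f"slope per allele = {slope:.2f}, Pearson r = {r:.3f}")
```

A nonzero slope in such an analysis suggests, but does not prove, that the variant or one in LD with it influences expression of the gene, and the caveats about tissue specificity noted above apply.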

Concluding remarks

Within just the past few years, there have been a large number of reports on the genetic susceptibility to childhood ALL, attempting both to confirm candidate gene associations and to make new discoveries by testing novel candidate loci and conducting large-scale GWAS. Such efforts have proven tremendously meaningful, to the point that we now have confirmation of an inherited genetic basis for childhood ALL and have come closer to establishing some candidate genes as risk loci. While most studies have focused on SNPs, which have been the topic of this review, it should be acknowledged that other types of inherited variation, such as epigenetic alterations and copy number variations, may contribute to the overall heritability of childhood ALL [112]. Understanding of the influence of these factors in childhood leukemia is growing and will undoubtedly contribute to major advances in the near future.

Findings described in this review still lie within the realm of research and have not yet yielded tangible clinical results. However, the expectation is that these studies will advance our understanding of the detailed causal pathways leading to disease, and eventually contribute to the development of novel therapeutics, identification of biomarkers for refined disease prediction, and monitoring of disease progression and treatment response [9]. Indeed, this improved knowledge of causal pathways may also reveal clues about modifiable environmental and lifestyle risk factors that can be used to develop public health-based prevention measures. The last few years have seen significant progress in the genetic epidemiology of childhood leukemia, largely fueled by technological advances in genetic analysis and global collaborative efforts in genomics such as the International HapMap and 1000 Genomes Projects. We expect that with decreasing costs of recent genomic platforms, increased recognition of the importance of collaboration for validation and for achieving maximal sample sizes, and increased public access to large genomic datasets, the next few years will be even more productive.