Introduction

Chromosomal translocations encompassing the mixed lineage leukemia (MLL) gene (also known as MLL1, HRX, HTRX, KMT2A, and ALL1) generate various MLL fusion genes [1–3]. To date, more than 70 MLL fusion genes have been reported [4]. Leukemia associated with MLL gene alterations (hereafter referred to as MLL-associated leukemia) accounts for ~5–10 % of total acute leukemia cases and is a major cause of infant acute lymphoblastic leukemia [5]. Clinical outcomes of MLL-associated leukemia are often unfavorable; therefore, the development of better therapeutic strategies is needed.

Significant progress in understanding MLL-associated leukemia has been achieved in the past two decades. The coding sequence of the MLL gene was established in the early 1990s [1, 2]. The first mouse models of MLL-associated leukemia using retroviral gene transduction or knock-in strategies were generated in the late 1990s [6, 7], and several other sophisticated disease models using murine cells [8–12] and human cells [13, 14] that mimic the human disease have since been developed. Recent technological advances in genetics, such as DNA microarrays and short hairpin RNA library screening, have enabled the identification of a number of novel pathways that play critical roles in the development of MLL-associated leukemia, and these are reviewed in detail elsewhere [15]. In this review, I focus on the mechanistic aspects of MLL fusion-dependent leukemic transformation.

MLL activates transcription of cellular memory genes

MLL is structurally similar to the Drosophila trithorax protein, as both have an evolutionarily conserved SET domain (Fig. 1a) [1, 2]. Knockout of Mll results in a loss of the expression of several posterior homeobox (Hox) genes [16, 17], similar to the consequences of the genetic ablation of trithorax in Drosophila [18, 19]. Hox genes are called “cellular memory” genes as their position-specific expression patterns are maintained throughout development [20, 21]. MLL is not required for the initial activation of Hox gene expression, but is necessary for their continuous expression during development [22]. Thus, MLL is considered a maintenance factor of cellular memory genes. Unlike most sequence-specific transcription factors, MLL is retained on the chromatin during mitosis for the efficient transcriptional activation of its target genes during the next G1 phase [23], thereby maintaining the expression of cellular memory genes for multiple cell divisions.

Fig. 1

Models of gene activation by MLL and MLL fusion complexes. a Schematic structures of MLL and MLL fusion complexes. The domain structures responsible for various protein–protein interactions are indicated. hMBM high affinity MENIN-binding motif, LBD LEDGF-binding domain, RD repression domain, PHD plant homeodomain, HBM HCF-binding motif, AD activation domain, PS processing site, Win WDR5 interaction motif, IBD integrase-binding domain. Models of MLL protein complexes formed on the target promoter are shown on the right. The light blue rectangle indicates the interactions required for MENIN-dependent target recognition. The orange rectangle indicates possible combinations of the interactions required for MENIN-independent target recognition. b Expression of HSC program genes by MLL and MLL fusion proteins during myeloid differentiation

MLL is required for the expression of posterior Hoxa genes in the hematopoietic cell lineage [17, 24]. The expression of Hoxa genes is highest at the immature progenitor stage, such as multi-potent progenitors (MPPs) [25], but gradually declines as cells differentiate, and eventually diminishes during the terminally differentiated stages (Fig. 1b) [26, 27]. Posterior Hoxa genes facilitate the expansion of immature hematopoietic progenitors [28], suggesting that MLL drives the proliferation of immature hematopoietic cells by upregulating posterior Hoxa genes. Studies of Mll-knockout mice have confirmed its requirement in the hematopoietic lineage of both adult and fetal hematopoietic systems [17, 24, 29, 30]. Mll-deficient embryos produce a smaller population of hematopoietic stem cells (HSCs), MPPs, common myeloid progenitors (CMPs) and granulocyte–macrophage progenitors (GMPs) in the fetal liver than in the control [29]. MLL function appears to be most critical for hematopoietic progenitors that are actively expanding. Consistent with this notion, the effects of Mll deficiency are most prominent in situations where hematopoietic progenitors are forced to expand rapidly, for example, during the reconstitution of hematopoietic systems [24, 30]. Conversely, MLL is not essential for the homeostasis of more differentiated myeloid and lymphoid cells [24]. These observations indicate that MLL is required for the proliferation of immature hematopoietic progenitors.

MLL is a large protein (431 kDa) possessing histone methyltransferase (HMT) activity on histone H3 lysine 4 [31, 32]. Further biochemical analyses revealed that the SET domain is a mono-methyltransferase and its associated factors retain additional methyltransferase activity [33, 34]. MLL associates with WDR5 via the Win motif to recruit ASH2L/RBBP5 proteins that possess additional HMT activity (Fig. 1a). As part of this complex, MLL ultimately produces di-methylated histone H3 lysine 4 [33]. It should be noted that a weak tri-methyltransferase activity has been reported under certain experimental conditions [34–36]; however, its molecular basis remains unknown. A knock-in allele lacking the SET domain of the Mll gene resulted in reduced Hoxc8 expression and a decreased amount of mono-methylated histone H3 lysine 4 modification at the Hoxc8 locus in embryos [37]. However, this mouse line was viable after birth, unlike other Mll-deficient mouse lines [16, 17, 29]. No differences in Hoxa9 expression and repopulating potential were observed in the cells of the LSK fraction (containing HSCs and MPPs) from SET domain-deficient mice compared to the wild-type animals [38]. These results suggest that the transcriptional activation of cellular memory genes is mostly independent of the HMT activity mediated by the SET domain. However, a gene knockout-rescue experiment on mouse embryonic fibroblasts and short interfering RNA-mediated knockdown of SET domain binders in 293T cells indicated a more important influence of HMT activity on MLL-dependent gene expression [31, 36]. The requirement for the HMT activity of MLL in gene expression may be highly context-specific and therefore demands further study.

After translation, MLL is proteolytically cleaved into two large polypeptides, MLLN and MLLC [39, 40]. The protease responsible for this processing is Taspase 1 [40]. MLLN and MLLC form an intra-molecular complex via the PHD fingers 1 and 4, the FYRN domain and the FYRC domain (Fig. 1a) [39, 40]. This intra-molecular interaction stabilizes the MLL polypeptides and therefore is a critical first step in the maturation of MLL proteins [26]. Analysis of knock-in alleles conducted by our research group showed no effects of mutations that abolish the proteolytic processing on Hox gene expression, whereas a mutation that disrupts the intra-molecular interaction resulted in a typical Mll-null phenotype. Thus, proteolytic processing of MLL does not have a major impact on Hox gene expression and proper development.

The intramolecular complex further associates with various cofactors to form a larger multiprotein complex [41]. The biochemically stable core complex of MLL is composed of MENIN, HCF1/2, ASH2L, RBBP5 and WDR5 (Fig. 1a). MENIN binds to MLL via the high-affinity MENIN-binding motif (hMBM) located at the N-terminal region of MLL [42]. HCF1/2 binds to MLL via its HCF-binding motif (HBM). ASH2L, RBBP5, and WDR5 associate with MLL via the SET domain and an adjacent Win motif [33, 41]. Histone acetyltransferases such as CBP/p300 have also been reported to bind to the activation domain of MLL as transiently associated factors [43]. Moreover, the cyclophilin Cyp33 binds to PHD finger 3 to induce conformational changes and recruit HDAC1 and polycomb group proteins such as BMI1, CtBP and HPC2 to the RD1 and RD2 domains of MLL to switch off transcription of MLL target genes [44–46]. Knockdown of MENIN resulted in reduced HOXA9 expression in HeLa cells [41], and MENIN-deficient mouse embryonic fibroblasts showed decreased Hoxc8 expression [47], indicating that MENIN is a critical cofactor of MLL for the maintenance of cellular memory.

Recent genetic analysis of the MLL-deficient and MENIN-deficient mouse lines indicated that MLL recognizes its target genes via multiple pathways, of which one is MENIN-dependent and the others are MENIN-independent [48]. The MENIN-dependent pathway appears to be responsible for the maintenance of cellular memory and is likely shared by MLL fusion proteins. The MENIN-independent pathways have not been fully elucidated. One report demonstrated that an MLL mutant protein composed of the CXXC domain and PHD finger 3, the latter of which binds to di-/tri-methylated histone H3 lysine 4, associated with the HOXA9 promoter despite the lack of the MENIN-binding motif [49], indicating that MLL can bind to certain chromatin regions without MENIN. Hence, MENIN-independent targeting pathways likely involve the binding capacities of MLL to di-/tri-methylated histone H3 lysine 4, via PHD finger 3 (Fig. 1a).

In summary, wild-type MLL maintains the expression of cellular memory genes through a MENIN-dependent mechanism to support the expansion of immature hematopoietic progenitors. It likely activates transcription through the activation domain. However, the contribution of HMT activity remains elusive. There appear to be alternative pathways in which wild-type MLL activates gene expression through a MENIN-independent mechanism.

MLL fusion proteins constitutively activate HSC program genes

MLL fusion proteins constitutively activate genes that promote self-renewal of HSCs [50]. Analysis of gene expression profiles identified a subset of genes, such as posterior Hoxa genes and Meis1, that are highly expressed in MLL-rearranged leukemia cells [27, 50]. Such genes exhibit high levels of expression in the HSC fraction in physiological settings. The constitutive expression of Hoxa9 and Meis1 in hematopoietic progenitors induces leukemia in vivo [28]. The retroviral transduction of MLL fusion genes into immature hematopoietic progenitors results in the constitutive expression of HSC program genes (Fig. 1b) and continuous cell growth in the presence of myeloid-lineage cytokines [6, 51]. Cells transduced with MLL fusion genes cause leukemia in vivo when transplanted into a syngeneic mouse [6, 27]. This mouse leukemia model enabled structure/function analysis of MLL fusion proteins to determine the functional requirements for leukemic transformation.

Mechanisms of target recognition by MLL fusion proteins

MLL fusion proteins and wild-type MLL regulate a common set of HSC program genes in vivo [17, 24, 48]. Therefore, it was predicted that the MLL portion of MLL fusion proteins, which is commonly retained in both wild-type MLL and MLL fusion proteins, is sufficient for targeting a subset of HSC program loci (Fig. 1a). A structure/function analysis revealed that hMBM and the CXXC domain are required for the transforming ability of MLL fusion proteins [42, 52, 53]. The MLL fusion/MENIN complex further associates with LEDGF through the structures of both MLL and MENIN [54]. LEDGF contains a PWWP domain, which binds to nucleosomes [55]. Further structure/function analysis of MLL-ENL revealed that only three domains of the MLL-ENL complex, the PWWP domain, the CXXC domain, and the ANC1 homology domain (AHD) of ENL, are necessary and sufficient for leukemic transformation ex vivo and in vivo [56]. Therefore, the PWWP and CXXC domains comprise the minimum targeting module of MLL fusion proteins.

The PWWP domain of LEDGF specifically recognizes the tri-methylated histone H3 lysine 36 (H3K36me3) in vitro [57], as does the PWWP domain of BRPF1 [58]. It also binds non-specifically to DNA [57], similar to the PWWP domain of HDGF [59]. Indeed, biochemical purification of PWWP domain-bound nucleosomes demonstrated a relatively high amount of H3K36me3, although the di-methylated histone H3 lysine 36 (H3K36me2) was predominant [56]. The nucleosomes bound to the minimum targeting module comprising the PWWP and CXXC domains contained mostly H3K36me2 and a very low frequency of H3K36me3, indicating that (1) the PWWP domain associates not only with H3K36me3 but also with H3K36me2 in vivo and (2) the MLL target chromatin is mostly H3K36me2-positive and not H3K36me3-positive. H3K36me2 is a relatively abundant modification that is slightly enriched in the promoter-proximal region [60]. Consistent with this observation, MLL fusion proteins preferentially associate with the promoter-proximal regions [56] rather than the gene body region, which is known to be enriched with H3K36me3 [61]. The CXXC domain of MLL binds to non-methylated CpGs [62–64]. A genome-wide DNA methylation analysis showed an enrichment of non-methylated CpGs in the promoters of MLL target genes and a depletion of methylated CpGs at the loci [56]. Therefore, MLL target chromatin loci are characterized by the presence of non-methylated CpGs and H3K36me2/3 modifications. This chromatin context is typical of previously active gene promoters. Hence, MLL fusion proteins represent a transcriptional machinery that re-activates previously active CpG-rich genes, thereby constitutively upregulating previously active HSC program genes to induce leukemia.
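The two-reader logic described above can be written as a small predicate. This is only an illustrative sketch, not a quantitative model: the class name PromoterState, its boolean fields, and the function is_candidate_target are hypothetical names introduced here, and real binding is graded rather than all-or-nothing and involves the additional interactions discussed elsewhere in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromoterState:
    """Hypothetical yes/no summary of the marks at one promoter (an assumption)."""
    unmethylated_cpg: bool  # ligand for the CXXC domain of MLL
    h3k36me2: bool          # ligand for the PWWP domain of LEDGF
    h3k36me3: bool          # also bound by the PWWP domain


def is_candidate_target(promoter: PromoterState) -> bool:
    """Return True when both readers of the minimum targeting module have a ligand."""
    cxxc_engaged = promoter.unmethylated_cpg
    pwwp_engaged = promoter.h3k36me2 or promoter.h3k36me3
    return cxxc_engaged and pwwp_engaged


# A previously active, CpG-rich promoter versus a CpG-methylated promoter.
previously_active = PromoterState(unmethylated_cpg=True, h3k36me2=True, h3k36me3=False)
cpg_methylated = PromoterState(unmethylated_cpg=False, h3k36me2=True, h3k36me3=False)

print(is_candidate_target(previously_active))
print(is_candidate_target(cpg_methylated))
```

The conjunction reflects the observation that both the PWWP and CXXC domains are needed for targeting; either mark alone is treated as insufficient in this toy representation.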

It should be noted that additional interactions are involved in the targeting mechanisms of MLL fusion proteins. It has been reported that the association with the PAF1 complex is necessary for the target recognition by MLL fusion proteins [49, 65] (Fig. 1a). MENIN has two stretches of positively charged amino acids in its carboxyl region that bind DNA in a sequence-independent manner [66]. Moreover, several lines of evidence suggest that wild-type MLL, but not its HMT activity, is required for the targeting of MLL fusion proteins [38, 49, 67]. These additional interactions/factors likely contribute to proper targeting of MLL fusion proteins.

Mechanisms of constitutive activation by MLL fusion proteins

As the target recognition ability is conferred by the MLL portion, transcriptional activation is thought to be mediated through the fusion partner portion. To date, more than 70 fusion partners of MLL have been reported [4]. It appears that mechanisms of transcriptional activation vary depending on the fusion partners. The mechanisms employed by various fusion partners can be sorted into at least four different categories.

MLL-AEP

Although the fusion partners of MLL are diverse, the majority of leukemia cases are caused by fusion with a component of the AF4 family/ENL family/P-TEFb complex (AEP) [68]. The AF4 family is composed of AF4 (AFF1), AF5q31 (AFF4), LAF4 (AFF3) and FMR2 (AFF2) (Fig. 2a). Three of these proteins (AF4, AF5q31, and LAF4) have been reported to form fusions with MLL in leukemia cases. The ENL family consists of ENL (MLLT1) and AF9 (MLLT3), both of which are frequently fused to MLL in leukemia cases (Fig. 2b). The P-TEFb complex is composed of CDK9 and Cyclin T1/2 and is known to phosphorylate the serine 2 residue of the heptapeptide repeat of the C-terminal domain of RNAP2 [69]. Because the AEP complex contains the P-TEFb transcription elongation factor and also binds to ELL family proteins that retain transcriptional elongation activity [70], it is also referred to as the super elongation complex [71]. The AEP complex is involved in various biological processes, such as the heat shock response and viral transcription, by facilitating transcriptional elongation [71–73]. MLL-AEP fusion proteins form an MLL/AEP hybrid complex on the target chromatin (Fig. 2a) [68]. MLL-AF5q31 activates Hoxa9 and transforms hematopoietic progenitors through the C-terminal homology domain (CHD), which is a binding platform for AF4. Therefore, the constitutive recruitment of the AEP complex components appears to be the primary mechanism by which MLL fusion proteins activate Hoxa9. MLL-ELL, which is a relatively common MLL fusion in therapy-related AML, should also be a part of this group, as it also associates with the AEP complex [71].

Fig. 2

Models of leukemic transformation by MLL-AEP fusion proteins. a Schematic structures of AF4 family proteins and MLL-AF4 family fusion proteins. NHD N-terminal homology domain, ALF AF4/LAF4/FMR2 homology domain, pSER poly-serine, A9ID AF9 interaction domain, CHD C-terminal homology domain, AHD ANC1 homology domain. A model of the MLL-AF4 family complex formed on the target promoter is shown on the right. The orange rectangle with a broken line indicates the minimum structure required for the transformation of myeloid progenitors ex vivo. b Schematic structures of ENL family proteins and MLL-ENL family fusion proteins

MLL-ENL and MLL-AF9 transform through AHD, which is the AF4 binding platform (Fig. 2b). However, AHD is also the binding platform for the DOT1L histone methyltransferase. ENL forms a complex with DOT1L and AF10/AF17 in a mutually exclusive manner with respect to AF4 family proteins [68, 74, 75]. An MLL-AF9 mutant carrying L504P/D505P substitutions, which has a reduced binding capacity to AF4 family proteins but a normal binding capacity to DOT1L, failed to transform hematopoietic progenitors [75], indicating that recruitment of AF4 family proteins but not DOT1L is critical for the transformation. Thus, MLL-ENL family fusion proteins are also categorized into the group of fusion partners in which the constitutive recruitment of the AEP components activates the HSC program genes. However, genetic ablation of DOT1L or AF10 resulted in the loss of MLL-AF9 transforming ability [76–79], indicating that the DOT1L complex likely plays an important role in facilitating MLL-ENL family-dependent transformation. Furthermore, AHD is also a binding platform for CBX8, a component of the Polycomb repressive complex 1 (PRC1) [74]. It has been reported that antagonizing the transcriptional repressor activity of PRC1 [80] or the recruitment of TIP60 [81] through the CBX8 association facilitates leukemic transformation. These additional functions mediated by AHD likely promote the ability of MLL-ENL family fusion proteins to drive oncogenesis, which probably explains why the ENL family is frequently targeted for gene rearrangement.

MLL-AF10 family

AF10 (MLLT10) is a relatively frequent fusion partner and accounts for ~8 % of cases of MLL-rearranged leukemia [4]. AF10 has a homolog called AF17 (MLLT6) that is also a fusion partner of MLL (Fig. 3). Biochemical purification revealed that AF10 forms a complex with DOT1L and the ENL family proteins as well as TRRAP, SKP1 and β-catenin [82]. Structure/function analysis revealed that the DOT1L interaction domain of AF10 was responsible for MLL-AF10-dependent transformation [83]. Genetic ablation of DOT1L resulted in loss of the transforming ability of MLL-AF10 [83, 84]. Okada et al. [83] reported that an artificial protein in which the MLL portion was fused to the catalytic domain of DOT1L could transform hematopoietic progenitors. In theory, this implies that the recruitment of HMT activity of DOT1L is sufficient to cause constitutive activation of MLL target genes. However, our group has been unable to reproduce these results [68]. Moreover, the aforementioned MLL-AF9 L504P/D505P mutant, which retains the binding capacity to DOT1L, did not transform myeloid progenitors [75]. It is likely that the recruitment of HMT activity on its own confers only a weak capacity (if any) to activate MLL-target genes. Other factors/functions of the DOT1L complex may be required for the full transforming ability of MLL-AF10 family proteins. Although the molecular role of DOT1L in MLL-associated leukemia remains to be elucidated, these discoveries facilitated the development of a molecularly targeted drug for MLL-associated leukemia by inhibiting the DOT1L enzymatic activity [85].

Fig. 3

A model of leukemic transformation by MLL-AF10 family proteins. Schematic structures of AF10 family proteins and MLL-AF10 family fusion proteins. OM octapeptide motif, LZ Leucine zipper. A model of the MLL-AF10 family complex formed on the target promoter is shown on the right

MLL active form mimicry

The CBP/p300 family is fused with MLL in rare cases of MLL-associated leukemia (Fig. 4). MLL-CBP transforms hematopoietic progenitors through the bromo and histone acetyltransferase domains of CBP [86]. The CBP portion appears to enhance the propagation of acetylated histones at the MLL target chromatin. The native MLL associates with the CBP family protein via its activation domain [43] (Fig. 1a). However, the CBP family protein was not found in the core complex of native MLL [41], indicating that MLL associates with CBP in a context-dependent manner. It is likely that MLL-CBP family fusion proteins mimic the active form of the natural MLL complex.

Fig. 4

A model of leukemic transformation by MLL-CBP family proteins. Schematic structures of CBP family proteins and MLL-CBP family fusion proteins. HAT Histone acetyltransferase domain. A model of the MLL-CBP family complex formed on the target promoter is shown on the right

AFX and its homolog FKHRL1 are both members of the forkhead family of transcription factors containing a highly conserved forkhead DNA-binding domain [87]. The conserved region 3 (CR3) of the forkhead family protein specifically associates with the CBP family proteins to activate transcription and is required for MLL-AFX-dependent transformation. These results indicate that mimicry of the active form of native MLL is a common mechanism employed by MLL fusion proteins.

MLL-dimerization domain

The most enigmatic mechanism of MLL fusion-dependent transformation occurs via dimerization mediated by the fusion partner portion. Nevertheless, this probably explains the high number of MLL fusion partner genes [4]. Structure/function analyses of the MLL-GAS7 and MLL-AF1p fusion proteins indicated that dimerization domains in the fusion partner portion confer transforming abilities [88]. Moreover, artificial MLL fusion proteins, in which MLL was fused to a ligand-dependent dimerization domain, transformed hematopoietic progenitors in a dimerization-dependent manner [89, 90]. The dimerization domain causes duplication of an MLL portion in a single complex. Therefore, partial tandem duplication mutations of MLL, in which a portion including the CXXC domain and its adjacent regions is duplicated in-frame, may fall into this category [89].

The most frequent fusion partner of this class is AF6 (MLLT4) (Fig. 5), which accounts for ~4 % of all MLL-associated leukemia cases [4]. The RA1 domain of AF6 has been demonstrated to be responsible for the transformation and dimerization [91]. Other fusion partners, such as SEPT6 and GPHN, also confer transformation abilities via their oligomerization domains [92, 93]. However, the molecular mechanism by which the dimerization of MLL fusion proteins causes transformation remains elusive. One clue is that MLL-AF6 does not directly bind to the AEP or DOT1L complex, but co-localizes with them at the target chromatin region [68]. Knockdown or knockout of the components of the AEP and DOT1L complexes in MLL-AF6-transformed cells resulted in reduced transforming ability [68, 94]. These observations suggest that MLL fusion proteins with a dimerization domain activate transcription through the recruitment of the AEP and DOT1L complexes via unknown mechanisms.

Fig. 5

A model of leukemic transformation by the MLL-AF6 fusion protein. Schematic structures of AF6 and MLL-AF6. RA RAS association domain, PDZ PSD-95/Dlg/ZO-1 domain. A model of the MLL-AF6 complex formed on the target promoter is shown on the right
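The four categories of fusion partners described above can be collected in a small lookup table. The following is a minimal sketch for orientation only: the names FUSION_PARTNER_CLASSES and classify_partner are hypothetical, the partner lists contain only examples named in this review and are not exhaustive, and the one-line mechanism summaries are simplifications of the text.

```python
from typing import Optional

# Category -> example partners named in this review and a simplified mechanism.
# The structure and wording of this table are illustrative assumptions.
FUSION_PARTNER_CLASSES = {
    "MLL-AEP": {
        "partners": ["AF4", "AF5q31", "LAF4", "ENL", "AF9", "ELL"],
        "mechanism": "constitutive recruitment of AEP complex components",
    },
    "MLL-AF10 family": {
        "partners": ["AF10", "AF17"],
        "mechanism": "recruitment of the DOT1L complex",
    },
    "MLL active form mimicry": {
        "partners": ["CBP", "p300", "AFX"],
        "mechanism": "association with CBP family histone acetyltransferases",
    },
    "MLL-dimerization domain": {
        "partners": ["AF6", "GAS7", "AF1p", "SEPT6", "GPHN"],
        "mechanism": "dimerization or oligomerization via the partner portion",
    },
}


def classify_partner(partner: str) -> Optional[str]:
    """Return the category listing the given partner, or None if it is not listed."""
    for category, entry in FUSION_PARTNER_CLASSES.items():
        if partner in entry["partners"]:
            return category
    return None


for name in ("AF9", "AF10", "AF6", "UNLISTED"):
    print(name, "->", classify_partner(name))
```

Partners absent from the table return None rather than a guessed category, which mirrors the point that many of the reported fusion partners have not yet been assigned a mechanism.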

Concluding remarks

Much information regarding the molecular mechanisms of MLL-associated leukemia has been gained in the past decade. Along the way, various key protein–protein interactions required for leukemogenesis have been identified. Based on these data, compounds that specifically inhibit the formation of the MLL fusion protein complex, such as MLL-MENIN interaction inhibitors [95], have been developed. It will be particularly exciting to see whether these compounds can be translated into clinical applications to benefit leukemia patients. One of the unique features of MLL-associated leukemia is the diversity of MLL fusion partners. Based on the current understanding, I categorized these partners into four groups according to the molecular mechanisms employed. However, there are many gaps that need to be filled in each category. Hopefully, the next decade of research in this area will contribute to a better understanding of the molecular mechanisms underlying this disease.