Introduction

Chronic lymphocytic leukemia (CLL) is a lymphoproliferative disorder characterized by the accumulation of mature clonal CD5+ B lymphocytes in blood, bone marrow and lymphoid tissues. These differentiated B cells present a specific immunophenotype characterized by low surface membrane immunoglobulin levels as well as co-expression of the B cell antigens CD23 and CD19. Outstanding features of CLL are its high prevalence in Western adults and its clinical heterogeneity. Most CLL patients are asymptomatic at the time of diagnosis, but further progression can lead either to a severe and refractory form of CLL with very poor prognosis or to an essentially indolent disease with a life expectancy similar to that of the general population [1–5].

The first staging system for CLL prognosis, the Binet-Rai classification scheme designed in the 1970s, was based on clinical features [6, 7]. Although this system is still widely used, new elements have been incorporated into the prognostic evaluation of CLL patients. These include certain biomarkers such as the serum levels of β2-microglobulin and thymidine kinase [8], or the expression of CD38 [9], CD49d [10] and ZAP70 [11]. Moreover, the karyotypic aberrations del13q14, trisomy 12, del11q22-q23 and del17p13, which are recurrent in CLL, can be used to define prognostic groups with decreasing life expectancy [12]. Additionally, around 60 % of CLL cases undergo somatic hypermutation (SHM) in the variable region of their immunoglobulin heavy chain (IGHV) genes in the germinal center (GC) [9, 13]. In general, these IGHV-mutated CLL (mCLL) patients represent an indolent form of CLL, whereas the remaining cases, collectively termed unmutated-IGHV CLL (uCLL), usually display an aggressive disease form, characterized by poor outcomes and frequent refractoriness to chemotherapy [9, 14]. This finding set the basis for the definition of two relevant molecular subtypes of CLL with very distinct prognoses.

The correlation between IGHV status and CLL severity has been one of the landmarks in the identification of genetic features with prognostic value. In spite of this, the characterization of the genomic alterations and molecular mechanisms underlying CLL development and progression has remained largely elusive to traditional molecular biology approaches. In recent years, the implementation of next-generation sequencing (NGS) techniques has dramatically broadened our understanding of the genomic and epigenomic landscapes of CLL, providing an unprecedented amount of biological information that is expected to open new and improved therapeutic avenues (Fig. 1). In this review, we summarize the most relevant breakthroughs recently achieved by these means and how they translate into CLL biology. Finally, we discuss the clinical perspectives in the light of the new findings, also considering CLL as a paradigm for personalized medicine.

Fig. 1

Overview of next-generation DNA sequencing methods. Whole genome sequencing requires shearing the sample, preparing libraries from the fragments, obtaining short reads and aligning those reads to the reference genome (middle). For whole-exome studies, the sheared samples are hybridized to biotinylated, exon-specific probes. The hybridized molecules, containing exonic sequences, are purified with streptavidin-coated beads (left). Finally, samples can be pre-treated with bisulfite to study the methylation status of cytosines (right)

Chromosomal aberrations in CLL

In spite of the relative karyotypic stability of CLL cells, around 80 % of the patients carry chromosomal aberrations [5, 12]. One of these defects, del13q14, is found in 50–60 % of the cases. This deletion abolishes the expression of the micro-RNA genes miR15-a and miR16-1 (the first micro-RNA alteration ever associated with cancer [15]), which regulate B cell apoptosis and cell cycle, and participate in a miRNA/TP53 feedback system that also involves regulation of ZAP70 expression [16–18]. Del13q14 is considered an early CLL-founding event and is usually associated with a favorable prognosis. In contrast, chromosomal defects such as del11q22-q23 and del17p13 are linked to dismal prognosis and chemo-refractoriness [19, 20], as they disrupt the expression of ATM and the tumor suppressor TP53, respectively [12, 21–24]. In some cases, del11q22-q23 instead affects the BIRC3 locus, leaving ATM intact [25]. Notably, ATM [21, 22, 26], TP53 [23, 24, 27] and BIRC3 [28, 29] are also frequently affected by inactivating mutations in CLL.

Depicting the mutational repertoire of CLL by next-generation sequencing

The advent of NGS techniques for massively parallel genome sequencing, together with the development of powerful dedicated bioinformatic tools, has propelled cost- and time-effective comprehensive analyses of cancer genomes [30, 31]. These unbiased approaches allow the comparison of matched tumor-normal tissue pairs in whole-genome, whole-exome, transcriptome (RNA-seq) or epigenome studies [32, 33] (Fig. 1). A seminal NGS-based CLL study, based on whole-genome sequencing of four patients (two uCLL and two mCLL), uncovered NOTCH1, MYD88, XPO1 and KLHL6 as bona fide recurrent CLL drivers upon validation by targeted sequencing of an additional 169 samples [34]. Only NOTCH1 mutation, present in 12 % of the cases and additionally confirmed in a parallel study [28], had been previously proposed as a CLL-related event [35]. Further studies have defined the presence of these NOTCH1 activating mutations as an independent prognostic factor associated with shorter survival and higher risk of Richter transformation [36, 37]. Gain-of-function mutations in the Toll-like receptor adaptor protein MYD88 have been previously reported in diffuse large B cell lymphomas and nearly all cases of Waldenström’s macroglobulinemia [38, 39]. On the other hand, a recent report has shown promising therapeutic effects of selective inhibition of the nuclear export protein XPO1 both in vitro and in the TCL1 mouse model [40].

The CLL mutational landscape was readily expanded by independent whole-exome analyses of two larger cohorts, which uncovered several recurrently mutated genes such as CHD2, POT1, FBXW7, DDX3X and BIRC3, and also confirmed previous reports describing TP53 and ATM as CLL drivers [41, 42]. Importantly, both studies highlighted recurrent mutations conspicuously affecting the C-terminal HEAT-repeat domain of SF3B1, a component of the U2 snRNP spliceosome, in 10–15 % of the cases. SF3B1 mutations are predicted to induce aberrant splicing in specific genes [43–45], and have also been found to be significantly associated with aggressive forms of the disease [46]. Some other candidates have been further validated by functional studies; this is the case of POT1, the first member of the telomere-binding shelterin complex ever described as a cancer driver gene. POT1 mutations generate aberrations linked to defects in telomeric ends and are only found in uCLL cases, which is consistent with their association with more aggressive forms of the disease [47]. Moreover, loss-of-function mutations in the sucrase isomaltase gene (SI), found to be frequently mutated in a cohort of 105 cases, have been proposed to participate in metabolic reprogramming of CLL cells [48]. Truncating mutations in DDX3X have been associated with poor clinical prognosis and relapse [49], whereas those in BIRC3 seem to be responsible for TP53-independent fludarabine chemo-refractoriness [25, 50].

These and other results illustrate the impressive leap forward provided by the use of NGS techniques in our description of the catalogue of somatic mutations involved in CLL pathogenesis. Nevertheless, the genetic information uncovered by these means has also left some open questions that will need to be answered to develop successful antitumor therapies for any CLL case.

Genetic heterogeneity of the CLL genome

The comprehensive description of CLL genomes has uncovered a landscape of low mutational rates, with a reduced number of non-synonymous mutations in gene-coding regions when compared with other tumors. The mutational catalogue of CLL shows an outstanding inter-sample genetic heterogeneity, with a few top recurrently mutated genes presenting mid/low frequencies around 12–15 %, followed by a larger group of genes found at low frequencies (2–5 %) [2, 4, 51, 52]. More importantly, around a third of CLL cases cannot be explained by the presence of any of the 50 most recurrently mutated driver genes, thus limiting the development of novel targeted anti-CLL therapies [4]. Therefore, functional studies that define the cellular mechanisms and pathways altered by the identified mutations are warranted. This should help to clarify whether CLL originates from changes in many different cellular functions or from mutually exclusive mutations affecting genes that belong to a limited number of common biological pathways. The latter has already been proposed for mutations affecting RNA processing genes [44], the Wnt/β-catenin activation pathway [53], and the Toll-like receptor/MYD88 pathway [54]. In fact, many mutations seem to affect genes belonging to a certain number of pathways, which are differentially enriched in the two CLL molecular subtypes defined by the IGHV mutational status (i.e. uCLL and mCLL). Thus, genes belonging to the DNA damage response and cell cycle control (TP53, ATM, POT1, BIRC3), mRNA splicing, processing and transport (SF3B1, U2AF2, XPO1, DDX3X), and NOTCH signaling (NOTCH1, FBXW7) happen to be more frequently mutated in uCLL cases; in contrast, genetic lesions in genes participating in the innate inflammatory response (MYD88, TLR2, MAPK1) seem to be more specific to the mCLL subtype [2, 51, 52]. The functional validation of these candidate genes in the neoplastic process seems to be mandatory to distinguish those affected by “passenger” mutations, which lack any pathological effect, from true CLL drivers.

Valuable information about unanticipated or unknown functional relationships between mutated genes could also be provided by evaluating the genetic patterns of mutual exclusion and/or co-segregation. Thus, sequential mutations in genes belonging to the same pathway are not generally expected to co-segregate, as secondary mutational events will not provide any further evolutionary advantage. On the other hand, a synergistic effect caused by co-segregating mutations that act on different cellular functions could have a positive impact on the fitness of the cells harboring these alterations [2, 4]. However, assessing the true nature of these patterns is not an easy task, as it is hampered by the low mutational frequencies and by the limited sensitivity of standard NGS approaches, which prevents the identification of small subclonal populations [4, 55]. Nevertheless, recent landmark studies have successfully tackled the issue of the subclonal nature and evolution of CLL.
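As an illustration of how such patterns are commonly assessed, the following minimal Python sketch tests one gene pair for mutual exclusion or co-occurrence with a one-sided Fisher's exact (hypergeometric) test. The function names and the cohort counts are hypothetical and are not taken from the cited studies.

```python
from math import comb


def hypergeom_pmf(k, n_a, n_b, n_total):
    """Probability that exactly k samples carry both mutations if A and B are independent."""
    return comb(n_a, k) * comb(n_total - n_a, n_b - k) / comb(n_total, n_b)


def pair_pvalues(both, only_a, only_b, neither):
    """Return (p_exclusive, p_cooccur) for a 2x2 table of sample counts.

    p_exclusive: chance of seeing this few (or fewer) double-mutant samples.
    p_cooccur:   chance of seeing this many (or more) double-mutant samples.
    """
    n_total = both + only_a + only_b + neither
    n_a = both + only_a  # samples mutated in gene A
    n_b = both + only_b  # samples mutated in gene B
    k_min = max(0, n_a + n_b - n_total)
    k_max = min(n_a, n_b)
    p_exclusive = sum(hypergeom_pmf(k, n_a, n_b, n_total)
                      for k in range(k_min, both + 1))
    p_cooccur = sum(hypergeom_pmf(k, n_a, n_b, n_total)
                    for k in range(both, k_max + 1))
    return p_exclusive, p_cooccur


# Hypothetical cohort: 100 patients, two drivers each mutated in 10 patients,
# and no patient carrying both mutations.
p_excl, p_co = pair_pvalues(both=0, only_a=10, only_b=10, neither=80)
print(f"mutual exclusion p = {p_excl:.2f}, co-occurrence p = {p_co:.2f}")
```

With two drivers each mutated in 10 % of a 100-patient cohort, even a complete absence of double-mutant cases yields a p-value of roughly 0.33, which illustrates why low mutational frequencies make these patterns hard to establish.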

Intra-tumor heterogeneity and clonal evolution in CLL

The first evidence showing that individual CLL samples can be genetically heterogeneous and harbor different subclonal populations was obtained by fluorescence in situ hybridization (FISH)-based cytogenetic analyses [12, 56–58] and SNP microarrays [59, 60], which even revealed different clonal compositions between samples taken at the time of diagnosis and after relapse. NGS approaches, which provide a significant improvement in sequence coverage and sensitivity, were employed to study CLL clonal evolution for the first time through the assessment of three patients along repeated cycles of chemotherapy [61]. This study also reported changes in clonal population dynamics as a result of multiple rounds of chemotherapy. Clonal evolution patterns were observed to be highly heterogeneous, ranging from stable equilibrium of up to five simultaneous subclones along the process, to the complete replacement of the dominant clone by a minor one [61]. In a landmark study, Landau et al. used a powerful combination of whole-exome deep sequencing, assessment of allelic fraction, local copy number and cancer cell sample purity to discriminate clonal and subclonal genetic alterations in a large cohort of 149 CLL samples [52]. This confirmed the existence of both types of mutations in 146 cases. Clonal mutations are present in all tumor cells and are either founder alterations or early events that were followed by a selective sweep that removed any other subpopulations. By contrast, subclonal mutations represent events acquired later in the course of the disease. Importantly, the number of subclonal mutations was significantly higher in treated patients and increased with the number of prior therapies. This approach allows the definition of a chronological order for the emergence of driver mutations along disease progression. Thus, mutations in MYD88, del13q14, and trisomy 12 were defined as clonal, whereas alterations in TP53, ATM and SF3B1 are late events and likely relevant elements in disease progression. The availability of 18 longitudinally assessed cases confirmed changes in population dominance, showing enhanced rates of clonal evolution in treated patients. More importantly, cytotoxic therapy seems to eliminate dominant clones, facilitating the expansion of fit subclones harboring an increased number of driver genes. Accordingly, the presence of subclonal drivers was found to be an independent prognostic factor predicting poor clinical outcome [52].
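The reasoning behind combining allelic fraction, local copy number and sample purity can be sketched with a widely used point estimate of the cancer cell fraction (CCF), that is, the fraction of tumor cells carrying a mutation. The snippet below is a simplified illustration and not the statistical procedure of the cited study; it assumes a diploid normal compartment and a known number of mutated copies per cell, and the clonality cutoff is an arbitrary, hypothetical value.

```python
def cancer_cell_fraction(vaf, purity, tumor_copy_number=2, mutated_copies=1):
    """Point estimate of the fraction of tumor cells carrying a mutation.

    vaf               variant allele fraction observed in the sample
    purity            fraction of cells in the sample that are tumor cells
    tumor_copy_number local copy number of the locus in tumor cells
    mutated_copies    copies of the locus that carry the mutation (assumed known)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if mutated_copies < 1 or mutated_copies > tumor_copy_number:
        raise ValueError("mutated_copies must be between 1 and tumor_copy_number")
    # Average number of copies of the locus per cell in the tumor/normal mixture
    # (normal cells are assumed to be diploid).
    mean_copies = purity * tumor_copy_number + 2 * (1 - purity)
    ccf = vaf * mean_copies / (purity * mutated_copies)
    return min(ccf, 1.0)  # sampling noise can push the estimate above 1


def classify(ccf, clonal_cutoff=0.9):
    """Label a mutation; the 0.9 cutoff is an illustrative assumption."""
    return "clonal" if ccf >= clonal_cutoff else "subclonal"


# Hypothetical heterozygous mutations in a copy-neutral region, 80 % purity.
for vaf in (0.40, 0.10):
    ccf = cancer_cell_fraction(vaf, purity=0.8)
    print(f"VAF {vaf:.2f} -> CCF {ccf:.2f} ({classify(ccf)})")
```

In this toy example, an allelic fraction of 0.40 at 80 % purity corresponds to a mutation present in essentially all tumor cells, whereas an allelic fraction of 0.10 corresponds to about a quarter of them.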

Treatment-associated clonal evolution could explain some of the discrepancies found in the estimated mutational frequencies for CLL drivers in different cohorts. Thus, the most frequently mutated genes in the 91 cases studied by Wang et al. [42] were SF3B1, TP53 (both 15 %) and ATM (9 %), whereas in the cohort analyzed by Quesada et al. [41] these genes were mutated in 10, 1 and 4 % of their 105 patients, respectively. Moreover, the most frequently mutated driver in the latter study was NOTCH1 (12 %), in stark contrast with the 4 % found in the Wang study. These differences in mutation rates seem to mirror the distinct clinico-biological features that characterized both cohorts. The patients in the study by Quesada et al. were mostly untreated, recently diagnosed or in early stages of the disease. In contrast, the cohort of Wang et al. was enriched in patients presenting adverse prognostic parameters, and one-third of its cases were relapsed [4, 29, 62]. Notably, TP53, ATM and SF3B1 are known to be associated with CLL progression and chemo-refractoriness [23, 24, 27, 46, 50, 63], and seem to define highly fit clones that can expand over time upon chemotherapy-induced selective pressure. Moreover, small subclones harboring these mutations could already be present in the tumor sample in undetectable amounts at the time of diagnosis.

Indeed, the implementation of ultra-deep NGS analyses of the TP53 gene in 309 newly diagnosed patients, combined with a very robust bioinformatics algorithm, allowed the identification of extremely low abundance TP53 subclones (down to three TP53-mutated cells per ~1000 wild type cells) in 9 % of the patients [64]. Patients harboring very small TP53 subclones displayed similar clinical features and survival rates as poor as those of patients with clonal TP53 aberrations. Longitudinal studies confirmed that small TP53 subclones end up being predominant after therapy and relapse and that they can anticipate chemo-refractoriness [64].
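To give a sense of the sequencing depth that this sensitivity implies, the sketch below converts a subclone size into an expected variant allele fraction and computes the binomial probability of observing a minimum number of variant reads. It is a back-of-the-envelope calculation under explicit assumptions (heterozygous mutation, diploid locus, no sequencing errors); the function names and read thresholds are illustrative and do not reproduce the algorithm used in [64].

```python
from math import comb


def expected_vaf(mutant_cells, wild_type_cells, mutated_alleles_per_cell=1, ploidy=2):
    """Expected variant allele fraction for a subclone of the given size.

    Assumes every cell contributes `ploidy` alleles and each mutant cell
    carries `mutated_alleles_per_cell` mutated alleles (heterozygous by default).
    """
    total_alleles = ploidy * (mutant_cells + wild_type_cells)
    return mutated_alleles_per_cell * mutant_cells / total_alleles


def prob_at_least(min_variant_reads, depth, vaf):
    """Binomial probability of seeing at least `min_variant_reads` variant reads."""
    p_fewer = sum(comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
                  for i in range(min_variant_reads))
    return 1.0 - p_fewer


# Three mutated cells per 1000 wild-type cells, as in the lower bound quoted above.
vaf = expected_vaf(mutant_cells=3, wild_type_cells=1000)
for depth in (1_000, 10_000):
    # Requiring 5 supporting reads is a hypothetical calling threshold.
    print(depth, round(prob_at_least(5, depth, vaf), 3))
```

Under these assumptions, three mutated cells per 1000 wild-type cells correspond to a variant allele fraction of about 0.15 %, so depths of several thousand reads are needed before a handful of supporting reads can be expected.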

A very recent longitudinal study of 12 homogeneously treated CLL cases, assessed at equivalent stages, confirmed that clonal evolution in CLL can be very heterogeneous, from single clones that are stable over time, to switching-dominance relationships between subclones before and after treatment [65]. An outstanding finding in this study was the description of convergent evolution in two cases that harbored different mutations affecting the same genes in distinct subclones, with one case switching dominance of two major subclones along the course of the disease, while the second case experienced an almost complete clonal replacement over time. This phenomenon involved different mutations in NOTCH1, DDX3X, SF3B1 and BIRC3, as well as del11q23 [65]. All in all, these studies highlight the need for a systematic and comprehensive characterization of the relationship patterns and the evolution of subclonal genetic alterations over time and under different microenvironmental conditions to define their association with CLL progression, relapse and chemo-refractoriness.

The epigenomic landscape of CLL

It is widely recognized that epigenetic reprogramming plays a key role in cancer development. In this regard, the most studied modification is the methylation of the cytosine residue of CpG pairs to render methylcytosine. The usual epigenomic landscape in neoplastic transformation is defined by general hypomethylation combined with local hypermethylation and often with overexpression of DNA methyltransferases (DNMTs) [66], which leads to genomic instability and oncogene activation [67, 68]. Quantitative studies indicated global hypomethylation of the CLL genome, mostly in repetitive genomic sequences [69, 70]. Further studies have uncovered expression changes through aberrant methylation of specific CLL-associated genes, such as BCL2 [71], TCL1 [72], DAPK1 [73] and NOTCH1 [74]. Interestingly, the assessment of the methylation status of specific CpG pairs in ZAP70 has been proposed for clinical prognostic purposes, as they are associated with ZAP70 expression, activity and clinical outcome [75–77]. Also, several works have described altered regulation of Wnt signaling through aberrant methylation of different genes of this CLL-relevant pathway [78–81].

Aberrant promoter methylation has been shown to affect the expression of several miRNAs associated with CLL (reviewed by Cahill and Rosenquist [70]), including miR15a/16-1 and miR29 [82], which are also epigenetically deregulated through histone deacetylation [83]. Moreover, promoter demethylation of the two lncRNAs located in the 13q14 deleted region (which also involves miR15a/16-1) appears to induce in cis down-regulation of neighboring tumor suppressor genes that modulate the NF-κB signaling pathway [84].

The introduction of recently developed high-content techniques has also allowed comprehensive genome-wide methylation studies. Hence, the use of the Illumina 450k array in CLL demonstrated stability of the DNA methylation status in CLL over time and showed differential methylation of known CLL prognostic genes, epigenetic regulators and components of different signaling pathways between uCLL and mCLL cells [74]. The combined use of whole-genome bisulfite sequencing (WGBS) (Fig. 1) and high-density microarrays has provided the broadest analysis of the methylome of CLL and normal B cells done so far [85]. This study reported global gene body hypomethylation and, although the overall correlation between DNA methylation and gene expression was limited, both positive and negative correlations were found between gene body methylation and gene expression levels. This work confirmed global methylation differences between uCLL and mCLL subtypes, mostly affecting enhancer regions. Importantly, these methylation changes resemble the differences found between naïve and memory B cells, thus suggesting the former as the cell of origin for uCLL, with memory B cells proposed as the precursors for mCLL (Fig. 2). In addition, a previously unknown intermediate CLL prognostic subtype, characterized by an intermediate DNA methylation profile and enriched in mCLL with lower IGHV hypermutation levels, was defined [85].

Fig. 2

Schematic depiction of the tumorigenesis of CLL. The normal maturation of B cells is shown on top. According to their epigenetic patterns, both CLL subclasses are derived from different stages of this process through the acquisition of mutations and genomic aberrations affecting different genes. Some of these mutations are associated with either subclass (left and right), whereas others appear mutated in both (middle). Early events are shown in bold, and mutations associated with poor prognosis are shown in red

To delve deeper into the nature of CLL epigenomic alterations, recent efforts have aimed to describe intratumor DNA methylation heterogeneity. Reduced representation bisulfite sequencing (RRBS) analysis of 104 CLL patients and 26 normal B cell samples uncovered a more than 50 % increase in intra-sample variability of methylation patterns in CLL cells, which was found to arise from local variability within DNA fragments (i.e., variable methylation of the CpGs in individual fragments) [86]. Gene set enrichment analysis of genes consistently showing high levels of this ‘locally disordered methylation’ showed enrichment in promoters of TP53 target genes and gene sets related to stem cell biology, suggesting a shift to ‘stemness’ in CLL. It was also shown that samples presenting high disordered methylation in promoters have a higher probability of harboring a subclonal driver mutation, and are therefore associated with adverse clinical outcomes [86].
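The notion of local disorder can be made concrete with a small sketch that scores each read as concordant (all of its CpGs methylated, or all unmethylated) or discordant, and reports the discordant fraction. The read encoding, the function name and the minimum number of CpGs per read are assumptions made for illustration; they are not specified in the text above.

```python
def discordant_read_fraction(reads, min_cpgs=4):
    """Fraction of informative reads with a mixed CpG methylation pattern.

    reads     iterable of strings, one per sequencing read, where each character
              is '1' (methylated CpG) or '0' (unmethylated CpG)
    min_cpgs  reads covering fewer CpGs are ignored (illustrative threshold)
    Returns None when no read is informative.
    """
    informative = [r for r in reads if len(r) >= min_cpgs]
    if not informative:
        return None
    for r in informative:
        if set(r) - {"0", "1"}:
            raise ValueError(f"unexpected methylation call in read {r!r}")
    # A read is discordant when it contains both methylated and unmethylated CpGs.
    discordant = sum(1 for r in informative if len(set(r)) > 1)
    return discordant / len(informative)


# Hypothetical reads covering one promoter region.
ordered = ["1111", "1111", "0000", "0000"]     # mixed cell states, ordered reads
disordered = ["1010", "0110", "1111", "0001"]  # similar average level, disordered reads
print(discordant_read_fraction(ordered), discordant_read_fraction(disordered))
```

The two hypothetical regions have similar average methylation levels, but only the second shows read-level disorder, which is the kind of variability an average methylation value alone would miss.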

A deeper insight into the relationships between driver gene mutations and DNA methylation in CLL clonal evolution has been provided by the work of Oakes and colleagues [87], yielding information with notable prognostic value. Global DNA methylation analyses performed by WGBS and high-density arrays indicate that patients with low intra-sample methylation heterogeneity usually belong to the clonal and stable mCLL/ZAP70-methylated subclass. On the other hand, some samples show above-median methylation heterogeneity levels associated with high-risk uCLL/ZAP70-unmethylated prognostic biomarkers. Many of these cases show evolution towards increased methylation heterogeneity levels, always associated with growing subclonal genetic complexity involving known driver genes such as TP53 and SF3B1. Although the mechanisms, interactions and forces driving co-evolution between genetic and epigenetic aberrations in CLL remain to be defined, this work supports the integration of DNA methylation heterogeneity assessment, together with driver genetic events, into the prognostic framework defining indolent/aggressive CLL subtypes [87].

Clinical impact and future directions

Standard therapies to treat CLL patients have been mostly based on DNA-targeting drugs, such as fludarabine and cyclophosphamide. However, DNA-damaging chemotherapy results in the development of chemo-resistance in most cases, which was initially attributed to the selection of driver mutations affecting genes of the DNA-damage response pathways, such as TP53 and ATM [19–24, 64]. This has prompted the development of a ‘watch and wait’ strategy, by which therapy is only applied when the patients are symptomatic. The introduction of immunotherapy-based approaches [88, 89] and allogeneic transplantation [90] has improved the management of refractory and aggressive cases. In addition, the use of the BCR inhibitors ibrutinib and idelalisib in clinical trials has shown very promising effects [91, 92]. Unfortunately, and in spite of all the progress made, aggressive forms of CLL remain incurable. Now, with the unprecedented progress provided by NGS-based approaches, novel and in some cases unanticipated drug targets have been uncovered. Several newly described drivers are also putatively involved in poor outcome, disease relapse and/or refractoriness, and several studies are validating the use of recurrently mutated genes such as SF3B1, NOTCH1, BIRC3 and TP53 for patient stratification and risk assessment [93–97]. Moreover, the integration of some of these genetic markers in the current CLL prognostic classification has been proposed elsewhere [98].

The uncovering of intraleukemic genetic and epigenetic heterogeneity and the description of the Darwinian process of clonal evolution leading to the expansion of resistant subclonal populations upon chemotherapy-driven selection could call for a paradigm shift in the therapeutic approach to CLL. Hence, the ability to identify low abundance chemo-resistant subclones at diagnosis could help clinicians to apply standard chemotherapy combined with specific therapies targeting these highly fit clones in advance, in what has been branded as the ‘ABC (anticipation-based chemotherapy) approach’ [99].

The findings provided by high-throughput genomic and epigenomic approaches are, therefore, opening new and promising avenues for CLL patient stratification, risk assessment and therapy; nevertheless, the understanding of CLL biology is still far from complete. The low mutational recurrence and the high number of cases with as yet undefined driver genes hamper the development of broad-spectrum treatments. CLL has been proposed as a paradigmatic model for personalized medicine, due to its prevalence, its unpredictable clinical course (now known to be defined by high genetic and epigenetic heterogeneity) and easy tumor sample acquisition through peripheral blood collection [4, 52, 100]. Further functional studies are needed to validate candidate drivers and to understand the physiological roles of alterations that remain uncharacterized. This will be fundamental to clarify whether CLL is caused by the alteration of a few common biological pathways or, on the contrary, can be generated by the alteration of multiple unrelated cellular functions. In the latter case, personalized approaches to anti-CLL treatment would be needed, prompting the application of routine genome and epigenome sequencing analyses [4, 51]. Also, additional cohorts with distinct clinico-biological characteristics are required to understand the functional relationships between genomic and epigenomic aberrations and, more importantly, to define putative mutation patterns driving clonal evolution and chemo-resistance, in order to incorporate this wealth of information into new and efficient antineoplastic therapies.