An Entity Evolving into a Community: Defining the Common Ancestor and Evolutionary Trajectory of Chronic Lymphocytic Leukemia Stereotyped Subset #4
Patients with chronic lymphocytic leukemia (CLL) assigned to stereotyped subset #4 express highly homologous B-cell receptor immunoglobulin (BcR IG) sequences with intense intraclonal diversification (ID) in the context of ongoing somatic hypermutation (SHM). Their remarkable biological and clinical similarities strongly support derivation from a common ancestor. We here revisited ID in subset #4 CLL to reconstruct their evolutionary history as a community of related clones. To this end, using specialized bioinformatics tools we assessed both IGHV-IGHD-IGHJ rearrangements (n = 511) and IGKV-IGKJ rearrangements (n = 397) derived from eight subset #4 cases. Due to high sequence relatedness, a number of subclonal clusters from different cases lay very close to one another, forming a core from which clusters exhibiting greater variation stemmed. Minor subclones from individual cases were mutated to such an extent that they now resembled the sequences of another patient. Viewing the entire subset #4 data set as a single entity branching through diversification enabled inference of a common sequence representing the putative ancestral BcR IG expressed by their still elusive common progenitor. These results have implications for improved understanding of the ontogeny of CLL subset #4, as well as the design of studies concerning the antigenic specificity of the clonotypic BcR IGs.
Patients with chronic lymphocytic leukemia (CLL) assigned to stereotyped subset #4 are characterized clinically by an early age at diagnosis and an indolent disease course and molecularly by B-cell receptor immunoglobulins (BcR IGs) that exhibit a series of distinctive immunogenetic features (1,2). More specifically, they are IgG-switched (a rarity in CLL since the great majority of CLL clones, >90% of all cases, express IgM/IgD) and are composed of heavy chains encoded by the IGHV4-34 gene and light chains encoded by the IGKV2-30 gene (3, 4, 5). The antigen-binding sites of subset #4 are equally interesting, being composed of a variable heavy complementarity determining region 3 (VH CDR3) that is long and enriched in positively charged residues (reminiscent of pathogenic anti-DNA antibodies) (3,4). Anti-DNA is the most common specificity in autoreactivity, with DNA binding often acquired through surface-active basic amino acids; predominantly arginine (R) but also, to a lesser extent, lysine (K) (6, 7, 8). This point is worthy of note since the VH CDR3 of subset #4 is defined by a (R/K)RYY motif which is deemed to not only be “CLL-biased” but also exclusive to subset #4 as it has never been found outside this context (3,4). In addition, both the VH and variable kappa (VK) domains of subset #4 demonstrate a high impact of somatic hypermutation (SHM) and are remarkable for carrying shared (“stereotyped”) SHM, that is, identical changes at the same codon position of the variable domain (3,9).
Subset #4 is also outstanding due to intense intraclonal diversification (ID) within their IG genes in the context of ongoing SHM, alluding to an active, ongoing interaction with antigen(s) (10,11). Indeed, by conducting a large-scale longitudinal study of subset #4 we previously established: (i) the existence in most cases of distinct “clusters” of subcloned sequences; (ii) a hierarchical pattern of subclonal evolution, thus revealing which SHMs were negatively or positively selected over time; and (iii) subclonal drift, that is, temporal changes in the relative size of different clusters of sequences (12).
Nevertheless, this study only investigated clonal evolution at an individual case level and hence could not shed light on the clonal ancestry of subset #4 as a whole, which is relevant since the remarkable biological and clinical similarities of subset #4 cases strongly support derivation from a common ancestor. In an attempt to trace the ontogeny of subset #4, we here sought to revisit ID in subset #4 and reconstruct their evolutionary history by determining the structure of a community of related clones profiled at different time points for both IG heavy and light chains.
Materials and Methods
Peripheral blood samples were collected at multiple time points from eight CLL patients meeting the International Workshop on Chronic Lymphocytic Leukaemia (iwCLL) criteria; these eight patients, on the basis of both their IG gene sequence features and our previously established criteria, were assigned to subset #4 (1,3,4,13). Patients’ demographics and clinical and molecular data are summarized in Supplemental Table 1. Cases were analyzed over a six-year period (range 7 to 72 months, median 20 months) and no patient received treatment during sampling (Supplemental Table 1). The diagnostic sample was available, and designated time point 1, for six of the eight patients analyzed. No diagnostic samples were available for the remaining two patients (P0103 and P2451); for these patients, the initial samples analyzed (time point 1) were obtained 81 and 63 months post diagnosis, respectively. Written informed consent was obtained in accordance with the Declaration of Helsinki and the study was approved by the local ethics review committee.
PCR Amplification, Subcloning and Sequence Analysis of IGHV-IGHD-IGHJ and IGKV-IGKJ Gene Rearrangements
PCR amplification using the high-fidelity Accuprime Pfx polymerase (Invitrogen [Thermo Fisher Scientific, Waltham, MA, USA]), subcloning, and sequence analysis and interpretation were performed as described previously. The sequence data evaluated herein have been reported previously (1,3,9, 10, 11, 12).
Visualization of Clonal Evolution in Subcloned IG Gene Sequences
Nonnegativity: d(s1, s2) ≥ 0;
Nondegeneracy: d(s1, s2) = 0 if and only if s1 = s2;
Symmetry: d(s1, s2) = d(s2, s1);
Triangle inequality: d(s1, s2) + d(s2, s3) ≥ d(s1, s3).
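The four properties listed above can be checked empirically for any candidate distance function. The following is a minimal sketch, assuming a plain Levenshtein edit distance as a stand-in (it is not the purpose-built tool used in this study) and a handful of hypothetical CDR3-like strings that are not patient data. The names `levenshtein`, `check_metric` and `toy` are introduced here for illustration only.

```python
from itertools import product


def levenshtein(s1: str, s2: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        curr = [i]
        for j, b in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s1
                curr[j - 1] + 1,         # insert b into s1
                prev[j - 1] + (a != b),  # substitute (free if a == b)
            ))
        prev = curr
    return prev[-1]


def check_metric(seqs, d) -> bool:
    """Return True if d satisfies the four properties on all of seqs."""
    for s1, s2 in product(seqs, repeat=2):
        if d(s1, s2) < 0:                     # nonnegativity
            return False
        if (d(s1, s2) == 0) != (s1 == s2):    # nondegeneracy
            return False
        if d(s1, s2) != d(s2, s1):            # symmetry
            return False
    for s1, s2, s3 in product(seqs, repeat=3):
        if d(s1, s2) + d(s2, s3) < d(s1, s3):  # triangle inequality
            return False
    return True


# Hypothetical toy sequences (assumption: illustrative only, not study data).
toy = ["MQGTHWPPYT", "MQGTHWPYT", "MQATHWPPYT", "MQGTHWPPFT"]
print(check_metric(toy, levenshtein))
```

Such a check only tests the properties on the supplied sequences; it does not prove that a distance is a metric in general.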
To explore the functional similarities of observed sequence changes, we followed the ImMunoGeneTics information system (IMGT) classification of the 20 common amino acids for the properties of hydropathy and chemical characteristics (http://www.imgt.org/IMGTeducation/Aide-memoire/_UK/aminoacids/) and performed the following comparisons: (i) amino acid sequence distance including only replacement mutations; (ii) amino acid sequence distance when considering amino acids with similar physicochemical properties as single equivalent entities; (iii) amino acid sequence distance when considering amino acids within the same hydropathy group as single equivalent entities; and (iv) nucleotide sequence distance.
Focusing on both the VH and VK CDR3, we performed hierarchical visualization; by determining which nucleotide or amino acid had the highest probability of appearing at each position, we could construct a hypothetical VH and VK CDR3 sequence from which all subset #4 CDR3 sequences derive. More specifically, a hierarchical tree structure composed of nodes and branches was assembled. Within this structure, the root node corresponded to the derived (proposed) ancestral sequence, and the branches were determined based on the calculated optimal string distance of each node. The string distance of a node indicated its position from the root node and also its proximity to the other nodes.
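Comparison (iii) can be illustrated by collapsing each amino acid to its hydropathy class before measuring distance. The sketch below is written under stated assumptions: the three-class grouping is our recollection of the IMGT hydropathy classes and should be verified against the IMGT Aide-memoire before use, the position-wise mismatch count is a simplification that only applies to sequences of equal length, and the variant sequences are hypothetical. Only the reference sequence is the derived VH CDR3 root reported in this article.

```python
# Hydropathy classes as recalled from the IMGT classification
# (assumption: verify against the IMGT Aide-memoire before real use).
HYDROPATHY = {
    **dict.fromkeys("ACFILMVW", "o"),  # hydrophobic
    **dict.fromkeys("GHPSTY", "n"),    # neutral
    **dict.fromkeys("DEKNQR", "i"),    # hydrophilic
}


def reduce_alphabet(seq: str, classes=HYDROPATHY) -> str:
    """Replace every amino acid by the one-letter code of its class."""
    return "".join(classes[aa] for aa in seq)


def mismatches(s1: str, s2: str) -> int:
    """Count differing positions (simplification: equal lengths only)."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s1, s2))


root = "ARGYADTAVVRRYYYYGMDV"       # derived VH CDR3 root from this article
variant_rk = "ARGYADTAVVKRYYYYGMDV"  # hypothetical R->K change
variant_gd = "ARDYADTAVVRRYYYYGMDV"  # hypothetical G->D change

for name, seq in [("R->K", variant_rk), ("G->D", variant_gd)]:
    raw = mismatches(root, seq)
    grouped = mismatches(reduce_alphabet(root), reduce_alphabet(seq))
    print(name, "raw:", raw, "hydropathy-grouped:", grouped)
```

Under this grouping, a conservative R to K change contributes to the raw amino acid distance but not to the grouped distance, whereas a G to D change contributes to both.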
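The idea of deriving a root sequence from the most probable residue at each position, and then placing every sequence relative to that root, can be sketched as follows. This is a minimal illustration and not the tool used in the study: it assumes equal-length sequences, uses the most frequent residue per column as a proxy for the highest probability, and measures distance from the root as a simple mismatch count. The input sequences are hypothetical.

```python
from collections import Counter


def consensus(seqs):
    """Most frequent residue at each position (equal-length input assumed)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def distance_from_root(seq, root):
    """Number of positions at which seq differs from the root."""
    return sum(a != b for a, b in zip(seq, root))


# Hypothetical subcloned VK CDR3 sequences (assumption: not study data).
toy = ["MQGTHWPPYT", "MQGTHWPPYT", "MQATHWPPYT", "MQGTHWPPFT", "MQATHWPPFS"]

root = consensus(toy)
# Order the sequences by increasing distance from the inferred root,
# mimicking how nodes sit closer to or farther from the root of a tree.
for seq in sorted(toy, key=lambda s: distance_from_root(s, root)):
    print(seq, distance_from_root(seq, root))
```

A full reconstruction would also need to handle CDR3s of different lengths and ties between equally frequent residues, which this sketch leaves out.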
Composite Clusters of Subset #4 IG Sequences: Convergent Patterns of Subclonal Evolution
Similar analysis of the IGKV-IGKJ amino acid sequences produced five clusters (Figure 1C). Two clusters were located within a very close distance, forming a more central core from which a further two clusters emanated. More specifically, the first cluster was formed by two patients (P2920 and P0907), while the second closely neighboring cluster contained the subcloned sequences from P0103 and P2451. Subcloned sequences from P3916 bridged these two clusters (Figure 1D). Clonal sequences from P1422 formed one of the two more distant clusters, while the other cluster was composed of clonal sequences from P1939. Patient P3020, previously found to carry limited ID despite bearing the highest SHM load (within both the IG heavy and light chain), was distanced from all other clusters. As with the cluster analysis of IG heavy chain sequences, we noted that individual IG kappa sequences occasionally were separated from their respective clusters and, instead, attached to clusters generated by other patients and located some distance away. Hence, the pattern of clustering evidenced from the kappa light chain sequences is analogous to that of their partner heavy chains, thereby reinforcing the idea that subset #4 essentially constitutes a community of related clones that follow closely similar ontogenetic and evolutionary pathways.
Clustering at the nucleotide level. Finally, clustering based on changes within IG heavy chain nucleotide sequences produced an individual and distinct cluster for six patients, while patients P0103 and P2451 remained clustered together (Figure 3A). Within the IG kappa chains, although the clusters generated shared similarities to cluster formation at the amino acid level, the two central cores were completely distanced from each other and, instead, a major cluster was formed by four patients (P0103, P2451, P3916 and P1939) (Figure 3B). Since these four patients all carry a 10-amino acid VK CDR3, the enhanced segregation of clusters observed at the nucleotide level is likely attributable to the additional three nucleotides that these sequences carry compared with cases carrying a 9-amino acid VK CDR3, which account for three additional sequence changes at the nucleotide level as opposed to one at the amino acid level.
Taken collectively, this detailed computational reconstruction of CLL subset #4 clonal evolution based on merged IG sequence data for all eight cases (at either an amino acid or nucleotide level) reveals a convergent and unified tumorigenic evolutionary process. Thus, this framework is indicative of a “consensus path” of evolution for subset #4 cases with the branched evolutionary growth perhaps reflecting selective pressures honing their BcR affinities.
Tracing the Origins of CLL Subset #4: Molecular Phylogeny of CDR3 Sequences
Both the VH and VK CDR3s were visualized hierarchically with the aim of constructing a CDR3 sequence at both the nucleotide and amino acid level, which then could be considered as the root from which all subset #4 CDR3 sequences derive. Comparison of each CDR3 sequence to the derived root sequence, using the same algorithmic process applied throughout the entire variable domain, enabled us to identify the mutational path followed by each individual patient.
With regard to the VH CDR3, the derived root sequences at the nucleotide and amino acid levels were GCG AGA GGC TAC GCG GAT ACA GCT GTG GTT AGG AGG TAC TAC TAT TAC GGT ATG GAC GTC and ARGYADTAVVRRYYYYGMDV, respectively. These sequences would have been created through the association of the IGHV4-34 and IGHJ6 genes with the IGHD5-18 gene in reading frame 1. Within these sequence strings, GGC TAC GCG (translation: GYA) and AGG AGG (translation: RR) cannot be assigned to the germline sequence of any IGHD and/or IGHJ gene, and thus would correspond to nontemplated regions (N1 and N2, respectively). Regarding the VK CDR3, the derived root nucleotide and amino acid sequences were ATG CAA GGC ACA CAC TGG CCC CCG TAC ACT and MQGTHWPPYT, respectively, and would have been created by the association of the IGKV2-30 and IGKJ2 genes.
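The correspondence between the derived nucleotide and amino acid root sequences can be verified by translating the codons with the standard genetic code. The sketch below builds the standard codon table from its conventional TCAG ordering; the function name `translate` and the variable names are introduced here for illustration, and the input strings are the sequences quoted in the paragraph above.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# with the third codon position varying fastest ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(nt: str) -> str:
    """Translate an in-frame nucleotide string (spaces are ignored)."""
    nt = nt.replace(" ", "").upper()
    if len(nt) % 3:
        raise ValueError("length must be a multiple of three")
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))


vh_root = ("GCG AGA GGC TAC GCG GAT ACA GCT GTG GTT "
           "AGG AGG TAC TAC TAT TAC GGT ATG GAC GTC")
vk_root = "ATG CAA GGC ACA CAC TGG CCC CCG TAC ACT"

print(translate(vh_root))       # VH CDR3 root
print(translate(vk_root))       # VK CDR3 root
print(translate("GGC TAC GCG"))  # N1 region
print(translate("AGG AGG"))      # N2 region
```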
CLL subset #4 lies at the intersection between autoimmunity and malignancy. The expression of IGHV4-34 endows B cells with the capacity to recognize the N-acetyllactosamine (NAL) antigenic epitope present in both self and exogenous antigens via a germline-encoded motif located within the heavy variable framework region 1 of the IGHV4-34 gene (22,23). This motif remains intact in all CLL subset #4 IG heavy chain sequences despite a heavy SHM load and intense ID (3,4,10). Notably, recombinant monoclonal antibodies from CLL subset #4 patients have been found to bind viable B cells, recognizing the NAL epitope present on B-cell CD45 (24,25). Additional features encoded in the subset #4 IG BcR sequence that hint at autoreactivity include: (i) the predicted high electropositivity of their long arginine-rich VH CDR3s, reminiscent of pathogenic anti-DNA antibodies; and (ii) the presence of recurrent SHMs typified by the frequent introduction of acidic residues, similar to edited anti-DNA antibodies (3,9).
The route to malignancy for CLL subset #4 clones may thus be a multifactorial phenomenon, beginning with autoreactive precursors that undergo positive selection by DNA, nucleosomes and/or surface structures of apoptotic cells (26,27). Thereafter, modifications introduced by SHM may curtail this autoreactivity, thus rendering these clones anergic (28, 29, 30), though still capable of reactivation either through their BcRs and/or other immune receptors, namely toll-like receptors (TLRs) (31, 32, 33, 34, 35). While this scenario bodes well for our understanding of the evolutionary pathway followed by subset #4 clones, despite much ingenuity and effort, our knowledge about the specific eliciting antigen(s) for subset #4 remains limited. Along these lines, it is relevant to mention that recombinant monoclonal antibodies derived from subset #4 patients lacked detectable reactivity with DNA; however, upon removal of SHMs (reversion to germline configuration), these antibodies regained the ability to strongly bind DNA (24). Nevertheless, owing to difficulties in defining the unmutated progenitor rearrangement, mainly due to the extensive SHM present within subset #4 clones, the contribution made by the somatically generated CDR3s to autoantibody specificity (24,25,36, 37, 38, 39) may have been underestimated, thus obscuring the actual antibody-antigen interactions (40, 41, 42).
In an attempt to clarify and enhance our understanding of the ontogeny of CLL subset #4 B cells, we sought not only to reconstruct the evolutionary history of subset #4 clones viewed as a single antibody lineage, that is, the sequence of changes introduced into the lineage during the development of the clone, but also to identify the common ancestral sequence from which all subset #4 cases are derived—a task hitherto unattainable due to the heavy SHM load within the antigen-binding sites. One means to obtain insight into the trajectory of subset #4 clones would be through characterization of their genetic sequence, with the greatest insight obtained from longitudinal sampling. Consequently, for this purpose, we drew on a community of related clones profiled at different time points, for both heavy and light chains, derived from 8 subset #4 cases (12). The Damerau-Levenshtein distance algorithm, in the form of a purpose-built computational tool, was applied and enabled us to infer both the unmutated ancestral rearrangement and the maturation intermediates, and hence gain further insight into the interplay between mutational constraints and selection on antigen-binding affinity.
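For readers unfamiliar with the Damerau-Levenshtein distance, the following is a minimal sketch of its restricted (optimal string alignment) form, which counts insertions, deletions, substitutions and transpositions of adjacent characters. Choosing the restricted form is an assumption made here because reference 14 names it; the purpose-built tool used in this study may differ, and the example strings other than the VK CDR3 root are hypothetical. Note also that the restricted form is not guaranteed to satisfy the triangle inequality for arbitrary strings (for example, CA, AC and ABC).

```python
def osa_distance(s1: str, s2: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of the s1 prefix
    for j in range(cols):
        d[0][j] = j  # insert all j characters of the s2 prefix
    for i in range(1, rows):
        for j in range(1, cols):
            cost = s1[i - 1] != s2[j - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


vk_root = "MQGTHWPPYT"     # derived VK CDR3 root from this article
swapped = "MQGTHWPPTY"     # hypothetical: last two residues transposed
replaced = "MQATHWPPYT"    # hypothetical: single replacement

print(osa_distance(vk_root, swapped))
print(osa_distance(vk_root, replaced))
```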
Through this approach the focused evolution of subset #4, the evolution of single entities into a community of related clones, was clearly evidenced with most patient clusters found lying very close to each other due to a high degree of sequence relatedness. The branching observed within such clusters could perhaps reflect specific selective pressures that occurred in parallel in distinct subclones, as a means to fine-tune their BcR affinities. Importantly, exploring the evolutionary trajectory of subset #4 enabled us to suggest for the first time the common ancestral sequence from which all subset #4 cases likely descend. By determining the most probable sequence of mutations, the mutationally preferred pathway, the unmutated common ancestor (including the predicted VH CDR3) could be inferred, which could now serve as a template for antigen reactivity studies (which should better predict antigen specificities compared with previous studies). Defining the antigens bound by the CLL cells should aid in unraveling the path to malignancy in subset #4. We thus reason that knowledge of the subset #4 ancestral rearrangement could provide a blueprint for the resolution of crystal structures, which would not only further define structural characteristics of the #4 antibody, but also provide detailed molecular insights into the nature of contact sites between the antibody and antigen.
The tale of CLL subset #4 is truly intriguing; bestowed with autoreactive properties at birth, they fortuitously escape immunological tolerance and exist in an anergic state in the periphery, only to reemerge as immunocompetent cells (potentially due to dual engagement of the BcRs and TLRs). That said, the story is far from complete and unresolved issues relate to where and under what influence SHM (and also switching to the IgG isotype) occurs, and whether specific modalities of BcR/TLR collaboration and/or regulation may eventually impact on the biological behavior of the clones. Nevertheless, results from this study unveil new leads in the ontogeny of CLL subset #4 clones and bring fresh insights, which may directly impact the design of studies concerning the antigenic specificity of the clonotypic BcR IGs. Although it is difficult to predict how revelations in biological understanding may translate into improved immunological interventions, it seems reasonable to think that once a detailed understanding of the B-cell ontogeny of CLL subset #4 is achieved, doors for therapeutic strategies may open, for example, the design of peptides that would inhibit or alter the consequences of antigen-antibody interactions.
The authors declare that they have no competing interests as defined by Molecular Medicine, or other interests that might be perceived to influence the results and discussion reported in this paper.
This work was supported in part by the Swedish Cancer Society, the Swedish Research Council, and the Lion’s Cancer Research Foundation, Uppsala; the ENosAI project (code 09SYN-13-880) cofunded by the EU and the Hellenic General Secretariat for Research and Technology; the KRIPIS action, funded by the Hellenic General Secretariat for Research and Technology and the European Regional Development Fund of the EU under the O.P. Competitiveness and Entrepreneurship, NSRF 2007–2013.
- 7. Li Z, Schettino EW, Padlan EA, Ikematsu H, Casali P. (2000) Structure-function analysis of a lupus anti-DNA autoantibody: central role of the heavy chain complementarity-determining region 3 Arg in binding of double- and single-stranded DNA. Eur. J. Immunol. 30:2015–26.
- 13. Hallek M, et al. (2008) Guidelines for the diagnosis and treatment of chronic lymphocytic leukemia: a report from the International Workshop on Chronic Lymphocytic Leukemia updating the National Cancer Institute-Working Group 1996 guidelines. Blood. 111:5446–56.
- 14. Dang QT, Phan TH. (2010) Determining Restricted Damerau-Levenshtein Edit-Distance of Two Languages by Extended Automata [Internet]. In: Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2010 IEEE RIVF International Conference on; 2010 Nov 1–4; Hanoi. [cited 2015 Mar 31]. Available from: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5632914&queryText%3DDetermining+Restricted+Damerau-Levenshtein.
- 15. Gomez-Alonso C, Valls A. (2008) A Similarity Measure for Sequences of Categorical Data Based on the Ordering of Common Elements. In: Modeling Decisions for Artificial Intelligence: 5th International Conference, MDAI 2008, Sabadell, Spain, October 30–31, 2008, Proceedings. Torra V, Narukawa Y (eds.) Springer, Berlin, pp. 134–45.
- 21. Erciyes K. (2013) Minimum spanning trees. In: Distributed graph algorithms for computer networks. Sammes AJ (ed.) Springer, pp. 69–82.
- 25. Catera R, et al. (2006) Polyreactive monoclonal antibodies synthesized by some B-CLL cells recognize specific antigens on viable and apoptotic T cells. Blood. 108:2813.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, and provide a link to the Creative Commons license. You do not have permission under this license to share adapted material derived from this article or parts of it.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/.