Immune receptor repertoires in pediatric and adult acute myeloid leukemia
Acute myeloid leukemia (AML), caused by the abnormal proliferation of immature myeloid cells in the blood or bone marrow, is one of the most common hematologic malignancies. Currently, the interactions between malignant myeloid cells and the immune microenvironment, especially T cells and B cells, remain poorly characterized.
In this study, we systematically analyzed the T cell receptor and B cell receptor (TCR and BCR) repertoires from the RNA-seq data of 145 pediatric and 151 adult AML samples as well as 73 non-tumor peripheral blood samples.
We inferred over 225,000 complementarity-determining region 3 (CDR3) sequences in TCR α, β, γ, and δ chains and 1,210,000 CDR3 sequences in B cell immunoglobulin (Ig) heavy and light chains. We found higher clonal expansion of both T cells and B cells in the AML microenvironment and observed many differences between pediatric and adult AML. Most notably, adult AML samples have a significantly higher level of B cell activation and more secondary Ig class switch events than pediatric AML or non-tumor samples. Furthermore, adult AML samples with highly expanded IgA2 B cells, which might represent an immunosuppressive microenvironment, are associated with regulatory T cells and worse overall survival.
Our comprehensive characterization of the AML immune receptor repertoires improved our understanding of T cell and B cell immunity in AML, which may provide insights into immunotherapies in hematological malignancies.
Keywords: Acute myeloid leukemia, T cell receptor repertoires, B cell receptor repertoires, Complementarity-determining region 3
Abbreviations
AML: Acute myeloid leukemia
BCR: B cell receptor
CAR: Chimeric antigen receptor
CDR3: Complementarity-determining region 3
CPK: Clonotypes per kilo reads
CSR: Class switch recombination
GDC: Genomic Data Commons
ICB: Immune checkpoint blockade
IgH: Immunoglobulin heavy chain
IgK: Immunoglobulin kappa light chain
IgL: Immunoglobulin lambda light chain
TARGET: Therapeutically Applicable Research To Generate Effective Treatments
TCGA: The Cancer Genome Atlas
TCR: T cell receptor
Treg: Regulatory T cells
Acute myeloid leukemia (AML), caused by the abnormal proliferation of immature myeloid cells in the blood or bone marrow (BM), is the most common acute leukemia in adults and the second most common in children . For many years, the standard therapy for AML has been chemotherapy regimens with or without allogeneic hematopoietic stem cell transplantation . This strategy often induces complete remission, but most patients ultimately relapse and succumb to the disease [2, 3, 4, 5]. Advances in immunotherapy, particularly immune checkpoint blockade (ICB) and engineered T cells, have revolutionized cancer therapy in recent years [6, 7]. However, treating AML with immunotherapy has so far been promising but very challenging . In contrast to the success of ICB therapy in many solid tumors, the only published phase I study of pidilizumab (anti-PD1) in AML showed peripheral blast reduction in only one of eight patients . Although low mutational burden has been considered the cause of weak endogenous immune responses to ICB treatment in AML , the intrinsic mechanisms by which leukemic blasts resist immune responses remain poorly understood. In addition, owing to the lack of a specific target antigen, chimeric antigen receptor (CAR) T cell therapy remains challenging for AML, in contrast to the prominent efficacy of CAR T therapies targeting CD19/CD20 in B cell leukemia and lymphoma . Hence, a better understanding of the interactions between AML malignant cells and the immune microenvironment has the potential to improve patient outcome and inform novel immunotherapy strategies for AML patients .
T cells and B cells are key components of adaptive immunity. With the development of ICB therapy, the antitumor properties of infiltrating T cells have been well established in many solid tumors such as melanoma and non-small cell lung cancer . Upon binding to tumor neo-antigens, cytotoxic T cells can eliminate cancer cells . Although infiltrating B cells have been frequently observed in multiple tumor tissues [14, 15], their functional impact remains controversial [16, 17, 18]. The most variable region in the T cell receptor and B cell receptor (TCR and BCR, respectively) is the complementarity-determining region 3 (CDR3), which plays a key role in antigen recognition [19, 20]. Therefore, characterizing tumor TCR and BCR repertoires, particularly the CDR3s, is critical to understanding antigen recognition and tumor–immune interactions. Tumor-infiltrating TCR or BCR repertoires have been studied in many solid tumors using either targeted deep sequencing (TCR-seq or BCR-seq) or unselected RNA-seq data [21, 22, 23, 24]. However, less is known about immune repertoire changes in hematologic malignancies, and a systematic characterization of both TCR and BCR repertoires in the AML microenvironment is still lacking.
In this study, we characterized TCR and BCR repertoires in both pediatric and adult AML by detecting and analyzing the CDR3 sequences of TCR α, β, γ, and δ chains and B cell immunoglobulin (Ig) heavy (IgH) and light (IgL, IgK) chains from the RNA-seq data of AML patients and non-tumor donors. We investigated the clonal expansion patterns of T cells and B cells in the AML microenvironment and described the differences between AML and non-tumor samples. We also compared pediatric and adult AML samples and identified associations between tumor immune receptor repertoires and clinical outcome. These results provide insights into the immune receptor repertoires and T/B cell functions in AML.
In silico validation using single cell RNA-seq data
We previously developed a computational algorithm, TRUST [22, 24, 25, 26], to extract hypervariable TCR and BCR CDR3 sequences from unselected bulk tumor RNA-seq data. To further validate the accuracy of our method for assembling TCR and BCR sequences from RNA-seq data, we collected one SMART-seq dataset of CD45-positive white blood cells from 19 pre-treatment melanoma patients . For each patient, we merged the single cell RNA-seq (scRNA-seq) data of the CD45-positive cells into one “bulk” sample and applied TRUST to extract TCR/BCR reads as if it were regular RNA-seq data. In the single cell data, all T/B cells had been identified based on known gene markers, providing the true fraction of T/B cells in each merged “bulk” sample. We then estimated the T/B cell fraction in each “bulk” sample as the number of reads mapped to the TCR/BCR regions by TRUST divided by the total number of sequencing reads. Moreover, we followed the instructions of Sade-Feldman et al.  to reconstruct T and B cell receptors from all identified T and B cells. Only cells with a unique sequence on each chain were counted in the downstream analysis of single cell data, since some T cells have been reported to carry two different α chains . To estimate T/B cell clonotype diversity from single cell data, we calculated the Shannon entropy of the frequencies of TCR β chain and IgH CDR3 amino acid sequences. Samples with fewer than two T or B cells were excluded from this analysis. In the simulated “bulk” data, we used CPK (TCR/BCR CDR3s per kilo of TCR/BCR reads)  to estimate the clonotype diversity of T/B cells.
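The three quantities used in this validation (receptor-read fraction, single cell Shannon entropy, and CPK) can be sketched as follows. This is a minimal illustration, not TRUST's implementation: it assumes per-cell CDR3 amino acid sequences (TCRβ or IgH) are already available as strings, uses the natural logarithm for the entropy (the log base is not stated above), and treats CPK as unique CDR3 calls per 1,000 receptor reads. All function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def receptor_read_fraction(receptor_reads: int, total_reads: int) -> float:
    """Estimated T or B cell fraction of a pseudo-bulk sample: reads assigned
    to the TCR or BCR regions divided by all sequencing reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return receptor_reads / total_reads


def shannon_entropy(cell_cdr3s: list[str]) -> float | None:
    """Shannon entropy (natural log, an assumption) of clonotype frequencies.

    Each element is the CDR3 amino acid sequence of one cell (TCR beta chain
    for T cells, IgH for B cells). Samples with fewer than two cells are
    excluded, signalled here by returning None.
    """
    n = len(cell_cdr3s)
    if n < 2:
        return None
    counts = Counter(cell_cdr3s)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def cpk(unique_cdr3s: int, receptor_reads: int) -> float:
    """CDR3s per kilo receptor reads: unique TCR or BCR CDR3 calls per
    1,000 reads mapped to the TCR or BCR regions."""
    if receptor_reads <= 0:
        raise ValueError("receptor_reads must be positive")
    return unique_cdr3s / (receptor_reads / 1000)


if __name__ == "__main__":
    # Toy inputs, not data from the study.
    print(receptor_read_fraction(500, 1_000_000))                 # 0.0005
    print(shannon_entropy(["CASSF", "CASSF", "CASRG", "CASRG"]))  # ln 2, about 0.693
    print(cpk(50, 2000))                                          # 25.0
```

Both diversity measures grow as the repertoire becomes less clonal, which is why a positive correlation between them is the expected outcome of the validation.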
Data collection and preprocessing
Our study investigated a total of 296 primary AML samples (Additional file 1: Table S1), including 145 pediatric samples from Therapeutically Applicable Research To Generate Effective Treatments (TARGET)  and 151 adult samples from The Cancer Genome Atlas (TCGA) . The RNA-seq reads in BAM files, gene expression read counts, and clinical data of all AML samples were downloaded from the Genomic Data Commons (GDC, https://portal.gdc.cancer.gov/, June 2017). The RNA-seq reads had previously been aligned to the hg38 human reference genome using STAR2  with identical parameters. As controls for the AML samples, RNA-seq data from 73 non-tumor peripheral blood (PB) samples (Additional file 1: Table S2) were downloaded from the Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra, PRJNA263846) and processed with the GDC mRNA analysis pipeline (https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline). The limited clinical annotation available for these normal samples allowed only categorical information, such as male/female and children/adults, to be parsed out. Since the maturity of adaptive immunity depends on age, especially early in life, the pediatric AML samples were further divided into infant (0–3 years old, n = 37) and child (3–20 years old, n = 108) groups for downstream analyses. Control samples were not subdivided owing to the lack of age information.
Detection and analysis of TCR and BCR CDR3 sequences from AML and non-tumor RNA-seq data
To characterize the immune receptor repertoires, we applied TRUST3.0.1 (https://bitbucket.org/liulab/trust) to all AML and non-tumor RNA-seq samples. The formatted txt files of CDR3 calls were used in downstream analyses; in these files, the est_lib_size column gives the number of reads mapped to the TCR/BCR regions. The total number of sequencing reads in each BAM file was obtained using samtools , and the reads mapped to each variable (V), joining (J), or constant (C) gene were tallied in the “coverage.txt” file of each sample. The columns in these files are defined in the TRUST documentation.
To compare TCR/BCR richness between AML and non-tumor samples, we normalized the number of CDR3s in each sample by the total number of sequencing reads and by one minus the blast fraction (the pathologically estimated tumor purity). The clonotype diversity of T/B cells in each sample was estimated by the number of TCR/BCR CDR3s per kilo TCR/BCR reads (CPK) . A complete CDR3 sequence was defined as a CDR3 annotated with both V and J genes. The γδ T cell fraction was estimated as the total number of γ or δ CDR3s divided by the total number of TCR CDR3s in each sample.
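The per-sample normalizations above can be written down directly. A minimal sketch, assuming each CDR3 call carries a chain label and optional V/J gene annotations; the `Cdr3Call` record, the chain labels (`TRA`, `TRB`, `TRG`, `TRD`, `IGH`), and the per-million-reads scaling of the normalized richness are illustrative choices, not fields or conventions of TRUST's output.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

TCR_CHAINS = {"TRA", "TRB", "TRG", "TRD"}  # hypothetical chain labels
GAMMA_DELTA = {"TRG", "TRD"}


@dataclass(frozen=True)
class Cdr3Call:
    chain: str                     # e.g. "TRB" or "IGH" (illustrative labels)
    cdr3_aa: str
    v_gene: Optional[str] = None
    j_gene: Optional[str] = None


def is_complete(call: Cdr3Call) -> bool:
    """A complete CDR3 is annotated with both a V and a J gene."""
    return bool(call.v_gene) and bool(call.j_gene)


def normalized_richness(n_cdr3: int, total_reads: int, blast_pct: float,
                        per_reads: float = 1e6) -> float:
    """CDR3 count per `per_reads` sequencing reads, divided by the non-blast
    fraction (1 - blast percentage / 100) to adjust for tumor purity."""
    non_blast = 1.0 - blast_pct / 100.0
    if total_reads <= 0 or non_blast <= 0:
        raise ValueError("need total_reads > 0 and blast_pct < 100")
    return n_cdr3 / total_reads * per_reads / non_blast


def gamma_delta_fraction(calls: list[Cdr3Call]) -> Optional[float]:
    """Fraction of TCR CDR3 calls that come from gamma or delta chains;
    None if the sample has no TCR calls."""
    tcr = [c for c in calls if c.chain in TCR_CHAINS]
    if not tcr:
        return None
    return sum(c.chain in GAMMA_DELTA for c in tcr) / len(tcr)
```

For example, 20 CDR3s in 10 million reads from a sample with 80% blasts gives 2 per million reads, scaled up five-fold to about 10, reflecting that only 20% of the sample is non-malignant.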
To identify B cell lineage clusters in each sample, we extracted from each complete IgH CDR3 the octamer starting at the first position (not counting the initial "C") as a motif. All IgH CDR3 sequences (partial or complete) containing an amino acid match to the motif with at most one mismatch (e.g., motifs RDMWLVGW and RDMWIVGW were considered matches) were collected. Each motif with three or more matching sequences was considered a B cell cluster. This approach tolerates amino acid changes caused by non-synonymous mutations while keeping computational complexity low.
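The motif-based clustering can be sketched as follows, assuming amino acid CDR3 strings in which complete CDR3s begin with the conserved cysteine. The function names and toy sequences are hypothetical, and deduplicating sequences before applying the three-sequence threshold is an assumption, since the text does not say whether the threshold counts unique sequences.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def contains_motif(seq: str, motif: str, max_mismatch: int = 1) -> bool:
    """True if any window of `seq` matches `motif` with <= max_mismatch mismatches."""
    k = len(motif)
    return any(hamming(seq[i:i + k], motif) <= max_mismatch
               for i in range(len(seq) - k + 1))


def igh_clusters(complete_cdr3s: list[str], all_igh_cdr3s: list[str],
                 k: int = 8, max_mismatch: int = 1,
                 min_size: int = 3) -> dict[str, list[str]]:
    """Map each octamer motif to the IgH CDR3s (partial or complete) matching it.

    The motif is the k-mer starting right after the leading 'C' of a complete
    CDR3; motifs with fewer than `min_size` unique matching sequences are dropped.
    """
    unique_igh = list(dict.fromkeys(all_igh_cdr3s))  # dedupe, keep input order
    clusters: dict[str, list[str]] = {}
    seen: set[str] = set()
    for cdr3 in complete_cdr3s:
        motif = cdr3[1:1 + k]  # skip the conserved leading cysteine
        if len(motif) < k or motif in seen:
            continue
        seen.add(motif)
        members = [s for s in unique_igh if contains_motif(s, motif, max_mismatch)]
        if len(members) >= min_size:
            clusters[motif] = members
    return clusters
```

Scanning every window lets partial CDR3s join a cluster even when their assembled fragment does not start at the cysteine. The cost is linear in sequence length per motif, which keeps the approach cheap compared with all-against-all alignment.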
Somatic hypermutation (SHM)  was defined as mismatches within B cell clusters. Only mutations between two sequences differing by a single nucleotide were counted, to avoid overestimating the SHM rate because of mutations accumulated during B cell clonal expansion. The SHM rate of each sample was calculated as the SHM count divided by the total number of assembled CDR3 bases, which avoided bias from unknown mutations outside partially assembled CDR3s. IgH CDR3 calls with a unique isotype annotation were used in the isotype fraction and class switch recombination (CSR) analyses . Co-occurrences of different, unambiguously assigned Ig classes or subclasses in the same IgH CDR3 cluster were considered CSR events. The number of CSR events was normalized by the total number of IgH clusters in each group, and samples with fewer than 10 unique IgH CDR3s were excluded from downstream analyses.
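The SHM and CSR counts can be sketched as below. This illustrates the definitions under stated assumptions rather than reproducing the authors' code: clusters are given as lists of nucleotide CDR3 sequences (for SHM) or isotype labels (for CSR); only equal-length sequence pairs are compared for SHM; ambiguous isotype calls are represented as `None`; and each pair of distinct isotypes co-occurring in a cluster is counted as one CSR event, which is one possible reading of "co-occurrences". Function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def shm_rate(cluster_nt_seqs: list[list[str]], total_assembled_bases: int) -> float:
    """SHM events per assembled CDR3 base.

    An SHM event is a pair of unique sequences in the same cluster that differ
    at exactly one nucleotide; more distant pairs are skipped so mutations
    accumulated during clonal expansion are not counted repeatedly.
    """
    if total_assembled_bases <= 0:
        raise ValueError("total_assembled_bases must be positive")
    events = 0
    for seqs in cluster_nt_seqs:
        unique = list(dict.fromkeys(seqs))
        for a, b in combinations(unique, 2):
            if len(a) == len(b) and hamming(a, b) == 1:
                events += 1
    return events / total_assembled_bases


def csr_events(cluster_isotypes: list[list[Optional[str]]]) -> int:
    """Count pairs of distinct, unambiguous isotypes co-occurring in a cluster."""
    total = 0
    for isotypes in cluster_isotypes:
        distinct = {iso for iso in isotypes if iso is not None}
        total += len(distinct) * (len(distinct) - 1) // 2
    return total


def passes_filter(unique_igh_cdr3s: set[str], min_unique: int = 10) -> bool:
    """Samples with fewer than `min_unique` unique IgH CDR3s are excluded."""
    return len(unique_igh_cdr3s) >= min_unique
```

The normalized CSR value for a group would then be `csr_events(clusters) / len(clusters)`, matching the per-cluster normalization described above.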
The Wilcoxon rank-sum test was used to compare TCR/BCR CPK, γδ CDR3 fractions, and SHM rates among the AML and non-tumor groups. Spearman’s rank correlation was used to assess the associations among αβ, γδ, IgH, and IgK/IgL CDR3 calls, and partial Spearman’s rank correlation was used to assess the associations between different Ig isotype fractions in the AML and non-tumor groups. Survival was visualized using Kaplan–Meier curves, and statistical significance was assessed with the log-rank test. Details of the other analyses are described in the supplementary methods (Additional file 3).
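Partial Spearman correlation, used above for the isotype fractions, is Pearson's correlation computed on ranks with a third variable partialled out. The dependency-free sketch below assigns tied values their average rank and applies the standard first-order partial correlation formula. Which covariate was controlled for is not stated here, so `z` is a placeholder; p-values are omitted, and in practice a statistics library would normally be used.

```python
from __future__ import annotations

import math


def rank(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as Pearson's r on average ranks."""
    return pearson(rank(x), rank(y))


def partial_spearman(x: list[float], y: list[float], z: list[float]) -> float:
    """Rank correlation of x and y after removing the rank association with z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    denom = math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly rank-correlated with x or y")
    return (rxy - rxz * ryz) / denom
```

When `z` is rank-uncorrelated with both `x` and `y`, the partial coefficient reduces to the ordinary Spearman coefficient.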
In silico validation on the accuracy of TRUST for assembling TCR and BCR CDR3s from RNA-seq data
The overall approach used in our study has been repeatedly validated in our previous work [22, 24, 25, 26]. In this study, we applied the same approach to a large number of publicly available RNA-seq samples to investigate the potential functional roles of T/B cells in AML. We also performed in silico validation of the accuracy of our method for assembling TCR and BCR sequences from RNA-seq data using publicly available scRNA-seq data on immune cells. We collected one SMART-seq dataset of CD45-positive white blood cells from pre-treatment melanoma patients . Although these cells were tumor-infiltrating immune cells, they covered most of the cell types (macrophages, monocytes, dendritic cells, neutrophils, T/B lymphocytes, natural killer cells, etc.) that make up the AML immune microenvironment. The T and B cell fractions estimated from the single cell results were significantly positively correlated with those estimated from TRUST calls on the “bulk” samples (Additional file 2: Figure S1a). Likewise, the numbers of TCR/BCR CDR3s in the single cell data and in the TRUST calls from the “bulk” samples were significantly positively correlated (Additional file 2: Figure S1b), indicating that the CDR3s detected by TRUST from bulk RNA-seq data provide a good approximation of the actual T/B cell numbers in each sample. To estimate T/B cell clonotype diversity, we calculated the Shannon entropy of the frequencies of TCR β chain and BCR heavy chain CDR3 amino acid sequences in the single cell data and applied CPK  to the simulated “bulk” data. Consistently, TCR/BCR entropy and CPK were significantly positively correlated (Additional file 2: Figure S1c).
Based on these results and our previous work, we conclude that our approach has sufficient power to recover TCR and BCR CDR3s and to evaluate the fraction and diversity of both T and B cells from bulk RNA-seq data, allowing us to identify changes in T and B cells between AML and non-tumor samples.
Overview of TCR α, β, γ, and δ chain CDR3 sequences in AML and non-tumor samples
The clonotype diversity of TCR repertoire in AML and non-tumor samples
T cell clonotype diversity is an important feature of the TCR repertoire that has previously been reported to have potential clinical implications [36, 37]. We investigated the differences in T cell clonotype diversity between AML and non-tumor groups. Using CPK to approximate TCR clonal diversity , we observed significantly lower diversity in both pediatric and adult AML samples than in non-tumor samples (Fig. 1b), suggesting that T cells are more clonal in the AML microenvironment. No significant difference in TCR diversity was observed between PB and BM samples in pediatric AML (Additional file 2: Figure S3a) or between pediatric and adult non-tumor samples (Fig. 1b). Interestingly, infant AML samples had significantly higher TCR CPK than child or adult AML samples (Fig. 1b), suggesting that T cells are less expanded in infant AML, possibly owing to limited bacterial and viral antigen exposure during infancy. Consistently, infant AML also had a lower fraction of β-CDR3s specific to common viral epitopes from cytomegalovirus, Epstein–Barr virus, or influenza  than child or adult AML (Fig. 1c).
Neo-antigens arising from somatic mutations can induce T cell-mediated elimination of cancer cells . A direct consequence of antigen-specific T cell activation is clonal expansion, which can be approximated by the inverse of CPK. We therefore investigated whether specific missense mutations or gene fusions previously linked to patient survival were associated with αβ T cell activation in AML samples. Owing to the lack of detailed mutation information for the pediatric AML samples, we could only examine the mutation status of five genes with high clinical relevance (FLT3, NPM1, KIT, CEBPA, and WT1) and three oncogenic gene fusions (RUNX1-RUNX1T1, CBFB-MYH11, and PML-RARA). Pediatric AML samples with CBFB-MYH11 fusions had significantly lower TCRβ CPK (Fig. 1d), suggesting that this fusion may be immunogenic. The same trend was also observed in infant and adult AML, although the differences were less significant, likely because of the limited sample size.
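The CPK comparison between fusion-positive and fusion-negative samples relies on the Wilcoxon rank-sum test named in the Methods. To show the mechanics, here is a dependency-free sketch using the large-sample normal approximation. It omits the tie and continuity corrections that statistical packages such as `scipy.stats.mannwhitneyu` can apply, so its p-values may differ slightly from those reported. The function names and toy values are hypothetical.

```python
from __future__ import annotations

import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, p-value). No tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])              # rank sum of the first group
    u = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p
```

For instance, `rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` gives U = 0 (every value in the first group ranks below the second) with a p-value of roughly 0.05. Note that with so few samples, an exact test would normally be preferred over the normal approximation.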
γδ T cell analysis in AML and non-tumor samples
To further investigate the potential impact of γδ T cells in AML, we clustered all complete δ-CDR3s based on their pairwise sequence similarity. This revealed two major clusters of δ-CDR3 sequences (Fig. 2b), with Cluster1 containing 26 sequences from 19 patients. All δ-CDR3s in Cluster1 were annotated with TRDV2 and TRDJ3. Sequence motif analysis of the Cluster1 δ-CDR3s showed that the first 4 and last 8 amino acids, as well as a glycine (G) in the middle, were conserved (Fig. 2c). Intriguingly, these 19 patients had significantly better overall survival than the other patients (Fig. 2d). These results suggest that δ-CDR3s containing the specific pattern of Cluster1 might serve as a prognostic marker or a potential therapeutic target for AML patients.
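Anchoring position-wise conservation at each end of the δ-CDR3s is one simple way to expose the pattern described above: conserved N- and C-terminal residues surrounding a variable middle. The sketch below computes, for the first or last k positions, the most frequent residue and its frequency. It is a simplified stand-in for sequence-logo-style motif analysis, not the tool used for Fig. 2c; the function names, the 0.9 threshold, and the toy sequences are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def terminal_profile(seqs: list[str], k: int,
                     from_end: bool = False) -> list[tuple[str, float]]:
    """(dominant residue, frequency) for each of the first or last k positions.

    Sequences shorter than k are ignored. Anchoring at the N- or C-terminus
    avoids aligning CDR3s of different lengths.
    """
    windows = [s[-k:] if from_end else s[:k] for s in seqs if len(s) >= k]
    if not windows:
        return []
    profile = []
    for pos in range(k):
        residue, count = Counter(w[pos] for w in windows).most_common(1)[0]
        profile.append((residue, count / len(windows)))
    return profile


def conserved_positions(profile: list[tuple[str, float]],
                        min_freq: float = 0.9) -> list[int]:
    """Indices (0-based within the window) whose dominant residue is frequent."""
    return [i for i, (_, freq) in enumerate(profile) if freq >= min_freq]
```

Because the CDR3s differ in length, profiling the two ends separately sidesteps the need for a gapped alignment. The trade-off is that a conserved residue in the variable middle, such as the glycine noted above, would need a separate alignment-based check.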
Overview of BCR IgL, IgK, and IgH CDR3 sequences in AML and non-tumor samples
B cell activation and clonal expansion patterns in the AML samples
We further investigated the potential impact of B cells in AML. Similar to TCR diversity, BCR CDR3 diversity measured by CPK was lower in AML samples than in non-tumor samples (Fig. 3b). Unlike T cells, B cells undergo SHM and CSR upon binding to a foreign antigen, producing high-affinity antibodies against it. SHM and CSR are therefore important signatures of B cell activation and clonal expansion. To quantify SHM, we counted the cases in which two IgH CDR3 sequences differ by only one nucleotide and divided the count by the total number of assembled CDR3 bases in each sample. Using this measure, we observed a significantly higher SHM rate in adult AML samples than in pediatric AML or non-tumor samples (Fig. 3c). Consistent with this result, AICDA , the gene responsible for SHM, was also expressed at significantly higher levels in adult AML samples than in pediatric AML samples (Additional file 2: Figure S6). To investigate CSR, we examined the approximately 346,000 IgH sequences that were successfully aligned to specific Ig isotypes. We observed significant differences in isotype distributions among the AML and non-tumor groups (Fig. 3d). Specifically, in the non-tumor samples, IgM and IgD, which are encoded by the first two heavy chain constant segments in the immunoglobulin locus and are usually expressed on naïve mature B cells , account for the majority of IgH sequences (Fig. 3d). Infant AML samples also have higher fractions of IgM and IgD B cells, but the fractions of IgG and IgA increase with patient age (Fig. 3e). IgG1 and IgA1 become the dominant Ig isotypes in child and adult AML samples (Fig. 3d, e). When normalizing against the expression of housekeeping genes, we found that IgM and IgD levels decreased only slightly, suggesting that the increased IgG and IgA fractions are mostly due to the expansion of B cells with IgG and IgA isotypes (Additional file 2: Figure S7).
In addition, AML samples show more CSR events than non-tumor samples (Fig. 3f). Taken together, the increased IgH CDR3 length, decreased IgH CDR3 diversity, increased SHM, and increased CSR in AML, especially with IgG and IgA isotypes in adult AML, all indicate higher levels of B cell activation and clonal expansion in the AML microenvironment.
Association between high IgA fraction and worse clinical survival in AML patients
IgA2 fraction and immunosuppressive microenvironment in adult AML
AML is a common hematologic malignancy, yet the interactions between malignant myeloid cells and the immune microenvironment, especially T cells and B cells, remain poorly characterized. In this study, we conducted the first comprehensive characterization of TCR (α, β, γ, and δ chains) and BCR (IgL, IgK, and IgH) CDR3 sequences from the bulk RNA-seq data of both pediatric and adult AML samples as well as non-tumor controls. The human immune system evolves with age, as exposure to multiple self and foreign antigen challenges promotes the maturation of immune-related cells and organs . We found higher clonal expansion of both T cells and B cells in the AML microenvironment, but observed wide differences between pediatric and adult AML. In particular, adult AML samples had a higher fraction of γδ T cells (Fig. 2a), a higher IgH SHM rate, and more CSR events than pediatric AML (Fig. 3). One limitation of our study is that we do not have age information for the non-tumor samples, so we could not analyze the effect of age in normal donors, although this does not bias any of our findings. Another limitation is that, because we used bulk RNA-seq data, we could not pair full clonotypes (TCR α/β or γ/δ chains and BCR heavy/light chains) or distinguish subtypes of T and B cells. Despite these limitations, our findings improve our understanding of T and B cell immunity in AML, as well as of the distinct T and B cell responses to AML in children and adults. Our results might provide insights into immunotherapy development for hematological malignancies.
Notably, we found that pediatric AML with highly expanded IgA1 B cells and adult AML with highly expanded IgA2 B cells, which might represent an immunosuppressive microenvironment, are associated with worse overall survival. Recent studies reported that IgA-producing plasma cells can act as potent immunosuppressors through the expression of PD-L1 in prostate  and liver cancer mouse models . Unlike mouse IgA, which has only one subclass, human IgA comprises two subclasses (IgA1 and IgA2) encoded by two distinct genes. The major structural difference between the two subclasses is that IgA2 lacks the elongated hinge region of IgA1 . We found that the survival-related B cells were restricted to IgA1 in pediatric AML but to IgA2 in adult AML samples (Fig. 4c, d). Together with the many other differences observed between pediatric and adult AML, we interpret this as potentially reflecting different immune response patterns in children and adults. IgA CSR is known to be promoted by the secreted cytokine TGFβ1 , and we observed a significant positive correlation between TGFB1 gene expression and IgA2 fraction in adult AML (Fig. 5a). In addition, in single cell expression data from one M6 AML patient , we found TGFB1 to be highly expressed in three major cell clusters, namely CD4+CD14+ monocytes, PRSS57+MYC+ neutrophils, and CD3+CD7+ T cells (Additional file 2: Figure S12), suggesting complex regulation of IgA2 B cell proliferation in AML. Our findings may shed light on the unique immune regulation in hematological malignancies.
In summary, our comprehensive analyses of TCR and BCR CDR3 sequences from AML RNA-seq samples provided the first overview of the immune receptor repertoires in both pediatric and adult AML microenvironments. We found higher clonal expansion of both T cells and B cells in the AML microenvironment. In addition, adult AML samples have a significantly higher level of B cell activation and more secondary Ig class switch events than pediatric AML or non-tumor samples. Furthermore, we found that pediatric AML with highly expanded IgA1 B cells and adult AML with highly expanded IgA2 B cells are associated with worse overall survival. The TCR/BCR repertoires identified and the associations observed in this work provide useful resources and insights for the future development of novel immunotherapies for hematological malignancies.
We thank Drs. Jennifer S. Whangbo, Jerome Ritz, Anna H. Jonsson, and Michael B. Brenner for their helpful discussions.
JZ and XH maintained the TRUST algorithm and processed the raw data. JZ, XH, BW, JW, and JFu performed the data analyses. XY, BL, and XSL designed and supervised the study and wrote the manuscript together with JZ. All co-authors contributed to the research progress discussion and manuscript preparation. All authors read and approved the final manuscript.
We acknowledge the funding support from NCI grants U01 CA226196 (XSL) and U24 CA224316 (XSL), National Key R&D Program of China grant 2017YFC0908300 (XY), CPRIT RR170079 (BL), Chinese Scholarship Council funding (JZ), and Breast Cancer Research Foundation (XSL).
Ethics approval and consent to participate
Consent for publication
XSL is a cofounder and board member of GV20 Oncotherapy and Scientific Advisory Board (SAB) member of 3DMed Care. NH is a founder and SAB member of Neon Therapeutics. XH conducted the work in this study as a postdoctoral fellow at Dana-Farber Cancer Institute and is now an employee of GV20 Therapeutics. The remaining authors declare that they have no competing interests.
- 5. Bejanyan N, Weisdorf DJ, Logan BR, Wang H-L, Devine SM, de Lima M, et al. Survival of patients with acute myeloid leukemia relapsing after allogeneic hematopoietic cell transplantation: a center for international blood and marrow transplant research study. Biol Blood Marrow Transplant. 2015;21:454–9.
- 7. Couzin-Frankel J. Breakthrough of the year 2013. Cancer immunotherapy. Science. 2013;342:1432–3.
- 42. Maul RW, Gearhart PJ. AID and somatic hypermutation. Adv Immunol. 2010;105:159–91.
- 43. Stavnezer J, Schrader CE. IgH chain class switch recombination: mechanism and regulation. J Immunol. 2014;193:5370–8.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.