Classification of triple-negative breast cancers based on immunogenomic profiling
Abundant evidence shows that triple-negative breast cancer (TNBC) is heterogeneous, and many efforts have been devoted to identifying TNBC subtypes on the basis of genomic profiling. However, few studies have explored the classification of TNBC specifically based on immune signatures that may facilitate the optimal stratification of TNBC patients responsive to immunotherapy.
Using four publicly available TNBC genomics datasets, we classified TNBC on the basis of the immunogenomic profiling of 29 immune signatures. Unsupervised and supervised machine learning methods were used to perform the classification.
We identified three TNBC subtypes that we named Immunity High (Immunity_H), Immunity Medium (Immunity_M), and Immunity Low (Immunity_L) and demonstrated that this classification was reliable and predictable by analyzing multiple different datasets. Immunity_H was characterized by greater immune cell infiltration and anti-tumor immune activities, as well as better survival prognosis compared to the other subtypes. Besides the immune signatures, some cancer-associated pathways were hyperactivated in Immunity_H, including apoptosis, calcium signaling, MAPK signaling, PI3K–Akt signaling, and RAS signaling. In contrast, Immunity_L presented depressed immune signatures and increased activation of cell cycle, Hippo signaling, DNA replication, mismatch repair, cell adhesion molecule binding, spliceosome, adherens junction function, pyrimidine metabolism, glycosylphosphatidylinositol (GPI)-anchor biosynthesis, and RNA polymerase pathways. Furthermore, we identified a gene co-expression subnetwork centered around five transcription factor (TF) genes (CORO1A, STAT4, BCL11B, ZNF831, and EOMES) specifically significant in the Immunity_H subtype and a subnetwork centered around two TF genes (IRF8 and SPI1) characteristic of the Immunity_L subtype.
The identification of TNBC subtypes based on immune signatures has potential clinical implications for TNBC treatment.
Keywords: Triple-negative breast cancer, Tumor immunity, Immunogenomic profiling, Classification, Machine learning
Activated dendritic cells
Basal-like immune activated
Cell adhesion molecule binding
Cytokine and cytokine receptor
False discovery rate
Gene-set enrichment analysis
Human leukocyte antigen
Human epidermal growth factor receptor 2
Immature dendritic cells
Major histocompatibility complex
NK cells: Natural killer cells
Plasmacytoid dendritic cells
Single-sample gene-set enrichment analysis
Somatic copy number alteration
The Cancer Genome Atlas
Tfh cells: Follicular helper T cells
Th17 cells: T helper 17 cells
Triple-negative breast cancer
Regulatory T cells
Weighted gene co-expression network analysis
Triple-negative breast cancer (TNBC) is a breast cancer subtype that lacks the expression of hormone receptors (estrogen receptor (ER) and progesterone receptor (PR)) and human epidermal growth factor receptor 2 (HER2). TNBC is associated with a high risk of mortality owing to its aggressiveness and the lack of effective targeted therapies. Moreover, abundant evidence shows that TNBC is very heterogeneous [1, 2, 3, 4]. Lehmann et al. identified six gene expression profile-based TNBC subtypes, including an immunomodulatory (IM) subtype that was enriched in immune cell processes. Bonsang-Kitzis et al. identified six TNBC subgroups based on a biological network-driven approach, which included two immunity clusters whose stromal immune module gene signatures exhibited a strong prognostic value. Burstein et al. identified four stable TNBC subgroups based on mRNA expression and DNA genomic profiling, which included Luminal/Androgen Receptor, Mesenchymal, Basal-Like Immune Suppressed, and Basal-Like Immune Activated (BLIA); furthermore, the authors identified potential therapeutic targets for these specific subtypes. These efforts to classify TNBC might lay the foundation for developing targeted therapies for TNBC.
Recently, cancer immunotherapy has been successful in treating many refractory malignancies. Thus, it is worth considering immunotherapy for TNBC, since the therapeutic options for this disease are significantly limited. Indeed, many experimental and clinical studies have explored the possibility of treating TNBC patients with immunotherapy [6, 7, 8, 9, 10, 11]. Moreover, numerous studies have demonstrated that TNBC is more immunogenic than other breast cancer (BC) subtypes, which may warrant an immunotherapeutic approach for TNBC [12, 13]. However, current immunotherapeutic strategies exhibit beneficial effects in less than 20% of cancer patients. This suggests that not all TNBC patients are likely to respond to immunotherapy. In fact, certain genetic or genomic features, such as tumor mutation burden (TMB), neoantigen load, PD-L1 expression, and deficient DNA mismatch repair, have been associated with cancer immunotherapeutic responsiveness [14, 15, 16, 17, 18].
In this study, we classified TNBC into three distinct subtypes by immunogenomic profiling: Immunity High (Immunity_H), Immunity Medium (Immunity_M), and Immunity Low (Immunity_L). We demonstrated the stability and reproducibility of this classification in four independent datasets by a machine learning approach. Furthermore, we identified the subtype-specific molecular features, including genes, gene ontology, pathways, and networks. The identification of immune signature-associated TNBC subtypes may facilitate the optimal selection of TNBC patients responsive to immunotherapy.
For each TNBC dataset, we first quantified the enrichment levels of the 29 immune signatures in each TNBC sample by the single-sample gene-set enrichment analysis (ssGSEA) score [19, 20]. Based on the enrichment levels (ssGSEA scores) of the 29 immune signatures, we performed hierarchical clustering of TNBC.
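The clustering step can be illustrated with a minimal sketch. It assumes the ssGSEA scores are already computed and stored as one list of signature scores per sample, and it uses Euclidean distance with average linkage; the text does not state which distance or linkage was used, so both are assumptions, and the function name hierarchical_clusters is hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two score profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering (average linkage, an assumption).

    profiles: list of per-sample score vectors (e.g. 29 ssGSEA scores each).
    Returns a list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pair_distances = [
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                dist = sum(pair_distances) / len(pair_distances)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        # Merge the two closest clusters and drop the absorbed one.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

Cutting the tree at three clusters would correspond to the three subtypes described in this study; the cubic-time loop is only meant for small illustrative inputs.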
Evaluation of immune cell infiltration level, tumor purity, and stromal content in TNBC
ESTIMATE was used to evaluate the immune cell infiltration level (immune score), tumor purity, and stromal content (stromal score) for each TNBC sample.
Gene-set enrichment analysis
We performed gene-set enrichment analysis of the METABRIC and TCGA datasets using GSEA (R implementation) [22, 23, 24]. This analysis identified the KEGG pathways that were upregulated in Immunity_H and in Immunity_L, respectively (FDR < 0.05). The pathways identified in both datasets were selected.
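The selection of pathways shared by both datasets can be sketched as a filter followed by a set intersection. The sketch assumes the GSEA output for each dataset has been reduced to a mapping from pathway name to FDR; the function names and that data layout are hypothetical, not part of the GSEA tool.

```python
def significant_pathways(fdr_by_pathway, threshold=0.05):
    """Return the names of pathways whose FDR is below the threshold."""
    return {name for name, fdr in fdr_by_pathway.items() if fdr < threshold}


def common_pathways(results_a, results_b, threshold=0.05):
    """Pathways significant in both datasets, in alphabetical order."""
    shared = significant_pathways(results_a, threshold) & significant_pathways(
        results_b, threshold
    )
    return sorted(shared)
```

Under this layout, one dictionary would hold the METABRIC results and the other the TCGA results for a given subtype.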
Correlation of pathway activities with immune cell infiltration levels in TNBC
We quantified the activity of a pathway with the ssGSEA score of the set of genes included in the pathway, and the immune cell infiltration level with the immune score. The Spearman correlation between the ssGSEA score and the immune score was used to evaluate the correlation of pathway activities with immune cell infiltration levels in TNBC.
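As a worked illustration, the Spearman correlation is the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below assumes two equal-length lists, one of pathway ssGSEA scores and one of immune scores, for the same samples; the function names are illustrative only.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(pathway_scores, immune_scores):
    """Spearman rank correlation between pathway activity and immune score."""
    return pearson(average_ranks(pathway_scores), average_ranks(immune_scores))
```

A coefficient near 1 would indicate that pathway activity rises with immune infiltration, and a coefficient near -1 the opposite.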
Identification of TNBC subtype-specific gene ontology and networks
We used WGCNA to identify the gene modules (gene ontology) that were significantly associated with the genes highly correlated with immune cell infiltration based on gene co-expression analysis. The gene modules specifically amplified in different TNBC subtypes were identified. On the basis of the expression correlations between the hub genes in the gene modules, we built gene–gene interaction networks. A hub gene was defined as a gene that was connected to at least 10 other genes, with a connectedness weight greater than 0.25.
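The hub-gene rule can be sketched as follows. The sketch reads the definition as "at least 10 neighbors, counting only edges whose weight exceeds 0.25", which is one plausible interpretation and therefore an assumption; the edge-dictionary layout and the function name find_hub_genes are also hypothetical and do not reflect the WGCNA export format.

```python
def find_hub_genes(edge_weights, min_neighbors=10, min_weight=0.25):
    """Return genes with at least min_neighbors edges of weight > min_weight.

    edge_weights: dict mapping (gene_a, gene_b) pairs to a connection weight.
    Edges are treated as undirected; self-loops are ignored.
    """
    neighbors = {}
    for (gene_a, gene_b), weight in edge_weights.items():
        if gene_a == gene_b or weight <= min_weight:
            continue
        neighbors.setdefault(gene_a, set()).add(gene_b)
        neighbors.setdefault(gene_b, set()).add(gene_a)
    return sorted(
        gene for gene, linked in neighbors.items() if len(linked) >= min_neighbors
    )
```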
We compared the survival prognosis (overall survival (OS), disease-free survival (DFS), and metastasis-free survival (MFS)) of TNBC patients considering tumor subtype and the expression level of the identified genes, i.e., higher expression level (expression levels > median) versus lower expression level (expression levels < median). The log-rank test was used to calculate the significance of survival time differences using a threshold of P-value < 0.05. Kaplan–Meier curves were plotted to show the survival time differences. We performed the survival analyses using the METABRIC, TCGA, and GSE103091 datasets, where the survival data were available.
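A minimal sketch of the median split and the two-group log-rank test is given here for illustration. It is a textbook formulation written with the standard library only, not the software used in this study, and the function names and input layout (parallel lists of follow-up times and event flags, where 1 means the event occurred and 0 means censored) are assumptions.

```python
import math
import statistics


def split_by_median(expression):
    """Return (high, low) sample indices: above vs. below the median."""
    median = statistics.median(expression)
    high = [i for i, value in enumerate(expression) if value > median]
    low = [i for i, value in enumerate(expression) if value < median]
    return high, low


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers at risk just before t, and events exactly at t.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    statistic = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

The indices returned by split_by_median would select the follow-up times and event flags passed to logrank_test; samples exactly at the median fall into neither group, matching the strict inequalities in the text.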
We transformed each attribute (immune signature or gene set) value (ssGSEA score) x_i into x_i′ by the equation x_i′ = (x_i − x_min)/(x_max − x_min), where x_min and x_max represent the minimum and maximum of the ssGSEA scores for the gene set across all TNBC samples, respectively. The Random Forest (RF) classifier was used to classify the TNBC subtypes. We set the number of trees to 100 and used all 29 immune signatures as features for the RF classifier. The classification performance was evaluated by the accuracy and the weighted F-score. We carried out the classification in Weka.
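The normalization and the weighted F-score can be sketched in a few lines. The sketch does not reproduce the Random Forest classifier itself, which was run in Weka; it only illustrates the min-max transformation and a support-weighted average of per-class F1 scores, which is the usual meaning of a weighted F-score and is assumed here. The function names are hypothetical.

```python
def min_max_scale(scores):
    """Rescale one signature's scores across all samples to the range [0, 1]."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # Constant signature: no spread to rescale, return zeros by convention.
        return [0.0 for _ in scores]
    return [(x - lowest) / (highest - lowest) for x in scores]


def weighted_f_score(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's true count."""
    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += f1 * (tp + fn) / total
    return score
```

For example, scaling the scores 2, 4, and 6 yields 0, 0.5, and 1, so that signatures with different score ranges contribute on a common scale.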
Comparison of the proportions of immune cell subsets between TNBC subtypes
CIBERSORT was used to calculate the proportions of 22 human immune cell subsets. We set 1000 permutations and P < 0.05 as the criteria for the successful deconvolution of a sample. We compared the proportions of the immune cell subsets between TNBC subtypes using the Mann–Whitney U test.
Comparison of clonal heterogeneity between the TNBC subtypes
We used the ABSOLUTE algorithm to assess the ploidy score, representing clonal heterogeneity, for each TNBC sample. We compared the ploidy scores between the TNBC subtypes using the Kruskal–Wallis test.
Comparison of biological processes between the TNBC subtypes
We compared the activities (ssGSEA scores) of stem cell-associated (marker genes ABCA8 and ALDH1A1), proliferation (MKI67), and epithelial-to-mesenchymal transition (EMT) (ZEB1, ZEB2, SNAIL, CDH2 and TGFB1) biological processes between the TNBC subtypes. The Kruskal–Wallis test was used to determine the statistical significance of the results.
Comparison of somatic copy number alteration (SCNA) levels between the TNBC subtypes
We applied GISTIC2 to the SNP6 file of the SCNA data for TNBC in TCGA. We obtained arm-level SCNA frequencies for Immunity_H and Immunity_L TNBC samples and compared them. Moreover, we calculated focal SCNA levels for each TNBC sample and compared them between Immunity_H and Immunity_L.
Immunogenomic profiling identifies three TNBC subtypes
Notably, most HLA genes showed significantly higher expression levels in Immunity_H and significantly lower expression levels in Immunity_L (ANOVA test, P < 0.05) (Fig. 2b, Additional file 3: Figure S2A). Moreover, the expression levels of various immune cell subpopulation marker genes were the highest in Immunity_H and the lowest in Immunity_L, such as CD8A (cytotoxic T cell), CD45RO (memory T cell), CD20 (B cell), CXCR5 (Tfh cell), FOXP3 (Treg), IL-17 (Th17 cell), CD1A (iDC), and IL3RA (pDC) (Additional file 3: Figure S2B).
We examined the expression of PD-L1 (programmed cell death 1 ligand) in the three TNBC subtypes and found that Immunity_H had the highest PD-L1 expression levels and Immunity_L had the lowest PD-L1 expression levels (ANOVA test, P < 0.05) (Fig. 2c). This suggests that the TNBC subtype Immunity_H might respond better to anti-PD-L1 immunotherapy than the other TNBC subtypes, since PD-L1 expression tends to be positively associated with immunotherapeutic responsiveness.
Survival analyses showed that these TNBC subtypes had distinct clinical outcomes. The Immunity_H subtype likely had a better survival prognosis than the Immunity_M and Immunity_L subtypes, but there was no significant survival difference between the Immunity_M and the Immunity_L subtypes (Fig. 2d). This is consistent with previous studies showing that TNBC with elevated immune activity was associated with more favorable clinical outcomes [4, 12, 34].
Comparisons of the immunogenomic profiling-based TNBC classification with other TNBC classification methods
Identification of TNBC subtype-specific pathways, gene ontology, and networks
Identification of TNBC subtype-specific pathways
Identification of TNBC subtype-specific gene ontology
We performed a weighted gene co-expression network analysis of the METABRIC dataset by WGCNA and identified a set of gene modules (gene ontology) associated with the highly expressed genes previously determined. We found several gene modules that significantly differentiated TNBC by subtype, survival time, or survival status (Fig. 4c). As expected, the immune response was significantly elevated in Immunity_H (P = 4.0 × 10^-54), whereas it was depressed in Immunity_L (P = 1.0 × 10^-32). Moreover, a high immune response was associated with a better survival prognosis in TNBC patients (P = 5.0 × 10^-4). This finding is in line with the previous observation that the subtype Immunity_H is associated with better clinical outcomes than the other subtypes. Similar results were observed for the TCGA dataset (Additional file 4: Figure S3B). The other two immune-associated gene modules, i.e., myeloid leukocyte activation and response to type I interferon, were also enriched in Immunity_H (P = 3.0 × 10^-14 and 2.0 × 10^-13, respectively), and were reduced in Immunity_L (P = 2.0 × 10^-16 and 2.0 × 10^-11, respectively). In contrast, cell adhesion molecule (CAMD) binding activity was significantly increased in Immunity_L (P = 1.0 × 10^-30) and decreased in Immunity_H (P = 2.0 × 10^-35). This suggests that CAMD activity has a strong inverse correlation with tumor immunity in TNBC. Interestingly, CAMD activity correlated with reduced survival (P = 0.001 for OS, and P = 0.002 for DFS). Cell cycle process was also increased in Immunity_L (P = 0.04), suggesting that the cell cycle signature correlates with reduced tumor immunity. This finding is consistent with results from previous studies [38, 39].
Identification of TNBC subtype-specific networks
WGCNA generated a gene module (green color, Fig. 4c) that was specifically significant in Immunity_H. We identified 98 hub genes from the gene module, including five transcription factor (TF) genes, i.e., CORO1A, STAT4, BCL11B, ZNF831, and EOMES. The five TFs interact with each other and form a subnetwork with diverse immune and cancer-related genes that they regulate (Fig. 4d). Typically, CD247 (the marker gene for a T cell subpopulation) was regulated by all these TFs, and the cytotoxic T cell marker gene CD8A was co-regulated by CORO1A, STAT4, and EOMES. MAP4K1 (Mitogen-Activated Protein Kinase Kinase Kinase Kinase 1), which is involved in multiple immune and cancer-related pathways including B cell receptor signaling, JNK, EGF/EGFR, TGF-β, and MAPK signaling, was also regulated by the five TFs. CORO1A encodes a member of the WD repeat protein family, which is involved in diverse cellular processes including cell cycle, apoptosis, signal transduction, and gene regulation. The main pathways related to CORO1A include cytoskeletal signaling and phagosome function, and its involvement in immune regulation has been reported [40, 41]. The association of the other TFs STAT4, BCL11B, and EOMES with immunity has been examined, whereas the role of ZNF831 in immune regulation remains unexplored.
WGCNA also generated a gene module (turquoise color, Fig. 4c) that was more enriched in Immunity_L. This module included 112 hub genes, two of which encode the TFs IRF8 and SPI1. A subnetwork of the hub genes centered on IRF8 and SPI1 is shown in Fig. 4e. IRF8 (interferon regulatory factor 8) has been shown to play a negative role in immune cell regulation. Thus, the IRF8-centered regulatory network may be responsible for the depressed immunity of the TNBC subtype Immunity_L. SPI1 (Spi-1 proto-oncogene) encodes a transcription factor that activates gene expression during immune cell development. As a result, the deregulation of SPI1 may affect immunity. In fact, SPI1 showed significantly lower expression levels in Immunity_L than in Immunity_H (Student's t test, P = 9.1 × 10^-28, fold change > 2). Therefore, the downregulation of SPI1 may contribute to the decreased immunity of the Immunity_L subtype. The contribution of the IRF8- and SPI1-centered regulatory network to the depressed immunity of Immunity_L is supported by a previous study showing that IRF8 and SPI1 together negatively regulated immune cell differentiation.
Interestingly, survival analyses showed that elevated expression levels of these TF genes (except SPI1) were consistently associated with better survival prognosis in TNBC (Fig. 4f), suggesting the pivotal role of these TFs in TNBC immunity and prognosis.
Class prediction of TNBC subtypes based on immunogenomic profiling
A number of prior studies have identified TNBC subtypes on the basis of genomic profiling [2, 3, 4, 34]. However, very few studies have investigated the classification of TNBC specifically based on immune signatures. To fill this knowledge gap, we focused on identifying immune-related TNBC subtypes using immunogenomic profiling. Our results show that TNBC could be classified into three stable subtypes: Immunity High, Immunity Medium, and Immunity Low. Furthermore, we demonstrated that this classification was reproducible and predictable. The Immunity High TNBC subtype was enriched not only in immune signatures, but also in many cancer-associated pathways including apoptosis, calcium signaling, MAPK signaling, PI3K–Akt signaling, and RAS signaling (Fig. 4a). This is in line with our previous study showing that diverse immune signatures positively correlated with the MAPK and PI3K–Akt signaling pathways in TNBC. In contrast, the Immunity Low TNBC subtype was depleted of immune signatures but enriched in Hippo signaling, DNA replication, mismatch repair, spliceosome, adherens junction, pyrimidine metabolism, glycosylphosphatidylinositol (GPI)-anchor biosynthesis, and RNA polymerase pathways (Fig. 4a). It is reasonable that the mismatch repair pathway activity was significantly negatively correlated with immune signatures in cancer, since deficient mismatch repair often results in elevated tumor immunity. Interestingly, we found that the Hippo signaling pathway had a significantly negative correlation with immune signatures in TNBC. This observation is in agreement with findings from previous studies showing that the Hippo signaling pathway plays a key role in regulating tumor immunity [46, 47, 48]. Deficiency of Hippo pathway components such as kinases LATS1/2 (large tumor suppressor 1 and 2), effector YAP (Yes-associated protein), and transcriptional co-activator TAZ (WW domain-containing transcription regulator 1) could promote anti-tumor immunity.
Overall, these results revealed potential positive or negative associations between pathway activities and immune activities in TNBC.
Currently, immunotherapy for TNBC is an active field of investigation, and the stronger immunogenicity exhibited by TNBC compared to other breast cancer subtypes suggests that immunotherapy could be a viable option for TNBC patients. However, some preliminary TNBC immunotherapy clinical trials have not shown significant improvement in patients (personal communication). Thus, the immune signature-based classification of TNBC may aid the stratification of TNBC patients to identify those responsive to immunotherapy. It is conceivable that patients with an Immunity_H subtype of TNBC would be more likely to respond to anti-PD-1/PD-L1 therapy than patients with other TNBC subtypes, since PD-L1 is more highly expressed in Immunity_H TNBC, and PD-L1 expression is a predictive biomarker for the response to PD-1/PD-L1-directed immunotherapy [36, 54].
The identification of TNBC subtypes based on immune signatures has potential clinical implications for TNBC treatment.
We thank Dr. Agnese Mariotti for editing the manuscript.
This study was funded by the China Pharmaceutical University (grant number 2632018YX01, 3150120001).
Availability of data and materials
We used four publicly available TNBC genomic datasets: METABRIC, TCGA, GSE75688, and GSE103091. The METABRIC dataset was obtained from cBioPortal (http://www.cbioportal.org/study?id=brca_metabric#summary). The TCGA dataset was obtained from the TCGA data portal (https://portal.gdc.cancer.gov/). The GSE75688 and GSE103091 datasets were downloaded from the NCBI Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/). We obtained 29 immune signatures (each represented by a different gene set) from the publications [12, 35] (Additional file 1: Table S1).
YH performed data analyses and helped prepare the manuscript. ZJ performed data analyses and helped prepare the manuscript. CC performed data analyses. XW conceived the research, designed the analysis strategies, and wrote the manuscript. All the authors read and approved the final manuscript.
Ethics approval and consent to participate
Ethical approval and consent to participate were waived since we used only publicly available data and materials in this study.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 5. Del Paggio JC. Immunotherapy: Cancer immunotherapy and the value of cure. Nat Rev Clin Oncol. 2018;15(5):268–270.
- 10. Bernier C, et al. DZ-2384 has a superior preclinical profile to taxanes for the treatment of triple-negative breast cancer and is synergistic with anti-CTLA-4 immunotherapy. Anti-Cancer Drugs. 2018.
- 27. Witten IH, et al. Data Mining: Practical Machine Learning Tools and Techniques. 4th ed. San Francisco: Morgan Kaufmann; 2016.
- 46. Moroishi T, et al. The Hippo pathway kinases LATS1/2 suppress cancer immunity. Cell. 2016;167(6):1525–1539.e17.
- 52. Davoli T, et al. Tumor aneuploidy correlates with markers of immune evasion and with reduced response to immunotherapy. Science. 2017;355(6322).
- 53. Wang X, et al. Immunological therapy: A novel thriving area for triple-negative breast cancer treatment. Cancer Lett. 2018;442:409–428.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.