A novel 4-gene signature for overall survival prediction in lung adenocarcinoma patients with lymph node metastasis
Lung adenocarcinoma (LUAD) patients with lymph node metastasis (LNM) typically exhibit poor clinical outcomes. A biomarker or gene signature that could predict survival in these patients would have a substantial clinical impact, allowing for earlier detection of mortality risk and for individualized therapy.
To identify a novel mRNA signature associated with overall survival, we analysed LUAD patients with LNM from The Cancer Genome Atlas (TCGA). LASSO Cox regression was applied to build the prediction model, and an external cohort was used to validate it.
We identified a 4-gene signature that could effectively stratify a high-risk subset of these patients, and time-dependent receiver operating characteristic (tROC) analysis indicated that the signature had a powerful predictive ability. Gene Set Enrichment Analysis (GSEA) showed that the high-risk subset was mainly associated with important cancer-related hallmarks. Moreover, a predictive nomogram was established based on the signature integrated with clinicopathological features. Lastly, the signature was validated by an external cohort from Gene Expression Omnibus (GEO).
In summary, we developed a robust mRNA signature as an independent factor to effectively classify LUAD patients with LNM into low- and high-risk groups, which might provide a basis for personalized treatments for these patients.
Keywords: Transcriptome; Lung adenocarcinoma (LUAD); Lymph node metastasis (LNM); mRNA signature; Weighted gene co-expression network analysis (WGCNA); Overall survival (OS)
LNM: lymph node metastasis
TCGA: The Cancer Genome Atlas
tROC: time-dependent receiver operating characteristic
GSEA: Gene Set Enrichment Analysis
GEO: Gene Expression Omnibus
WGCNA: weighted gene co-expression network analysis
NGS: next generation sequencing
DEG: differentially expressed gene
LASSO: Least Absolute Shrinkage and Selection Operator
Lung cancer is the leading cause of cancer-related death worldwide, with adenocarcinoma being the most common histological type. Despite advances in cancer therapy in recent decades, the prognosis of lung adenocarcinoma (LUAD) patients is still unfavourable, with an overall 5-year survival rate of less than 15%. The main reason for this low overall survival rate is that LUAD patients have a high frequency of lymph node metastasis (LNM) or even distant metastases at diagnosis [3, 4, 5, 6]. Therefore, there is an urgent need to identify the high-risk subset of these patients who would benefit most from additional systemic therapy.
In recent years, the development of gene expression profiling technologies, such as microarrays and next generation sequencing (NGS), has provided further opportunities to comprehensively characterize the molecular features of cancer [7, 8]. Considering that individual biomarkers usually have little statistical power, the current approach is to identify novel molecular signatures that offer better prediction in various cancers [9, 10, 11]. A number of studies have proposed gene expression-based signatures for survival stratification in patients with lung cancer [12, 13, 14, 15, 16]. However, few studies have focused on prognostic prediction for LUAD patients with LNM.
In this study, based on The Cancer Genome Atlas (TCGA) LUAD mRNA-seq and clinical datasets, we sought to develop a gene expression signature to predict overall survival for LUAD patients with LNM. The proposed gene signature was then validated in an external cohort from the Gene Expression Omnibus (GEO) database.
Data download and processing
TCGA RNA-seq datasets and clinical data for LUAD were downloaded via the UCSC Xena browser (https://xenabrowser.net/). GSE68465 was downloaded from the GEO database. LUAD patients with LNM were selected using the criterion that the patient's N stage was I–IV.
Co-expression gene network based on RNA-seq data
Differentially expressed gene (DEG) analysis
DEG analysis was performed with the Limma package. Tissue samples were separated into a para-tumour group and a tumour group. DEGs were defined as genes with a Q value (the adjusted p value for the comparison between the two groups) of less than 0.05.
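The Q-value cut-off described above can be illustrated with a minimal Python sketch. It assumes the adjustment is the Benjamini-Hochberg procedure, which is a common choice; the source does not state which adjustment method was used. Limma itself is an R package, so this is not the authors' pipeline, and the gene names and p values below are toy placeholders rather than study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, so each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(gene_p_values, threshold=0.05):
    """Keep genes whose adjusted p value (Q value) is below the threshold."""
    genes = list(gene_p_values)
    q_values = benjamini_hochberg([gene_p_values[g] for g in genes])
    return {g: q for g, q in zip(genes, q_values) if q < threshold}


# Toy input (hypothetical gene labels and raw p values, for illustration only).
TOY_P_VALUES = {"GENE_A": 0.001, "GENE_B": 0.2, "GENE_C": 0.03}
TOY_DEGS = select_degs(TOY_P_VALUES)
```

In this toy input, `GENE_B` is dropped because its adjusted value stays above 0.05, while the other two genes pass the filter.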
Identification and selection of prognostic biomarkers in LNM-positive patients
Then, LASSO Cox regression analysis was performed to identify robust markers among the 575 candidates. By forcing the sum of the absolute value of the regression coefficients to be less than a fixed value, certain coefficients were shrunk to exactly zero, and the most powerful prognostic markers were identified with relative regression coefficients. Cross-validation was applied to prevent the over-fitting of the LASSO Cox model (Fig. 2e). Figure 2f shows individual coefficient distributions of the 4 filtered markers: LDHA was associated with high risk (HR > 1), while ABAT, INPP5J and FAM117A were shown to be protective (HR < 1).
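The constraint described above can be written out as follows. This is the standard textbook formulation of LASSO-penalized Cox regression, given here as a sketch; the notation ($x_i$ for the expression vector of patient $i$, $\delta_i$ for the event indicator, $R(t_i)$ for the risk set at time $t_i$, $s$ for the bound, $p$ for the number of candidate genes) is assumed for illustration and does not come from the source.

```latex
\hat{\beta} \;=\; \arg\max_{\beta}\; \ell(\beta)
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le s,
\qquad
\ell(\beta) \;=\; \sum_{i:\,\delta_i = 1}
\Bigl[\, x_i^{\top}\beta \;-\; \log \sum_{k \in R(t_i)} \exp\bigl(x_k^{\top}\beta\bigr) \Bigr]
```

Here $\ell(\beta)$ is the Cox partial log-likelihood and $p = 575$ in this setting. An equivalent form maximizes $\ell(\beta) - \lambda \sum_{j} \lvert \beta_j \rvert$, with the penalty weight $\lambda$ typically chosen by cross-validation; a larger $\lambda$ (smaller $s$) drives more coefficients to exactly zero.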
Risk score and survival prediction based on the 4-gene signature
Expression profiles of the 4-gene signature and subgroup analysis
Construction of a nomogram to predict overall survival in LNM-positive patients
Validation of the 4-gene signature for survival prediction
Accumulating evidence shows that lung adenocarcinoma patients with local invasion or lymph node metastasis typically respond poorly to standard treatments and thus tend to have poor clinical outcomes [3, 4, 5, 25]. In clinical practice, these patients need more intensive monitoring and aggressive therapy, and robust biomarkers are urgently needed to stratify high-risk groups among them. However, individual biomarkers usually have very little predictive power. The established clinical markers for survival are primarily based on patient- and tumour-related factors, such as AJCC-TNM stage, but their accuracy and specificity are limited. Therefore, our study aimed to identify novel molecular signatures that can be integrated with established clinicopathological features to predict overall survival in LUAD patients with lymph node metastasis.
In this study, we identified 4 coding genes associated with overall survival in LUAD patients with lymph node metastasis in the training set, namely LDHA, ABAT, FAM117A and INPP5J. Among these, LDHA has been widely reported to promote malignant progression and to predict poor survival in various cancer types [26, 27, 28, 29, 30]. In addition, Ooms et al. reported that INPP5J functions as a tumour suppressor that inhibits the invasive ability of breast cancer cells via PI3K/AKT signalling. However, ABAT and FAM117A remain inadequately investigated in cancer-related research. Our study might therefore provide some clues for further investigation of the biological roles and clinical significance of these genes.
Based on multivariate Cox coefficients derived from LASSO analysis, we developed a 4-gene signature-based risk score model. Moreover, we investigated the prognostic value of the signature in different subgroups. The signature still exhibited powerful prediction for overall survival in LNM+ patients with the same TNM stage and the same genomic alteration status (including EGFR and KRAS mutation status), indicating that the signature is a promising marker independent of these clinicopathological features. In addition to survival prediction, GSEA showed that the signature-identified high-risk group was significantly correlated with certain hallmarks of cancer, such as EMT and hypoxia, suggesting potential molecular mechanisms underlying the poor survival of these LNM+ patients. By integrating established clinicopathological features with the signature, we developed a nomogram to predict the survival probability of LNM+ patients. The predictive power was measured by the time-dependent area under the ROC curve (AUC(t)), and the result showed that the integrated nomogram model had higher predictive power than individual markers. Lastly, an external GEO cohort was used to validate the prognostic value of the 4-gene signature, and the survival analysis showed the same tendency in the validation cohort.
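A signature-based risk score of this kind is usually a weighted sum of the expression values of the signature genes, with the Cox coefficients as weights. The Python sketch below shows that calculation under stated assumptions: the coefficient values are hypothetical (the source does not report them here), and only their signs follow the text (LDHA positive, the other three negative). The median cut-off for splitting patients into low- and high-risk groups is also an assumption, and the patient data are toy values.

```python
from statistics import median

# HYPOTHETICAL coefficients, for illustration only. Only the signs follow the
# text: LDHA is a risk gene (HR > 1), the other three are protective (HR < 1).
EXAMPLE_COEFFICIENTS = {
    "LDHA": 0.30,
    "ABAT": -0.20,
    "INPP5J": -0.15,
    "FAM117A": -0.25,
}

# Toy expression values per patient (not study data).
TOY_PATIENTS = {
    "P1": {"LDHA": 10, "ABAT": 2, "INPP5J": 1, "FAM117A": 3},
    "P2": {"LDHA": 4, "ABAT": 8, "INPP5J": 6, "FAM117A": 7},
    "P3": {"LDHA": 7, "ABAT": 5, "INPP5J": 3, "FAM117A": 5},
    "P4": {"LDHA": 9, "ABAT": 3, "INPP5J": 2, "FAM117A": 2},
}


def risk_score(expression, coefficients=EXAMPLE_COEFFICIENTS):
    """Weighted sum of gene expression values: sum(coef_g * expr_g)."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def stratify(patients, coefficients=EXAMPLE_COEFFICIENTS):
    """Score every patient and split at the median score (assumed cut-off)."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return scores, cutoff, groups


SCORES, CUTOFF, GROUPS = stratify(TOY_PATIENTS)
```

With these toy inputs, patients with high LDHA and low expression of the protective genes land in the high-risk group. In a real analysis, the groups would then be compared with Kaplan-Meier curves and a log-rank test.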
The limitations of our study should be acknowledged. This is a retrospectively designed study, and the sample size of the training and validation sets is relatively small.
In summary, the novel 4-gene signature proved to be a robust model with high predictive power in LNM+ LUAD patients. The predictive power was stable over time and showed promising survival prediction in combination with established markers. Using the signature together with clinicopathological features can help to further stratify LNM+ patients into risk groups. It can thus serve as a predictive tool for clinical outcome and guide personalized treatment, with more aggressive therapy for high-risk patients and less aggressive therapy for low-risk patients. Further research is needed to reveal the interplay between these genes, which may in turn support the development of better treatment alternatives for high-risk LUAD patients with lymph node metastasis.
In conclusion, based on publicly available data, we constructed a robust mRNA signature that could serve as a reliable marker to stratify a high-risk group among LNM+ LUAD patients. Subgroup analysis indicated that the signature works effectively and independently of other clinical features. Validation in an external cohort from GEO further confirmed the prognostic value of the signature. We hope the identified signature can help to improve strategies for personalized treatment of LUAD patients with LNM.
XWB and RS conceived and designed the experiments. XWB, SX, QLZ, ZJG and KZ analysed the data. XWB, QLZ, YBZ and YFW wrote the paper. All authors read and approved the final manuscript.
We would like to thank Dr. Omid Azimzadeh and Dr. Michael Rosemann for helpful discussions and suggestions.
The authors declare that they have no competing interests.
Availability of data and materials
The datasets supporting the conclusions of this article are available in the Xena browser (https://xenabrowser.net/) repository.
Consent for publication
Ethics approval and consent to participate
This work was supported by the Zhejiang Provincial Natural Science Foundation (No. LY16H020005).
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 21. Cox DR. Regression models and life-tables. J Roy Stat Soc Ser B. 1972;34(2):187–202.
- 23. Harrell FE. Ordinal logistic regression. Regression modeling strategies. Berlin: Springer; 2015. p. 311–25.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.