Identification of a molecular signature of prognostic subtypes in diffuse-type gastric cancer
Although recent advances in high-throughput technology have provided many insights into gastric cancer (GC), few reliable biomarkers for diffuse-type GC have been identified. Here, we aim to identify a prognostic and predictive signature of diffuse-type GC heterogeneity.
We analyzed RNA-seq-based transcriptome data to identify a molecular signature in 150 gastric tissue samples including 107 diffuse-type GCs. The predictive value of the signature was verified using other diffuse-type GC samples in three independent cohorts (n = 466). Log-rank and Cox regression analyses were used to estimate the association between the signature and prognosis. The signature was also characterized by somatic variant analyses and tissue microarray analysis between diffuse-type GC subtypes.
Transcriptomic profiling of RNA-seq data identified a signature which revealed distinct subtypes of diffuse-type GC: the intestinal-like (INT) and core diffuse-type (COD) subtypes. The signature showed high predictability and independent clinical utility in diffuse-type GC prognosis in other patient cohorts (HR 2.058, 95% CI 1.53–2.77, P = 1.76 × 10⁻⁶). Integrative mutational and gene expression analyses demonstrated that the COD subtype was responsive to chemotherapy, whereas the INT subtype was responsive to immunotherapy with an immune checkpoint inhibitor (ICI). Tissue microarray analysis showed the practical utility of IGF1 and NXPE2 for predicting diffuse-type GC heterogeneity.
We present a molecular signature that can identify diffuse-type GC patients who display different clinical behaviors as well as responses to chemotherapy or ICI treatment.
Keywords: Gastric cancer; Diffuse-type GC; Prognosis; Chemotherapy; Immune checkpoint inhibitor
Gastric cancer (GC) is the third leading cause of cancer-related mortality and the fifth most common cancer worldwide. Chemotherapy has been well established to improve the survival rates of GC patients after surgery. Even with the advancement of therapeutic options, however, the optimal approach for an individual GC patient is difficult to determine because of considerable clinicopathologic heterogeneity in GC patients. The Lauren classification, stratifying GC into diffuse, intestinal, and mixed types, has been widely used in the clinical field [3, 4]. Diffuse-type GC accounts for approximately 30% of GC and often exhibits more aggressive characteristics and poorer clinical outcomes than intestinal-type GC [5, 6]. Therefore, there is a crucial need to molecularly characterize diffuse-type GC and identify a signature that could predict the clinical course of and suggest appropriate treatment options for diffuse-type GC.
Among four molecular subtypes [i.e., Epstein–Barr virus positive (EBV), microsatellite instable (MSI), genome stable (GS), and chromosomal instability (CIN)] stratified according to the Cancer Genome Atlas (TCGA) consortium, diffuse-type GC is classified mainly as the GS type. Additionally, diffuse-type GCs can be further distinguished as the microsatellite stable and epithelial-to-mesenchymal transition (MSS/EMT) subtype, which shows the worst prognosis as demonstrated by the Asian Cancer Research Group (ACRG). Despite these defined classifications, diffuse-type GC is molecularly heterogeneous because of different tumor origins among diffuse-type GCs. Recent advances in molecular characterization have provided evidence that diffuse-type GC comprises molecularly distinct subtypes, including EMT-associated subtypes [4, 10]. Indeed, a number of genome-wide studies have been conducted on diffuse-type GC, yet there are no reliable criteria that can adequately predict prognosis in diffuse-type GC. Moreover, despite the advances in treatment options, including chemotherapy, the ability of these studies to predict the response to therapy remains insufficient, or the number of relevant patients they included was limited.
Here, we investigated distinct molecular subtypes of diffuse-type GC, displaying different prognoses and treatment responsiveness, and generated a gene signature stratifying diffuse-type GC patients into these subtypes. Using multiple patient cohorts, we tested whether our signature showed prognostic or therapeutic relevance. Via integrative exploration of mutational and gene expression alterations, we discovered that high-risk patients classified by the signature benefited from standard chemotherapy, while low-risk patients were responsive to immunotherapy based on immune checkpoint inhibitor (ICI) treatment. Using tissue microarray analysis (TMA) to verify protein expression levels, we also confirmed that IGF1 and NXPE2 might be practical indicators for predicting the heterogeneous clinical behavior of diffuse-type GCs.
Materials and methods
Patients and data
We generated a transcriptome dataset of 150 fresh-frozen tissues including diffuse-type GC (n = 107), intestinal-type GC (n = 23), and normal gastric tissues (n = 20) obtained from the BioBank of the Chungnam and Seoul National University Hospitals (the original cohort, n = 150). We also obtained the mRNA expression or variant data of diffuse-type GC from the TCGA database (n = 61, the TCGA cohort), the ACRG study (n = 135, the ACRG cohort), and the Samsung Medical Center in South Korea (n = 280, the SMC cohort). Table S1 details the baseline characteristics of GC patients.
Details of the methodology are available in Supplementary Materials.
Results
Discovery of distinct subtypes of diffuse-type GC by transcriptomic profiling
By exploring the expression patterns, we readily identified two distinct subsets of genes that were highly associated with the INT or COD cluster of GC (252 and 397 genes, respectively, surrounded by purple dashed lines in Fig. 1). These genes were more highly expressed in the INT or COD cluster than in the other clusters. Among the 252 genes associated with the INT cluster, many genes involved in the cell cycle or DNA repair were observed. Among the 397 genes associated with the COD cluster, on the other hand, genes involved in EMT-associated functions such as cell adhesion/migration or the TGFβ signaling pathway were significantly enriched (Fig. 1). These results indicate that the newly discovered COD cluster may be compatible with the known poor prognosis of diffuse-type GC, which may derive from EMT activity.
To identify optimal gene sets for distinguishing patient subgroups (i.e., the N, INT, and COD clusters) in diffuse-type GC, genes that were differentially expressed between the three clusters were next identified using only diffuse-type GC samples (n = 107) from the original cohort. We selected two lists of genes that were significantly differentially expressed between the N and INT clusters or the INT and COD clusters (P < 0.001), and compared them using a Venn diagram approach (Fig. S1a). Gene list “A” represents genes that were differentially expressed between the N and INT clusters (737 genes), and gene list “B” represents genes that were differentially expressed between the INT and COD clusters (2069 genes). When the two gene lists were compared, three different patterns were observed: only A (590 genes), A and B (147 genes), and only B (1922 genes; Fig. S1b). Diffuse-type GC samples in the N cluster showed gene expression patterns of normal stomach, which might reflect the inclusion of many normal gastric cells, even though all diffuse-type GC tissues used in the current investigation were pathologically confirmed as containing high tumor cell contents. Therefore, genes in the only-A category exhibited distinct expression patterns associated with tumorigenesis of GC in the INT subgroup. On the other hand, genes in the only-B category exhibited expression patterns associated with the progression into the COD subgroup in diffuse-type GC. Genes in both the A and B categories were common to the three subgroups of diffuse-type GC. Among the genes in the only-B category, many genes associated with EMT activity, such as ERG, FGF1, FGF2, FGFR1, SFRP1, SFRP2, SOX10, SOX8, TGFB1I1, and TGFB3, were highly expressed in the COD subgroup (Fig. S1b), consistent with our previous observations shown in Fig. 1.
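The three Venn categories follow directly from set operations on the two gene lists. The sketch below is only an illustration of that bookkeeping: the integer IDs are placeholders standing in for gene symbols (the actual gene lists are in Fig. S1), and only the list sizes and the overlap are taken from the text.

```python
# Placeholder integer IDs stand in for gene symbols (assumption);
# only the sizes 737, 2069 and the overlap of 147 come from the text.
list_a = set(range(0, 737))            # list A: N vs INT, 737 genes
list_b = set(range(590, 590 + 2069))   # list B: INT vs COD, 2069 genes

only_a = list_a - list_b   # differential only between N and INT
both = list_a & list_b     # differential in both comparisons
only_b = list_b - list_a   # differential only between INT and COD

print(len(only_a), len(both), len(only_b))  # prints: 590 147 1922
```

The counts are consistent with the reported categories: 590 + 147 = 737 genes in list A and 147 + 1922 = 2069 genes in list B.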
Prognostic utility of the COD signature in diffuse-type GC
Table. Univariate and multivariate Cox regression analysis of overall survival in diffuse-type gastric cancer (ACRG and SMC cohorts combined). Columns: HR (95% CI) and P value for each analysis. Variables: gender (male or female); AJCC stage (I, II, III or IV; P = 3.22 × 10⁻²² and 1.04 × 10⁻²³); tumor site (cardia, body, antrum or whole); COD signature (INT or COD; P = 4.4 × 10⁻⁴ and 1.76 × 10⁻⁶).
Adjuvant chemotherapy data were available for the patients from the ACRG cohort. Because adjuvant chemotherapy is the standard treatment option for GC, we investigated whether the signature could predict diffuse-type GC patients who would benefit from adjuvant chemotherapy. This analysis was performed for patients with diffuse-type GC without distant metastasis (n = 115). When the patients were divided into the INT and COD subtypes based on the signature and the difference in overall survival (OS) was assessed independently in each subtype, adjuvant chemotherapy was found to improve the survival rate in patients with the COD subtype (P = 0.003, Fig. 2e), while patients with the INT subtype showed only a moderate, non-significant benefit from adjuvant chemotherapy (P = 0.13, Fig. 2f). In the Cox regression model, the interaction between the signature and adjuvant chemotherapy was not statistically significant (P = 0.326, Fig. 2g). However, consistent with the Kaplan–Meier and log-rank tests, the estimated HR for OS with adjuvant chemotherapy in the COD subtype was 0.333 (95% CI 0.155–0.713; P = 0.004), retaining significant predictive value, while the HR for OS with adjuvant chemotherapy in the INT subtype was 0.576 (95% CI 0.28–1.187; P = 0.135). Taken together, the results showed that the newly identified signature exhibited significant prognostic potential as well as predictive value for chemotherapy in diffuse-type GC patients.
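The reported HRs, confidence intervals, and P values can be checked against one another. The sketch below assumes the intervals are Wald intervals on the log-hazard scale (HR = exp(β), 95% CI = exp(β ± 1.96 × SE)), which is the usual output of Cox regression software but is not stated in the text; the function name is hypothetical.

```python
from math import erfc, log, sqrt

def wald_p_from_hr(hr, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided Wald P value from an HR and its 95% CI.

    Assumes a symmetric interval on the log scale (an assumption,
    not something the text confirms).
    """
    beta = log(hr)                                     # Cox coefficient
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)   # SE on the log scale
    z = beta / se                                      # Wald statistic
    return erfc(abs(z) / sqrt(2))                      # two-sided normal tail

# Values reported for adjuvant chemotherapy in each subtype
p_cod = wald_p_from_hr(0.333, 0.155, 0.713)
p_int = wald_p_from_hr(0.576, 0.28, 1.187)
print(f"COD: P ~ {p_cod:.4f}; INT: P ~ {p_int:.4f}")
```

Under this assumption the recovered values should fall close to the reported P = 0.004 and P = 0.135; small differences are expected because the published HRs and interval bounds are rounded.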
Biological insight into the COD signature
Functional enrichment analysis illustrated that genes involved in cellular movement, interaction, proliferation, or cell morphology in the category of molecular and cellular functions were significantly activated in the COD subtype (Fig. S3). Regulator effects analysis also showed that players involved in the EMT signature were key genetic mediators dichotomizing diffuse-type GC subgroups (Table S2; Fig. S4). Details of biological insights into the COD signature are available in Supplementary Materials.
Mutational profiling reveals an association between the COD signature and the response to an immune checkpoint inhibitor (ICI)
While the COD signature showed significant predictive value for standard chemotherapy in diffuse-type GC (Fig. 2f–h), patients with the INT subtype responded only moderately to this type of therapy, implying a need for additional or alternative therapeutic options for the INT subtype. To identify clues toward alternative therapies for the INT subgroup of patients, we explored mutational variants in diffuse-type GC in the TCGA cohort. A hierarchical cluster analysis based on the signature revealed stratification of patients into the INT and COD subtypes (Fig. S5). Estimation of the tumor mutation burden (TMB) in the INT and COD subtypes revealed that TMBs in the INT subtype were significantly higher than those in the COD subtype (two-sample t test; P = 1.6 × 10⁻⁴; Fig. 3a). Comparison of the mutation frequencies of all known genes between the INT and COD subtypes revealed that a total of 470 genes showed significantly different mutation frequencies (Fisher’s exact tests, each P < 0.05; Table S3). We searched for enriched functions using significant mutational variants (Fig. S6) and observed that genes involved in cell/focal adhesion, ECM-receptor interaction, or chromatin remodeling/modification were significantly enriched (Fig. 3b), consistent with our previous results (Fig. 1; Fig. S3). Among the genes involved in cell adhesion, MUC16, associated with hypermutation and favorable prognosis in GC, and with resistance to chemotherapy in lung cancer, was the top discriminator of the INT and COD subtypes. The total mutation rate of MUC16 in the diffuse-type GC patients was 31.1%, and the mutation frequency of MUC16 in the COD subtype was significantly higher than that in the INT subtype (Fisher’s exact test; P = 0.002), suggesting that the MUC16 variant is a good indicator discriminating diffuse-type GC of the INT subtype from that of the COD subtype.
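A per-gene comparison of mutation frequencies of this kind reduces to Fisher's exact test on a 2 × 2 table (subtype by mutation status). The following is a minimal standard-library sketch of the two-sided test, not the authors' pipeline. The counts are hypothetical: 19 mutated patients out of 61 would match the reported 31.1% if all 61 TCGA patients had variant data (an assumption), and the split between subtypes is invented purely for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Unnormalised hypergeometric weights of all tables with the same margins
    weights = {k: comb(row1, k) * comb(row2, col1 - k)
               for k in range(lo, hi + 1)}
    observed = weights[a]
    # Sum the tables that are no more probable than the observed one
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, col1)

# HYPOTHETICAL counts (mutated, wild-type) per subtype; not from the paper
int_mut, int_wt = 4, 26
cod_mut, cod_wt = 15, 16
p_muc16 = fisher_exact_two_sided(int_mut, int_wt, cod_mut, cod_wt)
print(f"two-sided P = {p_muc16:.4f}")
```

Comparing integer weights rather than floating-point probabilities avoids tie-breaking problems when deciding which tables count as at least as extreme as the observed one.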
Comparison of CDH1 mutations, which are well-known variants in diffuse-type GC, between the INT and COD subtypes revealed that the mutation frequency in the COD group was higher than that in the INT group; however, the difference was not statistically significant. We also observed significantly more mutations of PIK3CA [frequently found in microsatellite unstable (MSI) GCs] and ARID1A (associated with MSI along with hypermutations and PD-L1 expression in cancers including GC) [15, 16, 17] in the INT subtype compared to the COD subtype. These results suggest distinct features of the INT subtype, including high TMB and MSI, which are typical indicators predicting the response to ICI treatment [5, 18, 19].
We also compared known molecular subtypes with somatic alterations illustrated by the signature (Fig. 3c). When considering molecular subtypes, all MSI (n = 7) and the majority of EBV (4 out of 6) patients exhibited the INT subtype, whereas all GS (n = 12) patients were classified as the COD subtype. The difference in molecular subtypes between the two subgroups of the signature was statistically significant (χ² test; P = 1.919 × 10⁻⁴; Fig. 3c). These results also support a distinct feature of the INT subtype regarding the response to ICI treatment according to significant enrichment of MSI or EBV along with PIK3CA and ARID1A mutations [5, 14, 15, 16, 17].
We further sought to identify the predictive value of the COD signature for ICI treatment (Fig. 3d). When the expression levels of immune checkpoint genes were compared, CD274 (PD-L1) showed a significant difference in expression between the INT and COD subtypes, supporting a reported close relationship between ARID1A and PD-L1 expression in GC [15, 16, 17]. Since activation of DNA damage response and repair (DDR) genes is significantly associated with the response to ICI, we also estimated the expression levels of DDR genes, revealing that the vast majority of DDR genes were significantly activated in the INT subtype. When the expression levels of members of the TGFβ pathway and its associated factors in EMT were estimated, it was found that TGFβ pathway genes were significantly downregulated in the INT subtype, and EMT genes were differentially expressed between the two subtypes, consistent with the previous report that TGFβ attenuates the response to ICI. Because a dataset of ICI responsiveness in GC is publicly available, we sought to validate the predictive value of the COD signature for ICI treatment. After applying the signature to the transcriptome data from GC patients and dividing them into the INT and COD subtypes, it was found that the rate of ICI responsiveness was significantly higher in the INT group than in the COD group (Fig. S7), indicating a possible treatment option using ICI in patients classified into the INT subtype. Considering these findings together with gene expression and mutation profiling results, we suggest that the gene expression-based COD signature reflects ICI responsiveness. However, more rigorous validation is needed because of the limited availability of GC samples.
To further characterize the INT and COD subtypes at the copy number and epigenetic levels, we also performed copy number variation (CNV) and methylation profiling in the TCGA cohort. In the CNV data, 842 genes showed statistically significant differences in copy number between diffuse-type GC patients of the INT and COD subtypes. A number of important canonical pathways associated with these genes, including cell adhesion molecules, were enriched (Fig. S8). When methylation profiles were compared between the two subtypes, 1412 genes showed statistically significant differences in methylation. A functional enrichment test using these genes revealed enrichment of many important canonical pathways, including focal or cell adhesion, cell adhesion molecules, and ECM-receptor interaction (Fig. S9), consistent with the results of gene expression and mutation profiling.
Practical utility of the IGF1 and NXPE2 proteins for classifying diffuse-type GC of the INT and COD subtypes
NXPE2 promotes cell migration and proliferation in vitro
Based on our new finding of NXPE2 characteristics, we sought to verify the effects of NXPE2 on cell migration and proliferation in diffuse-type GC cell lines. We ectopically overexpressed NXPE2 in SNU601 and MKN45 cells and confirmed overexpression in both cell lines by RT-PCR (Fig. S10a). Then, we performed migration assays with NXPE2-overexpressing GC cells and control cells. We found that ectopic NXPE2 overexpression significantly increased the migration ability of SNU601 and MKN45 cells (two-sample t tests; P = 1.48 × 10⁻⁶ and P = 0.019 in SNU601 and MKN45, respectively; Fig. S10b). We also performed cell proliferation assays with NXPE2-overexpressing GC cells and control cells. Although only a moderate increase in proliferation was observed in SNU601 cells (at 48 h), ectopic NXPE2 overexpression significantly increased the proliferation of MKN45 cells (Fig. S10c). These results suggest that NXPE2 mediates diffuse-type GC aggressiveness by promoting cell migration or proliferation, which are key determinants of malignant progression and metastasis.
Discussion
Diffuse-type GC is clinically heterogeneous and frequently exhibits extremely poor outcomes. Using multiple GC patient cohorts, we carried out transcriptome and mutation profiling analyses, which identified a signature of distinct prognostic subtypes of diffuse-type GC. The COD signature showed significant prognostic relevance with independent utility in relation to other pathological factors. The signature also showed therapeutic relevance in that patients with the COD subtype benefit from standard chemotherapy, while patients with the INT subtype are responsive to ICI treatment. Additionally, tissue microarray analyses revealed that IGF1 and NXPE2 might be useful for predicting different clinical behaviors of diffuse-type GC (Fig. S11).
Considerable efforts have been devoted to elucidating the molecular characteristics and establishing prognostic models of GC [4, 5, 7, 8, 10, 23, 24]. Recent advanced investigations characterizing diffuse-type GC at the proteomics level demonstrate the practical utility of specific proteins in addressing aggressive diffuse-type GC in the clinical field [4, 5, 10]. Despite these contributions, the ability to predict the clinical course of patients with diffuse-type GC remains a major clinical challenge. Through our effort to generate new transcriptome data from GC patients involving more than 100 diffuse-type GCs, we identified a molecular signature for classifying distinct prognostic subtypes of diffuse-type GC. The patients with diffuse-type GC classified as exhibiting the COD subtype potentially benefited from chemotherapy, whereas those of the INT subtype might be responsive to immunotherapy with ICI. These data underscore the importance of the molecular subtypes defined by the COD signature as a potential prognostic and predictive signature in diffuse-type GC.
Using recently updated data from the TCGA consortium, the current study revealed two distinct molecular subtypes of diffuse-type GC and several molecular features responsible for their activity. The INT and COD subgroups of diffuse-type GC showed different molecular characteristics: the INT subtype included many MSI and EBV patients, whereas the COD subtype mainly included GS patients. Genes involved in the DDR were significantly activated in the INT subtype, while many EMT genes were highly activated in the COD subtype. While the response rate to standard chemotherapy in the COD subtype was significantly high, no such responsiveness was observed in the INT subtype, implying a crucial need for alternative treatment options in diffuse-type GC patients classified into the INT subtype. Through integrative gene expression and mutational analysis of diffuse-type GC, we discovered a number of distinct molecular features of the INT subtype, such as high TMB, enrichment of MSI or EBV molecular subtypes, activation of DDR genes, and inactivation of the TGFB1 pathway along with its downstream effectors related to EMT activity, indicating favorable responsiveness to ICI treatment in the INT subtype. The considerable molecular difference between the COD and INT subtypes of diffuse-type GC supports the practical utility of the COD signature in determining the clinical behavior and treatment options of diffuse-type GC patients. However, because data on treatment responsiveness in GC are limited, further validation is needed.
We also verified two proteins, IGF1 and NXPE2, as practical indicators for predicting the clinical course of diffuse-type GC. IGF1, insulin-like growth factor 1, is similar to insulin in its function and structure and is a member of a family of proteins involved in mediating growth and development. IGF1 is involved in signaling cross-talk at multiple levels with various components of the TGFβ signaling pathway, and its activity is associated with the activation of Akt, which increases cell survival, proliferation, and malignant transformation, consistent with our observations (Fig. 1; Fig. S4). IGF1 is also known as an indicator of a mesenchymal phenotype in GC, which was identified as a corresponding molecular feature of the COD subtype in the current investigation. When associations with pathological criteria were estimated, IGF1 was found to present a significant correlation with PCC, a histological subtype of GC with a poor prognosis (Fig. 4d). These results suggest that IGF1 is a good indicator for selecting high-risk diffuse-type GC patients. NXPE2, neurexophilin and PC-esterase domain family member 2, was also surveyed as a new predictive indicator. While several associations of the NXPE2 protein with inflammatory diseases such as Crohn’s disease, ulcerative colitis, and inflammatory bowel disease have recently been reported, we identified discriminatory ability of the NXPE2 protein in classifying two distinct subtypes of diffuse-type GC, suggesting NXPE2 as a novel indicator predicting high-risk diffuse-type GC patients. Since no association with GC or druggable compounds has yet been described, more rigorous efforts to characterize NXPE2 are urgently needed.
In conclusion, we identified a signature distinguishing diffuse-type GC into molecular subtypes exhibiting different prognostic characteristics. Our results also confirmed the chemosensitivity and ICI responsiveness of the molecular subtypes classified by the signature. Although our data demonstrate that the signature has significant prognostic and predictive value, a further validation study is needed to identify a limited number of biomarkers that still retain the robustness of our signature.
This research was supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (No. 2016M3C9A4922144 and 2017R1E1A1A01074883) and a grant from the KRIBB Research Initiative Program.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no competing interests.
The collection and analysis of all samples in this study was approved by the Institutional Review Board of Chungnam and Seoul National University hospitals.
Informed consent was obtained from each subject.
10. Mun DG, Bhin J, Kim S, Kim H, Jung JH, Jung Y, et al. Proteogenomic characterization of human early-onset gastric cancer. Cancer Cell. 2019;35:111–24.e10.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.