Molecular subtyping improves prognostication of Stage 2 colorectal cancer
Post-surgical staging is the mainstay of prognostic stratification for colorectal cancer (CRC). Here, we compare TNM staging to consensus molecular subtyping (CMS) and assess the value of subtyping in addition to stratification by TNM.
Three hundred and eight treatment-naïve colorectal tumours were accessed from our institutional tissue bank. CMS typing was carried out using tumour gene-expression data. Post-surgical TNM-staging and CMS were analysed with respect to clinicopathologic variables and patient outcome.
CMS alone was not associated with survival, while TNM stage significantly explained mortality. Addition of CMS to TNM-stratified tumours showed a prognostic effect in stage 2 tumours; CMS3 tumours had a significantly lower overall survival (P = 0.006). Stage 2 patients with a good prognosis showed immune activation and up-regulation of tumour suppressor genes.
Although stratification using CMS does not outperform TNM staging as a prognostic indicator, gene-expression based subtyping shows promise for improved prognostication in stage 2 CRC.
KeywordsColorectal cancer Molecular classification Prognostication Staging Gene expression
Colorectal cancer (CRC) is a highly heterogeneous disease with different outcomes, therapeutic response and clinical behavior, even within the same clinically designated staging and grading systems. Recent attention has been focused on the molecular mechanisms underlying CRC, and developing novel approaches to stratify tumours into subgroups that have clinical utility for improved prognostication and targeted therapy .
Previous molecular classification systems have relied on combinations of molecular features, including BRAF, KRAS and TP53 mutation status, microsatellite instability, CpG island methylator phenotype, somatic copy number alterations, and activation of various molecular pathways such as WNT and MYC, in order to classify CRC into subgroups [2, 3, 4, 5]. Associations with clinical and histological features and outcome have been reported using these subtyping classification systems, and many show overlapping characteristics. However, discrepancies exist between these classification systems, and they have not been widely embraced in the clinical setting for CRC, due to technological limitations and cost of undertaking the necessary laboratory tests, as well as the question of the impact of tumour heterogeneity on results [6, 7].
The advent of high-throughput sequencing has ushered in a new era of subtyping studies, and in 2015, the Colorectal Cancer Subtyping Consortium published a novel classification system based solely on gene expression data that drew largely on six previously published molecular classification systems . The introduced classifier stratifies CRC into four consensus molecular subtypes (CMS). In particular, the value of using CMS for prognostication and precision medicine has been highlighted in several recent publications . However, in the absence of targeted therapy regimens for primary CRC, the value of stratifying tumours using CMS as a prognostic tool has yet to be evaluated.
In order to assess the utility of CMS as a prognostic tool for CRC, we have evaluated molecular subtyping in relation to clinicopathologic features and patient outcome in a large single-institution cohort of treatment-naïve CRC tumours, and compared the findings to that of standard histological classification.
Colorectal cancer (CRC) tissue samples banked at the Cancer Society Tissue Bank (University of Otago, Christchurch, New Zealand), with informed written consent were used in this study. Patient data, including staging, recurrence, metastases, treatment and histology was retrospectively collected from patient medical records. Exclusion criteria included patients with hereditary CRC, and patients who had received pre-operative chemotherapy or radiation therapy, resulting in a cohort of 308 patients. This study was undertaken with ethical approval from the University of Otago Human Ethics Committee (ethics approval number: H16/037).
Tumour core samples were dissected from surgical specimens and immediately frozen in liquid nitrogen and initially stored at − 80 °C. RNA was extraction was carried out as detailed previously . Briefly, RNA was extracted from < 20 mg of tissue using RNEasy Plus Mini Kit (Qiagen, Hilden, Germany), including DNAse treatment, following tissue disruption using a Retsch Mixer Mill. Purified RNA was quantified using the NanoDrop 2000c spectrophotometer (Thermo Scientific, Asheville, NC, USA), and stored at − 80 °C.
RNA-sequencing was performed using the Illumina HiSeq 2500 V4 platform (Illumina, San Diego, CA, USA) to produce 125 bp paired end reads, as previously described . In brief, library preparation, including ribosomal RNA depletion using RiboZero Gold, was carried out using Illumina TruSeq V2 reagents. The libraries were sequenced on 3 × 5 lanes of an Illumina HiSeq 2500 instrument.
RNA sequencing data processing
Low quality read segments, remnant adaptor sequences and very short reads were subsequently removed using fastq-mcf from ea-utils (v 1·1·2·779 ,) and SolexaQA++ with default parameters (v3·1·7·1 ,). Salmon (v0·11·2 ,) was used to quantify transcript expression of GRCh38 (Ensembl release 93). Gene-level tags-per-million (TPM) counts were derived using the tximport package (v1·6·0 ,). The data processing protocols can be accessed at https://gitlab.com/schmeierlab/crc/crc-nz-2018.
Consensus molecular subtyping
The Single Sample Predictor (SSP) method available in the CMS classifier (v1·0·0, https://www.synapse.org/#!Synapse:syn4961785) R package  was used to classify samples into molecular subtypes of colorectal cancer based on derived TPM values for genes. Similarity between gene expression profiles is calculated as Pearson’s correlation of log2 scaled values, and a sample is considered to be similar to a centroid if the correlation is at least 0·15. In order for a sample to be classified, a correlation to the most similar centroid has to be higher by 0·06 than a correlation to the second most similar centroid. These values are set by default in the SSP method. The data processing protocols can be accessed at https://gitlab.com/schmeierlab/crc/crc-nz-2018.
Differential gene expression analysis
We used count tables from the RNA sequencing data processing, in addition to the class information produced through the CMS classification step, and ran a differential gene expression analysis for all genes. We used each sample within a given subtype as a replicate of that subtype, and ran the edgeR (v3·20·7 ,) package to compare each subtype against each other, extracted genes that are up- or down-regulated, using a Benjamini and Hochberg false-discovery rate  adjusted P-value (< 0·05), and a log2 fold-change greater or smaller than zero. Only genes were considered that were differentially expressed in all comparisons for a subtype against all other subtypes. The data processing protocols can be accessed at https://gitlab.com/schmeierlab/crc/crc-nz-2018.
We used the differentially expressed genes per CMS subtype, and the type of expression (up- or down-regulation) and sub-selected the top 500 genes and input to the clusterProfiler package (v3·6·0 ,) for term enrichment analysis. We sub-selected terms based on an FDR adjusted P-value < 0·1 for enrichment of the gene-set in biological categories. The biological categories and corresponding gene-sets used in the analysis were extracted from MSigDB  (version 6·1). We sub-selected the following categories for the analysis: KEGG, REACTOME, BIOCARTA, PID, HALLMARK GENES, and Gene Ontology (GO) biological processes. The data processing protocols can be accessed at https://gitlab.com/schmeierlab/crc/crc-nz-2018.
Associations between classifiers and clinicopathological variables were assessed by Chi-square tests or Fisher’s exact test, if there were expected cell counts less than one, or if 80% of cells had counts less than five. For tables larger than two by two, P-values were computed with Monte-Carlo simulation. Kaplan-Meier survival curves were calculated using 5- and 10-year estimates of survival for overall survival (OS) and progression-free survival (PFS). Association of classifiers with OS and PFS progression was assessed using Cox proportional hazard models with and without clinicopathological covariates. Significance of multilevel factors was assessed with Chi-squared tests on analysis of deviance. Examination of residual plots showed that modelling assumptions were valid. All statistical tests were 2-sided and considered significant at a P-value of 0·05, and all analysis was performed in R 3·4·3 (R foundation for statistical computing, Vienna, Austria).
Colorectal cancers from 308 patients were included (median age 73·7 years; range, 28–91 years). One hundred and sixty-three patients were female and 145 were male, with 296 of the patients of European decent, three Asian and nine Maori. Right-sided tumours were more common (55%) compared to left-sided colon tumours (27%) and rectal tumours made up 18%. Median follow up was 50 months (range, 0·3–172 months). More detailed patient demographics are shown in Additional file 1: Table S1.
Association of TNM staging with clinical variables
Tumour recurrence, metastasis and lymph-node invasion by post-operative stage and by Consensus Molecular Subtype
Mets + LR %
Histological characteristics of colorectal cancer tumours by post-operative stage and by Consensus Molecular Subtype
Poorly diff %
CMS subtypes and clinical variables
Patient and tumour characteristics by Consensus Molecular Subtype
CMS1 (n = 60)
CMS2 (n = 145)
CMS3 (n = 38)
CMS4 (n = 17)
10-year overall survival based on TNM staging showed that, when adjusted for age and gender, Stage 1 and 2 show little difference in survival outcome. However, there is some evidence that Stage 3 is associated with increased mortality, while it is quite clear that Stage 4 is associated with increased mortality (OR = 2·8, 95% CI 1·6–5·0, P < 0·0005).
Hazard ratios for risk factors associated with mortality in colorectal cancer
HR (95% CI)
Lymph node involvement
Of the 308 patients, 63 patients had relapse of their disease; either local recurrence or distant metastases or both within the follow-up period. There was no significant difference in the median survival after relapse which was 16·5 months, 12·4 months, 33·9 months and 4·6 months for CMS1, CMS2, CMS3 and CMS4 tumours respectively (P = 0·187).
Prognostic effect of CMS in CRC stratified by TNM stage
Survival time by CMS stratified by TNM stage
Differential gene expression and gene-set enrichment analysis in stage 2 tumours
Differentially expressed genes between Stage 2 patients who died and those who were alive at the end of the follow-up period, were further analysed to identify genes potentially associated with survival in Stage 2 tumours. Differentially up-regulated genes strongly associated with survival in this patient group includes immune-cell related genes, in particular genes coding for B-cell markers, and several known (LRRC4, PKNOX2, FEZF2) and putative tumour suppressor genes (MTO18B, NCAM1 and SCN4B) (Additional file 2: Table S3). Genes that were significantly up-regulated in patients with poor survival included pro-inflammatory genes (IL17REL, RETNLB) and genes that have been previously associated with progression and poor outcome in CRC (ERBB2, TBLRXR1, TAPBP, CPS1, AGR2) (Additional file 3: Table S4).
In order to compare biologic pathways and processes potentially associated with survival in this subgroup of patients, we used DEGs as input into an assortment of gene ontology tools (Additional file 1: Figure S1). Differentially upregulated biologic pathways associated with survival were predominantly immune pathways, including B-cell activation, IL-12 and PD-1 signalling and T-cell activation. In addition, glutamatergic signalling was differentially enriched in Stage 2 patients still alive at the end of follow-up (Additional file 4: Table S5). GSEA showed an enrichment of pathways involved in metabolic regulation in Stage 2 tumours, which reflects the association of CMS3 with poor survival, and also differential up-regulation of processes involved in protein and nucleic acid synthesis (Additional file 5: Table S6).
Recent advances in gene expression analysis have culminated in the publication of a Consensus Molecular Subtyping (CMS) system by the Colorectal Cancer Subtyping Consortium (CRCSC) that stratifies CRC into one of four subtypes based on transcriptional profiling. The CRCSC study reported an association between CMS4 and worse patient outcome, and between CMS1 and survival after relapse. Although many subsequent reports mention the prognostic potential of CMS, no study, to date, has validated the prognostic impact of the subtyping system compared to the routinely used staging for primary CRC.
In a large, single-institution cohort of chemotherapy-naïve, surgically treated colorectal cancers, we have shown that traditional TNM staging outperforms molecular subtyping in prognostication of CRC. Post-surgical staging of this cohort was carried out according to UICC guidelines and staging was similar to that expected of a treatment-naïve cohort. Association of stage with clinical variables found few associations beyond the parameters used to carry out staging, namely lymph-node involvement and distant metastases. As previously reported for other cohorts [18, 19], the association of increasing stage with metastasis was also largely driven by liver metastases.
In addition to histological staging, we carried out consensus molecular subtyping (CMS), based on RNA-sequencing derived gene-expression profiles from tumour tissue. Stratification into CMS yielded similar proportions of CMS1 and CMS3 and unclassified tumours as described by the CRCSC . Our cohort contained a considerably greater proportion of CMS2 tumours at 47%, compared to 37% reported by CRCSC, and fewer CMS4 tumours, 6% compared to 23%. The difference in reported proportions of CMS may be, at least in part, accounted for by the inclusion criteria of surgery with curative intent and the exclusion of patients who received neo-adjuvant chemo- or radiotherapy in this study. A recent report by Trumpi et al. reported that neoadjuvant therapy induces a mesenchymal phenotype in residual tumour cells and, as such, may lead to an increase in the reporting of CMS4 subtypes . These criteria may have excluded many advanced-stage tumours, which were shown to be associated with CMS4 . Intra-tumoural heterogeneity may also affect the classification of CMS4 tumours, as the EMT-associated genes seen in CMS4 tumours may reflect upregulated genes derived from fibroblast and mesenchymal cells present in the stromal background rather than directly from the tumour itself [9, 10, 21, 22], and several studies have suggested that the location and number of tumour biopsies can undermine the accuracy of CMS [23, 24, 25]; a limitation of this study is the use of a single tumour sample to carry out gene-expression profiling.
Stratification into CMS showed similar associations with clinic-pathological variables as previously reported by CRCSC and other studies. CMS1 tumours were associated with right-side, female, node-negative and poorly-differentiated, with a high proportion of mucinous histology and less likely to be seen in younger patients under 60 years of age. CMS2 tumours made up nearly half of our cohort and were predominantly left-sided tumours found in male patients, and showed a negative association with mucinous type. Patients with CMS3 type tumours were associated with a lower TNM stage. CMS4 tumours were associated with younger age, rectal tumours and presented at an advanced TNM stage with lymph node positivity. Indeed, lymph-node positivity is shown to increase through CMS1, 2, 3 to CMS4.
The established association between increasing tumour stage and poorer outcome was recapitulated in our cohort, in terms of progression-free and overall survival. While post-surgical staging is the mainstay of prognostication in most clinical centres, the potential for refining prognostication using molecular features has been widely investigated, and the combined use of different clinical and molecular markers have shown links with prognosis in CRC, e.g. while BRAF mutations have been associated with poorer outcome , the effects of these mutations may be mitigated in MSI tumours . KRAS mutations are also associated with a poorer outcome, but this association is stronger in distal compared to proximal tumours .
The original study by Guinney et al. first describing consensus molecular subtyping showed an association between CMS4 and poor overall survival, and between CMS1 and survival after relapse . Several studies have incorporated CMS with other molecular features to in order to refine prognostic groups, and have described poorer outcomes in BRAF-mutated CMS1 MSS tumours, and KRAS-mutated CMS2/3 MSS tumours , and favourable outcomes in CMS1 MSI tumours . Although many subsequent publications have emphasised the prognostic importance of CMS, the utility of CMS as a stand-alone prognostic tool in the clinical setting has not been investigated in an independent cohort. Survival analysis showed an association between CMS and both progression-free and overall survival in our cohort, and this was largely due to the difference between CMS4 and the other CMS classes. However, after adjusting for age and sex, CMS4 was not an independent prognostic marker for survival in this study. Including both TNM stage and CMS in models of overall survival shows that TNM significantly explains mortality independently of age and gender, whereas CMS does not. A potential limitation of the study is the relatively low numbers of CMS4 tumours, as discussed above, and that almost half of the tumours in our cohort are CMS2, and this imbalance may affect the power of our study to detect effects specific to CMS1, 3 and 4.
Clinical management of CRC is usually based on histological staging, with stage 1 tumours conservatively managed with surgery and tumours with nodal or distant metastases (Stage 3 and 4) usually treated with adjuvant chemotherapy. Stage 2 tumours remain a conundrum in terms of prognostication, as approximately 20% of patients with Stage 2 CRC die from the disease . Various factors including acute presentation with obstruction and perforation, histological factors such as perineural and perivascular invasion, as well as high grade, have been used as markers of poor prognosis, and as such indicators for adjunctive postoperative chemotherapy. Further stratification using molecular markers, such as BRAF and KRAS mutations, and MSI status  have been investigated with regard to their prognostic potential in this tumour group, but have not widely adopted to direct clinical management.
Molecular subtyping is a cornerstone of precision medicine in cancer treatment, and the mutation status of genes in the EGFR pathways, including RAS genes, PIK3CA, PTEN and BRAF have been shown to predict response to EGFR blockade therapy in CRC . MSI status and the effect of the tumour microenvironment, in particular the amount and type of tumour infiltrating lymphocytes, have more recently been proposed as predictors of response to immunotherapy . To date, although CMS1 tumours encompass a large proportion of MSI positive CRC, no targeted treatment options based solely on CMS have been proposed. In the context of metastatic CRC, CMS appears to associate with survival in clinical trials of patients with wild-type KRAS tumours, treated with anti-EGFR or VEGF inhibitors [35, 36]. However, in primary CRC and outside the clinical trials setting, improved stratification of CRC had not yet been demonstrated using CMS. In this study, we have observed that subtyping of TNM-stratified tumours into CMS could improve prognostication for Stage 2 CRC; tumours that were CMS3 subtype had significantly lower overall survival compared to other molecular subtypes. This demonstrates, for the first time, the potential utility of CMS in improving prognostication of CRC in combination with existing methods.
Differential gene expression between Stage 2 patients who died and those who were alive at the end of the follow-up period, identified significant up-regulation of immune-related genes and biologic processes, and tumour-suppressor genes, associated with survival. The importance of the immune microenvironment in tumour progression has been demonstrated in solid tumours, and has been linked to outcome in CRC. Our findings suggest that immune signatures may identify Stage 2 patients with good prognosis, for whom surgery alone may suffice, and conversely a CMS3 signature may identify patients who would benefit from adjuvant chemotherapy/increased surveillance. Further evaluations of the genetic signatures identified in this study, and prospective validation using a more clinically accessible platform e.g. gene panel test, will be necessary to confirm these findings.
Although stratification using CMS does not outperform TNM staging as a prognostic indicator in our cohort, it currently represents the best description of tumour heterogeneity in colorectal cancer at the level of gene expression, and shows promise for future advancement of precision medicine. Our findings also suggest the use of CMS in refining prognostication in the clinically heterogenous Stage 2 colorectal cancer.
The authors would like to thank Helen Morrin at the Cancer Society Tissue Bank, Christchurch, and the patients involved for generously participating in this study.
RP carried out sequencing preparation of tumour samples, and was a major contributor to study design and manuscript writing. SS carried out bioinformatics analysis and preparation of figures and contributed to manuscript writing. YL was involved in collated patient data and statistical analysis. JP was involved in study design, bioinformatics and data analysis, and manuscript preparation. FF was involved in study design and clinical aspects of the study. All authors read and approved the final manuscript.
Maurice and Phyllis Paykel Trust.
Gut Cancer Foundation (NZ), with support from the Hugh Green Foundation.
Colorectal Surgical Society of Australia and New Zealand (CSSANZ).
The Health Research Council of New Zealand.
The funding bodies had no role in the design of the study or collection, analysis, and interpretation of data or in writing the manuscript.
Ethics approval and consent to participate
Ethical approval was granted by the University of Otago Human Ethics Committee (ethics approval number: H16/037). Study participants gave informed written consent, and the study was performed in accordance with the Declaration of Helsinki.
The authors declare that they have no competing interests.
- 11.Aronesty E. ea-utils: “Command-line tools for processing biological sequencing data”; 2011.Google Scholar
- 25.Dunne PD, McArt DG, Bradley CA, O'Reilly PG, Barrett HL, Cummins R, et al. Challenging the Cancer molecular stratification dogma: Intratumoral heterogeneity undermines consensus molecular subtypes and potential diagnostic value in colorectal Cancer. Clin Cancer Res. 2016;22(16):4095–104.CrossRefGoogle Scholar
- 33.Sepulveda AR, Hamilton SR, Allegra CJ, Grody W, Cushman-Vokoun AM, Funkhouser WK, et al. Molecular biomarkers for the evaluation of colorectal Cancer: guideline from the American Society for Clinical Pathology, College of American Pathologists, Association for Molecular Pathology, and American Society of Clinical Oncology. Arch Pathol Lab Med. 2017;141(5):625–57.CrossRefGoogle Scholar
- 36.Mooi JK, Wirapati P, Asher R, Lee CK, Savas P, Price TJ, et al. The prognostic impact of consensus molecular subtypes (CMS) and its predictive effects for bevacizumab benefit in metastatic colorectal cancer: molecular analysis of the AGITG MAX clinical trial. Ann Oncol. 2018;29(11):2240–6.PubMedGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.