Biomarker Identification in Colorectal Cancer Using Subnetwork Analysis with Feature Selection
Gene Sub-Network-based Feature Selection (GSNFS) is an efficient method for identifying gene subnetwork biomarkers in case-control and multiclass studies through an integrated analysis of gene expression, gene-set and network data. However, GSNFS produces a considerably high number of subnetworks and does not assess the importance of each one. Recently, we incorporated two feature selection techniques, correlation-based and information gain, into the GSNFS workflow to reduce the number of subnetworks and to assess the importance of each individual subnetwork. The extended GSNFS method was shown to identify good candidate gene subnetwork markers in lung cancer. In this work, we applied a similar workflow to colorectal cancer. First, the top-5 and bottom-5 ranked gene-sets were selected and their classification performance was investigated. As in the lung cancer study, the top-ranked gene-sets performed better than the bottom-ranked gene-sets. The identified top-ranked gene-sets, such as the TNF-beta and MAPK signaling pathways, are known to be related to cancer. In addition, the characteristics of the top identified pathway network were further analyzed and visualized. SMAD3, a gene reported by many studies to be related to cancer, was most often the gene with the highest number of neighbors in the 4 datasets. The results of this study confirm that GSNFS combined with feature selection is very promising, as significantly fewer subnetworks were needed to build a classifier that gave performance comparable to a full-dataset classifier.
Keywords: Gene expression analysis, Gene-set, Classification, Colorectal cancer, Correlation-based feature selection, Information gain feature selection
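The information gain ranking mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the upper-median split used to discretize activity scores, the function names, and the toy subnetwork names and values are all assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of a numeric feature after an upper-median split.

    The median split is an assumed discretization, chosen for simplicity.
    """
    threshold = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < threshold]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (low, high) if part
    )
    return entropy(labels) - conditional


def rank_features(activity, labels):
    """Return feature names sorted by decreasing information gain, plus the scores."""
    scores = {name: information_gain(vals, labels) for name, vals in activity.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy data (hypothetical): subnetwork activity scores for 4 cases and 4 controls.
labels = ["case"] * 4 + ["control"] * 4
activity = {
    "subnet_A": [2.1, 1.9, 2.4, 2.2, 0.3, 0.2, 0.5, 0.4],  # separates the classes
    "subnet_B": [1.0, 0.2, 1.1, 0.3, 0.9, 0.4, 1.2, 0.1],  # uninformative
    "subnet_C": [1.5, 1.6, 1.4, 0.4, 0.5, 0.3, 0.2, 1.7],  # partially informative
}

ranking, scores = rank_features(activity, labels)
top_k, bottom_k = ranking[:1], ranking[-1:]  # analogous to top-5 / bottom-5 selection
print(ranking)
```

In a workflow like the one described, the top-ranked and bottom-ranked features would each be passed to a classifier so that their performance can be compared; that step is omitted here.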
The first author would like to acknowledge the graduate scholarship from the Department of Chemical Engineering, KMUTT, for funding his Master's studies.
- 5. Doungpan, N., Engchuan, W., Chan, J.H., Meechai, A.: GSNFS: gene subnetwork biomarker identification of lung cancer expression data. BMC Med. Genomics 9(S3) (2016). https://doi.org/10.1186/s12920-016-0231-4
- 6. Chan, J.H., Sootanan, P., Larpeampaisarl, P.: Feature selection of pathway markers for microarray-based disease classification using negatively correlated feature sets. In: The 2011 International Joint Conference on Neural Networks, pp. 3293–3299 (2011). https://doi.org/10.1109/ijcnn.2011.6033658
- 7. Kozuevanich, S., Meechai, A., Chan, J.H.: Feature selection in GSNFS-based marker identification. In: The 10th International Conference on Computational Systems-Biology and Bioinformatics (CSBio 2019) (2019). https://doi.org/10.1145/3365953.3365964
- 12. Hong, Y., Downey, T., Eu, K.W., Koh, P.K., Cheah, P.Y.: A “metastasis-prone” signature for early-stage mismatch-repair proficient sporadic colorectal cancer patients and its implications for possible therapeutics. Clin. Exp. Metas. 27(2), 83–90 (2010). https://doi.org/10.1007/s10585-010-9305-4
- 13. Khamas, A., Ishikawa, T., Shimokawa, K., Mogushi, K., et al.: Screening for epigenetically masked genes in colorectal cancer using 5-Aza-2’-deoxycytidine, microarray and gene expression profile. Cancer Genomics Proteomics 9(2), 67–75 (2012). PMID: 22399497