Medical & Biological Engineering & Computing, Volume 57, Issue 4, pp 901–912

A reliable method for colorectal cancer prediction based on feature selection and support vector machine

  • Dandan Zhao
  • Hong Liu
  • Yuanjie Zheng
  • Yanlin He
  • Dianjie Lu
  • Chen Lyu


Colorectal cancer (CRC) is a common cancer responsible for approximately 600,000 deaths per year worldwide, so it is important to identify the related factors and to detect the cancer accurately. However, timely and accurate prediction of the disease is challenging. In this study, we build an integrated model based on logistic regression (LR) and support vector machine (SVM) to classify samples as cancer or normal. From a set of candidate factors (the subject's location, age, gender, and body mass index (BMI), and the cancer's tumor type, tumor grade, and DNA), we use logistic regression to select the most significant factors (p < 0.05) as the main features. With these features, we design a grid-search SVM model and compare different kernel types (linear, radial basis function (RBF), sigmoid, and polynomial). The logistic regression results indicate that Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), BMI (AUC 0.777), and age (AUC 0.710), as well as their combination (AUC 0.942), are effective for CRC detection. The best kernel type is RBF, which achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10. This study provides a new method for colorectal cancer prediction based on independent risk factors.
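The two-stage idea in the abstract (screen factors by logistic regression at p < 0.05, then grid-search an SVM over kernel types) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic, the multivariable Wald test, the hyperparameter grid, and the reading of k as the number of cross-validation folds are all assumptions, and the helper name `logistic_p_values` is hypothetical.

```python
import math

import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def logistic_p_values(X, y, n_iter=25):
    """Two-sided Wald p-values for each feature of a logistic regression."""
    A = np.column_stack([np.ones(len(X)), X])      # prepend an intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):                        # Newton-Raphson updates
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        hessian = A.T @ (A * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, A.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-A @ beta))
    cov = np.linalg.inv(A.T @ (A * (p * (1.0 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))               # Wald z statistics
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z[1:]])


# Synthetic stand-in data (assumption): 6 candidate factors, 2 truly informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Stage 1: keep the factors whose coefficient is significant at p < 0.05.
p_values = logistic_p_values(X, y)
selected = np.flatnonzero(p_values < 0.05)

# Stage 2: grid search over kernel type and (assumed) C / gamma values.
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__kernel": ["linear", "rbf", "sigmoid", "poly"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}
results = {}
for k in (5, 10):                                  # k read as the fold count
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
    search.fit(X[:, selected], y)
    results[k] = (search.best_params_["svc__kernel"], search.best_score_)

for k, (kernel, score) in results.items():
    print(f"k={k}: best kernel={kernel}, mean CV accuracy={score:.3f}")
```

On real data the winning kernel and the accuracy depend on the cohort and on the grid, so the output of this sketch should not be compared with the figures reported above.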

Graphical abstract

Flow chart depicting the method adopted in the study. Logistic regression (LR) and the ROC curve are used to select independent features as input to the SVM. SVM kernel selection finds the best kernel function for classification by comparing the linear, RBF, sigmoid, and polynomial kernels; the results show that RBF is the best. The classification performance of the LR + RF, LR + NB, LR + KNN, and LR + ANN models is compared with that of LR + SVM. After these steps, cancer and healthy individuals can be classified, and the best model is selected.
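The comparison step in the flow chart can be illustrated with a short sketch that scores each feature by its single-feature ROC AUC and then compares an RBF SVM against RF, NB, KNN, and ANN classifiers under cross-validation. Everything specific here is an assumption made for illustration: the data come from scikit-learn's `make_classification`, the models use mostly default hyperparameters, and 10-fold cross-validation is chosen arbitrarily.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the already selected features (assumption).
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=4, n_redundant=0, random_state=0
)

# Single-feature ROC AUC: the raw feature value is used as the score.
# Values below 0.5 are flipped, since the direction of the effect is arbitrary.
feature_auc = [roc_auc_score(y, X[:, j]) for j in range(X.shape[1])]
feature_auc = [max(a, 1.0 - a) for a in feature_auc]

# Candidate classifiers, each preceded by feature standardization.
models = {
    "SVM (RBF)": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()

for j, auc in enumerate(feature_auc):
    print(f"feature {j}: AUC={auc:.3f}")
for name, acc in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: mean CV accuracy={acc:.3f}")
```

Using the same folds for every model keeps the comparison paired; which model ranks first on synthetic data says nothing about the ranking reported in the study.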


Colorectal cancer; Logistic regression; Support vector machine; Microbiome


Funding information

This research is supported by the National Natural Science Foundation of China (61876102, 61472232, 61572300, 61402270, 61602286), Taishan Scholar Program of Shandong Province in China (TSHW201502038), and Natural Science Foundation of Shandong Province in China (ZR2016FB13).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© International Federation for Medical and Biological Engineering 2018

Authors and Affiliations

  • Dandan Zhao (1, 2)
  • Hong Liu (1, 2), corresponding author
  • Yuanjie Zheng (1, 2)
  • Yanlin He (1, 2)
  • Dianjie Lu (1, 2)
  • Chen Lyu (1, 2)
  1. Shandong Normal University, School of Information Science and Engineering, Jinan, People’s Republic of China
  2. Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
