Machine Learning for Cancer Subtype Prediction with FSA Method

  • Yan Liu
  • Xu-Dong Wang
  • Meikang Qiu
  • Hui Zhao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11910)


Recent research demonstrates that cancer subtype classification based on gene expression has advantages over traditional classification. However, because this kind of data typically has thousands of features, manual classification is infeasible without efficient and accurate algorithms. This paper reports an empirical study that seeks a highly efficient and accurate machine learning method for human cancer subtype classification based on gene expression data from cancer cells. Several well-developed machine learning algorithms address this kind of problem, including the Naive Bayes classifier, Support Vector Machine (SVM), Random Forest, and neural networks. Here we build two prediction models, using the SVM and Random Forest algorithms together with a feature selection approach (FSA), to predict the subtype of lung cell lines. The two models achieve similar accuracy, both above 90%. However, the running time of SVM is much shorter than that of Random Forest.
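The abstract does not spell out how the FSA works, so the following is only a rough sketch of one common filter-style feature selection step on expression data, not the authors' method. It ranks genes by a two-class Fisher-type score (squared difference of class means over the sum of class variances) and keeps the top k. The function names, the scoring rule, the binary labels, and the toy matrix are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Score each feature (column) by class separation over within-class spread.

    X: list of samples, each a list of expression values (hypothetical layout).
    y: list of binary labels (0 or 1), one per sample.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        gap = (mean(a) - mean(b)) ** 2
        spread = pvariance(a) + pvariance(b)
        # A small constant avoids division by zero for constant features.
        scores.append(gap / (spread + 1e-12))
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (made up): 6 samples x 4 genes; genes 1 and 3 separate the classes.
X = [
    [5.0, 1.0, 3.1, 9.0],
    [5.2, 1.2, 2.9, 9.5],
    [4.9, 0.9, 3.0, 9.2],
    [5.1, 4.0, 3.2, 2.0],
    [5.0, 4.2, 2.8, 2.5],
    [4.8, 3.9, 3.0, 2.2],
]
y = [0, 0, 0, 1, 1, 1]

selected = select_top_k(X, y, k=2)
# Keep only the selected columns, in ranked order.
X_reduced = [[row[j] for j in selected] for row in X]
```

On this toy matrix the genes at indices 3 and 1 rank highest. A reduced matrix of this kind could then be passed to a classifier such as SVM or Random Forest, which is the role the abstract assigns to feature selection.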


Machine learning; Feature selection; Support Vector Machine; Random Forest; Cancer subtype



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, Texas A&M University Commerce, Commerce, USA
  2. Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, USA
  3. Department of Biochemistry, UT Southwestern Medical Center, Dallas, USA
  4. School of Software, Henan University, Kaifeng, China
