Machine Learning for Cancer Subtype Prediction with FSA Method
Recent research demonstrates that cancer subtype classification based on gene expression offers advantages over traditional classification. However, because this kind of data typically has thousands of features, classifying it by hand is not feasible; efficient and accurate algorithms are required. This paper reports an empirical study that seeks an efficient and accurate machine learning method for human cancer subtype classification based on gene expression data from cancer cells. Several well-developed machine learning algorithms address this kind of problem, including the Naive Bayes classifier, Support Vector Machine (SVM), Random Forest, and neural networks. Here we build two prediction models, one using SVM and one using Random Forest, each combined with a feature selection approach (FSA), to predict the subtype of lung cell lines. The two models achieve similar accuracy, each above 90%. However, the running time of SVM is much shorter than that of Random Forest.
Keywords: Machine learning, Feature selection, Support Vector Machine, Random Forest, Cancer subtype
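The comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and synthetic data. It is not the authors' pipeline: the univariate F-test filter (`SelectKBest` with `f_classif`) stands in for the paper's FSA, whose details are not given here, and the dataset sizes, the value of `k`, and the function name `compare_models` are all hypothetical choices made for illustration.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def compare_models(n_samples=120, n_features=2000, k=50, seed=0):
    """Return {model name: (mean CV accuracy, elapsed seconds)}.

    Synthetic data stands in for a gene expression matrix
    (rows = cell lines, columns = genes); this is an assumption.
    """
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=20,
        n_classes=2,
        class_sep=2.0,
        random_state=seed,
    )

    classifiers = {
        "SVM": SVC(kernel="linear"),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }

    results = {}
    for name, clf in classifiers.items():
        # Feature selection sits inside the pipeline so that it is refit
        # on each training fold and never sees the held-out samples.
        model = Pipeline([
            ("select", SelectKBest(score_func=f_classif, k=k)),
            ("clf", clf),
        ])
        start = time.perf_counter()
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        elapsed = time.perf_counter() - start
        results[name] = (float(scores.mean()), elapsed)
    return results


results = compare_models()
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.3f}, time={seconds:.2f}s")
```

Placing the selector inside the pipeline keeps the accuracy estimate honest, since selecting features on the full dataset before cross-validation would leak information from the test folds. The timings printed by this sketch depend on the machine and the synthetic data, so they should not be read as reproducing the paper's result.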