A Study on Wrapper-Based Feature Selection Algorithm for Leukemia Dataset

  • M. J. Abinash
  • V. Vasudevan
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 695)


Big data plays a predominant role in many fields, such as research, business, and the biological sciences, as well as in many of our day-to-day activities. It consists of voluminous amounts of structured, semi-structured, and unstructured data and serves as the basis for data mining, yet knowledge discovery from such data is very difficult. Bioinformatics is an interdisciplinary field combining biology and information technology, in which gene expression (microarray) data are analyzed using dedicated software. Because these gene expression datasets keep growing, analyzing and classifying them becomes increasingly difficult. We therefore focus on analyzing such data for cancer classification. The proposed work discusses SVM-based wrapper feature selection for cancer classification. The cancer dataset is processed with two feature selection algorithms, and of the two, the SVM-based wrapper method performs best at selecting features for cancer classification.
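The abstract does not spell out the wrapper procedure, so the following is only a minimal sketch of what an SVM-based wrapper can look like, under stated assumptions: a greedy sequential forward search in which every candidate gene subset is scored by the k-fold cross-validated accuracy of a linear SVM. The Pegasos-style SVM trainer, the function names, the parameters (`lam`, `epochs`, `k`, `max_features`), and the synthetic "gene" matrix are all hypothetical illustrations, not the authors' implementation or the leukemia dataset. It uses only the Python standard library.

```python
import random

# Minimal sketch. The search strategy, SVM trainer, and parameters below are
# assumptions for illustration, not the method reported in the paper.


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    sub-gradient steps. A constant 1.0 is appended to every sample so the
    last weight acts as a (regularized) bias term."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            xi = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularization
            if margin < 1.0:  # hinge loss active: move toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Class label (+1 or -1) from the sign of the linear score."""
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


def cv_accuracy(X, y, features, k=5):
    """k-fold cross-validated accuracy of the SVM using only `features`."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test = set(range(fold, n, k))  # every k-th sample is held out
        train = [i for i in range(n) if i not in test]
        w = train_linear_svm([[X[i][f] for f in features] for i in train],
                             [y[i] for i in train])
        for i in test:
            if predict(w, [X[i][f] for f in features]) == y[i]:
                correct += 1
    return correct / n


def wrapper_forward_selection(X, y, max_features=5, k=5):
    """Greedy forward selection wrapped around the SVM: at each step add the
    gene whose inclusion gives the highest CV accuracy, and stop once no
    candidate strictly improves on the current subset."""
    selected, best_acc = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scores = [(cv_accuracy(X, y, selected + [f], k), f) for f in remaining]
        acc, f = max(scores, key=lambda s: s[0])  # ties: lowest gene index
        if acc <= best_acc:
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc


def make_toy_data(n=40, n_genes=8, informative=3, seed=1):
    """Hypothetical stand-in for an expression matrix (NOT the leukemia
    data): only gene `informative` separates the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
        row[informative] = label * rng.uniform(1.5, 2.5)
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    genes, acc = wrapper_forward_selection(X, y)
    print("selected genes:", genes, "CV accuracy:", acc)
```

Because the classifier itself scores every candidate subset, the selected genes are tuned to the SVM, but the model is retrained once per candidate gene and per fold at every step. That cost, which grows quickly with thousands of microarray genes, is the usual trade-off of wrapper methods against cheaper filter methods.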


Keywords: Gene; Feature selection; Support vector machines (SVM)

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Information Technology, Kalasalingam University, Virudhunagar, India
