Feature Extraction and Classification of Microarray Cancer Data Using Intelligent Techniques

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 243)


Feature extraction plays an important role in improving classifier performance. Microarray data consist of a large number of features but only a small number of samples. In this paper, we address dimensionality reduction of DNA microarray features, in which relevant features are extracted from among thousands of irrelevant ones. This improves both the speed and the accuracy of the classifier. Principal component analysis (PCA) is a powerful statistical technique for representing high-dimensional data in a lower-dimensional space without significant loss of information. The aim is to project the original \( I \)-dimensional space onto an \( I_{0} \)-dimensional linear subspace, where \( I > I_{0} \), such that the variance in the data is maximally explained within the smaller \( I_{0} \)-dimensional space. This addresses the curse of dimensionality, which arises when the number of features is large relative to the number of samples. A support vector machine (SVM) is implemented, and its performance is measured in terms of predictive accuracy, specificity, and sensitivity. First, we apply PCA to extract significant features, and then we train the SVM on the reduced feature set. In the second part, we attempt to validate our results on two public data sets (ovarian and colon).
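The two-stage procedure described above (PCA for feature extraction, then an SVM trained on the reduced features and scored by accuracy, sensitivity, and specificity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn is available, uses synthetic data in place of the ovarian and colon data sets, and the choices of 10 principal components, a linear kernel, and a 70/30 split are arbitrary assumptions made only for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a microarray matrix: few samples, many features.
# (Assumption: the real data sets are not reproduced here.)
rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 60, 2000, 20
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_features))
X[y == 1, :n_informative] += 1.5  # shift a few "genes" in the positive class

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Stage 1: project the I-dimensional data onto I_0 = 10 principal components.
# Stage 2: train an SVM on the reduced feature set.
# The pipeline fits the scaler and PCA on the training split only.
model = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="linear"))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy, sensitivity, and specificity from the binary confusion matrix.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate

# Fraction of training-set variance retained by the 10 components.
explained = model.named_steps["pca"].explained_variance_ratio_.sum()

print(f"variance explained by 10 components: {explained:.3f}")
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

Wrapping both stages in one pipeline keeps the test samples out of the PCA fit, which matters when samples are this scarce. The number of components would normally be chosen from the explained-variance ratio or by cross-validation rather than fixed in advance.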


Cancer classification; Feature extraction; Principal component analysis; Support vector machine



Copyright information

© Springer India 2014

Authors and Affiliations

  1. Department of Computer Science and Engineering, NIT Rourkela, Rourkela, India
