Review on Feature Selection Methods for Gene Expression Data Classification

  • Talal Almutiri
  • Faisal Saeed
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1073)


Microarray technology allows scientists to rapidly measure the expression levels of thousands of genes. Analyzing these data can reveal altered genes, which facilitates the diagnosis and classification of genetic diseases. However, predicting and identifying cancer types remains a major challenge in the medical field. Gene expression microarrays contain information that can help in this regard, but microarray data suffer from high dimensionality: a large number of genes (features) and a small number of samples. Redundant and irrelevant features further complicate microarray analysis. This study reviews recent work on the methods, algorithms, and limitations of feature selection for microarray gene expression classification. It compares the related studies on four aspects: datasets, feature selection methods, classifiers, and accuracy results. Feature selection is a pre-processing step that plays a vital role in the effectiveness of classification. The review shows that applying filter methods such as the t-test, Pearson’s Correlation Coefficient (PCC), and Bhattacharyya distance eliminates irrelevant features, which helps to increase classification performance and accuracy. Consequently, applying wrapper or embedded methods such as the Genetic Algorithm (GA) without first applying a filter method could negatively affect the effectiveness of classification.
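As a rough illustration of the filter step named in the abstract, the following sketch ranks genes by a two-sample t-statistic or by Pearson's correlation with the class label and keeps the top-scoring ones. It is a minimal sketch under stated assumptions, not the procedure of any reviewed study: the function names (`welch_t`, `pearson`, `rank_genes`), the choice of Welch's variant of the t-test, the `top_k` cutoff, and the synthetic data are all introduced here for illustration only.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (unequal variances allowed)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def rank_genes(X, y, score="t", top_k=10):
    """Return the indices of the top_k genes by absolute filter score.

    X: list of samples, each a list of gene expression values.
    y: list of binary class labels (0 or 1), one per sample.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        column = [row[g] for row in X]
        if score == "t":
            a = [v for v, label in zip(column, y) if label == 0]
            b = [v for v, label in zip(column, y) if label == 1]
            s = welch_t(a, b)
        else:
            s = pearson(column, y)
        scores.append(abs(s))
    # Highest absolute score first; genes are scored independently of any classifier.
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order[:top_k]


# Synthetic data (an assumption for illustration, not a real microarray set):
# 40 samples, 200 genes, and only genes 0-4 differ between the two classes.
random.seed(0)
y = [0] * 20 + [1] * 20
X = [[random.gauss(4.0 * label if g < 5 else 0.0, 1.0) for g in range(200)]
     for label in y]

selected_t = rank_genes(X, y, score="t", top_k=5)
selected_pcc = rank_genes(X, y, score="pcc", top_k=5)
```

In a filter-then-wrapper pipeline of the kind the abstract describes, the reduced gene set returned by `rank_genes` would then be passed to a wrapper or embedded method, so that the more expensive search runs over far fewer candidate features.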


Cancer classification, Gene expression, Feature selection, Microarray data



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
