Analysis of Classification Methods for Gene Expression Data

  • Lamiaa Zakaria
  • Hala M. Ebeid
  • Sayed Dahshan
  • Mohamed F. Tolba
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 921)


The discovery of diseases at the molecular level is a major challenge for researchers in bioinformatics and cancer classification, and identifying the genes that contribute to cancer is central to that effort. Cancer classification based on molecular-level investigation has attracted researchers' interest because it provides a systematic, accurate, and objective diagnosis of different cancer types. This paper presents classification methods for gene expression data and compares the efficiency of three of them: support vector machines (SVM), k-nearest neighbor (k-NN), and random forest. Two publicly available gene expression datasets, Freije and Phillips, were used in the classification experiments. The results show that the SVM classifier achieved the best performance on both datasets compared with the other two classifiers.
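To make one of the compared methods concrete, here is a minimal sketch of k-nearest neighbor classification on a gene expression matrix (rows are samples, columns are genes), scored with leave-one-out cross-validation. The abstract does not state the validation scheme, the value of k, the distance metric, or any preprocessing, so leave-one-out, k = 3, Euclidean distance, and the tiny toy matrix with labels "A" and "B" are all assumptions for illustration; they are not the Freije or Phillips data and not the authors' exact setup.

```python
import math
from collections import Counter

# Hypothetical toy expression matrix: 6 samples x 4 genes, two made-up classes.
TOY_X = [
    [2.1, 0.3, 5.0, 1.2],
    [2.3, 0.4, 4.8, 1.0],
    [1.9, 0.2, 5.2, 1.1],
    [0.5, 3.1, 1.0, 4.2],
    [0.4, 2.9, 1.2, 4.0],
    [0.6, 3.3, 0.9, 4.4],
]
TOY_Y = ["A", "A", "A", "B", "B", "B"]


def euclidean(a, b):
    """Euclidean distance between two expression profiles (one value per gene)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, sample, k=3):
    """Label `sample` by majority vote among its k nearest training profiles."""
    # Rank training samples by distance to the query profile.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], sample))
    # Counter keeps first-encountered order for ties, so a tie goes to the closer label.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(x, y, k=3):
    """Hold out each sample once, train on the rest, and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_x, train_y, x[i], k) == y[i]:
            correct += 1
    return correct / len(x)


if __name__ == "__main__":
    print(leave_one_out_accuracy(TOY_X, TOY_Y, k=3))  # prints 1.0 on this toy data
```

The same leave-one-out loop could score the other two methods by swapping in a different predictor, for example scikit-learn's `SVC` or `RandomForestClassifier`. With real microarray data, which typically has far more genes than samples, feature scaling and gene selection usually matter much more than they do in this toy example.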


Keywords: Gene expression, Classification


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Scientific Computing, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
