A Framework of Gene Subset Selection Using Multiobjective Evolutionary Algorithm

  • Yifeng Li
  • Alioune Ngom
  • Luis Rueda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7632)


Microarray gene expression techniques provide snapshots of the gene expression levels of samples. They are promising for clinical diagnosis and genomic pathology. However, the curse of dimensionality and other problems have challenged researchers for a decade. Selecting a small subset of discriminative genes is an important way to address these problems, but gene subset selection is an NP-hard problem. This paper proposes an effective gene selection framework that integrates gene filtering, sample selection, and a multiobjective evolutionary algorithm (MOEA). We use the MOEA to optimize four objective functions that take into account class relevance, feature redundancy, classification performance, and the number of selected genes. Experimental comparison shows that the proposed approach outperforms a well-known recursive feature elimination method in terms of classification performance and time complexity.
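To make the four-objective setting concrete, the following is a minimal sketch of Pareto dominance, the comparison that multiobjective evolutionary algorithms typically rely on when ranking candidate solutions. It is not the paper's implementation: the objective encoding (all four values arranged so that smaller is better), the function names `dominates` and `pareto_front`, and the scores are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Each candidate gene subset is scored by four objectives, all expressed so
# that smaller is better (hypothetical encoding, not taken from the paper):
#   (-class_relevance, redundancy, classification_error, number_of_genes)
Objectives = Tuple[float, float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: List[Objectives]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Made-up scores for four candidate gene subsets (illustration only).
candidates: List[Objectives] = [
    (-0.90, 0.30, 0.10, 12.0),
    (-0.85, 0.20, 0.12, 8.0),
    (-0.80, 0.35, 0.15, 14.0),  # worse than the first candidate on all four objectives
    (-0.95, 0.40, 0.08, 20.0),
]
front = pareto_front(candidates)
```

In this toy example the third candidate is dominated and drops out, while the other three remain as trade-offs between relevance, redundancy, error, and subset size. A full MOEA would evolve a population of subsets and apply this kind of comparison at every generation.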


gene selection, sample selection, non-negative matrix factorization, multiobjective evolutionary algorithm



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yifeng Li (1)
  • Alioune Ngom (1)
  • Luis Rueda (1)
  1. School of Computer Sciences, University of Windsor, Windsor, Canada
