A New Combined Filter-Wrapper Framework for Gene Subset Selection with Specialized Genetic Operators

  • Edmundo Bonilla Huerta
  • J. Crispín Hernández Hernández
  • L. Alberto Hernández Montiel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6256)


This paper introduces a new combined filter-wrapper approach to gene subset selection, in which a Genetic Algorithm (GA) is combined with Linear Discriminant Analysis (LDA). The distinguishing characteristic of this LDA-based GA is that it uses not only an LDA classifier in its fitness function, but also LDA's discriminant coefficients in its dedicated crossover and mutation operators. The paper studies the effect of these informed operators on the evolutionary process. The proposed algorithm is assessed on several well-known datasets from the literature and compared with recent state-of-the-art algorithms. The results show that the filter-wrapper approach achieves high classification accuracy overall while selecting a very small number of genes compared with other methods.
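
The idea of reusing LDA's discriminant coefficients inside a genetic operator can be illustrated with a minimal sketch. It is not the authors' implementation: the two-class Fisher LDA with a pseudo-inverse, the training-accuracy fitness, and the rule "drop the selected gene with the smallest absolute coefficient" are all assumptions made for illustration, and the function names `lda_coefficients`, `fitness` and `informed_mutation` are hypothetical.

```python
import numpy as np

def lda_coefficients(X, y):
    """Two-class Fisher LDA on the columns of X; returns (w, threshold)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix. The pseudo-inverse is an
    # assumption of this sketch, used to tolerate singular matrices
    # (more genes than samples).
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.pinv(Sw) @ (m1 - m0)
    threshold = w @ (m0 + m1) / 2.0
    return w, threshold

def fitness(mask, X, y):
    """Training accuracy of LDA restricted to the genes selected by mask."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    w, t = lda_coefficients(X[:, idx], y)
    pred = (X[:, idx] @ w > t).astype(int)
    return float((pred == y).mean())

def informed_mutation(mask, X, y):
    """Hypothetical informed operator: remove the selected gene whose
    LDA coefficient has the smallest magnitude."""
    idx = np.flatnonzero(mask)
    if idx.size <= 1:
        return mask.copy()
    w, _ = lda_coefficients(X[:, idx], y)
    child = mask.copy()
    child[idx[np.argmin(np.abs(w))]] = False
    return child
```

Here a candidate gene subset is a boolean mask over the columns of an expression matrix `X` (samples by genes) with labels `y` in {0, 1}. A GA would call `fitness` to rank individuals and `informed_mutation` to shrink a subset without discarding its most discriminative genes.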


Microarray gene expression, Feature selection, Genetic algorithms, Linear Discriminant Analysis, Filter, Wrapper



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Edmundo Bonilla Huerta (1)
  • J. Crispín Hernández Hernández (1)
  • L. Alberto Hernández Montiel (1)
  1. LITI, Instituto Tecnológico de Apizaco, Apizaco, Mexico
