A New Combined Filter-Wrapper Framework for Gene Subset Selection with Specialized Genetic Operators
Abstract
This paper introduces a new combined filter-wrapper gene subset selection approach in which a Genetic Algorithm (GA) is combined with Linear Discriminant Analysis (LDA). The main characteristic of this LDA-based GA is that it uses not only an LDA classifier in its fitness function, but also LDA's discriminant coefficients in its dedicated crossover and mutation operators. The paper studies the effect of these informed operators on the evolutionary process. The proposed algorithm is assessed on several well-known datasets from the literature and compared with recent state-of-the-art algorithms. The results show that our filter-wrapper approach achieves high overall classification accuracies with a very small number of genes compared to those obtained by other methods.
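To make the idea of an operator informed by discriminant coefficients concrete, here is a minimal sketch in plain Python. The abstract does not specify how the operators work, so everything below is an assumption made for illustration: a two-class problem, Fisher LDA weights computed as w = Sw^-1 (mu1 - mu0) with a small ridge term, and a mutation that removes the selected gene with the smallest absolute coefficient. The names `lda_coefficients`, `solve`, and `informed_mutation` and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def lda_coefficients(X, y, genes, ridge=1e-3):
    """Two-class Fisher LDA weights over the chosen gene indices (assumed form)."""
    k = len(genes)
    rows = [[sample[g] for g in genes] for sample in X]
    groups = {c: [r for r, label in zip(rows, y) if label == c] for c in (0, 1)}
    means = {c: [sum(r[j] for r in grp) / len(grp) for j in range(k)]
             for c, grp in groups.items()}
    # Pooled within-class scatter; the ridge keeps the matrix invertible.
    sw = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c, grp in groups.items():
        for r in grp:
            d = [r[j] - means[c][j] for j in range(k)]
            for i in range(k):
                for j in range(k):
                    sw[i][j] += d[i] * d[j]
    diff = [means[1][j] - means[0][j] for j in range(k)]
    return solve(sw, diff)


def informed_mutation(X, y, genes):
    """Remove the selected gene whose LDA coefficient has the smallest magnitude."""
    w = lda_coefficients(X, y, genes)
    weakest = min(range(len(genes)), key=lambda i: abs(w[i]))
    return [g for i, g in enumerate(genes) if i != weakest]


# Toy data (hypothetical): gene 0 separates the classes strongly,
# gene 1 carries no class information, gene 2 separates them weakly.
X = [
    [0.8, 4.0, 1.5], [1.2, 4.0, 2.5], [0.8, 6.0, 2.5], [1.2, 6.0, 1.5],
    [2.8, 4.0, 2.0], [3.2, 4.0, 3.0], [2.8, 6.0, 3.0], [3.2, 6.0, 2.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

subset = informed_mutation(X, y, [0, 1, 2])
print(subset)
```

In a full GA, an operator of this kind would be applied to candidate gene subsets between fitness evaluations, so that the search is biased toward dropping genes that contribute least to the discriminant; a purely random mutation would ignore that information.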
Keywords
Microarray gene expression, Feature selection, Genetic algorithms, Linear Discriminant Analysis, Filter, Wrapper