A New Gene Selection Method Based on Random Subspace Ensemble for Microarray Cancer Classification

  • Giuliano Armano
  • Camelia Chira
  • Nima Hatami
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7036)

Abstract

Gene expression microarray data provide simultaneous activity measurements for thousands of genes, which makes potentially effective and reliable cancer diagnosis possible. An important and challenging task in microarray analysis is selecting the most relevant and significant genes for classifying the data (here, cancer samples). A method based on random subspace ensembles is proposed to address feature selection in gene expression cancer diagnosis. The proposed Diverse Accurate Feature Selection method relies on multiple individual classifiers, each built on a random feature subspace. Each feature is assigned a score computed from the pairwise diversity among the individual classifiers and the ratio between individual and ensemble accuracies. The scores yield a ranked list of features, to which a final classifier is applied with the aim of improving performance while using as few genes as possible. Experiments address gene expression cancer diagnosis on publicly available microarray datasets. Numerical results show that the proposed method is competitive with related models from the literature.
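The scoring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything specific here is an assumption made for illustration: the nearest-centroid base classifier, the disagreement measure used as pairwise diversity, the additive combination of diversity and accuracy ratio, the averaging of member scores over the features each member uses, and all function and parameter names (`score_features`, `n_members`, `subspace_size`). It is not the authors' implementation.

```python
import random

def fit_centroids(X, y, feats):
    """Per-class mean vectors restricted to the feature indices in `feats`."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return centroids

def predict(centroids, X, feats):
    """Assign each sample to the class with the nearest centroid."""
    preds = []
    for x in X:
        def dist(label):
            return sum((x[f] - c) ** 2 for f, c in zip(feats, centroids[label]))
        preds.append(min(sorted(centroids), key=dist))
    return preds

def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)

def disagreement(a, b):
    """Fraction of samples on which two classifiers differ (assumed diversity measure)."""
    return sum(p != q for p, q in zip(a, b)) / len(a)

def score_features(X_tr, y_tr, X_val, y_val, n_members=30, subspace_size=3, seed=0):
    rng = random.Random(seed)
    n_features = len(X_tr[0])

    # 1. Train one base classifier per random feature subspace.
    subspaces, preds = [], []
    for _ in range(n_members):
        feats = rng.sample(range(n_features), subspace_size)
        model = fit_centroids(X_tr, y_tr, feats)
        subspaces.append(feats)
        preds.append(predict(model, X_val, feats))

    # 2. Ensemble prediction by majority vote (ties go to the smallest label).
    ensemble = [max(sorted(set(votes)), key=votes.count) for votes in zip(*preds)]
    ens_acc = max(accuracy(ensemble, y_val), 1e-12)  # guard against division by zero

    # 3. Score each member: mean pairwise diversity plus the ratio of
    #    individual to ensemble accuracy (assumed combination).
    totals = [0.0] * n_features
    counts = [0] * n_features
    for i, feats in enumerate(subspaces):
        diversity = sum(disagreement(preds[i], preds[j])
                        for j in range(n_members) if j != i) / (n_members - 1)
        member_score = diversity + accuracy(preds[i], y_val) / ens_acc
        for f in feats:
            totals[f] += member_score
            counts[f] += 1

    # 4. A feature's score is the average score of the members that used it.
    scores = [t / c if c else 0.0 for t, c in zip(totals, counts)]
    ranking = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return scores, ranking
```

Calling `score_features` with training and validation splits returns one score per feature and the feature indices sorted from highest to lowest score; a final classifier could then be trained on the top-ranked features only.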

Keywords

random subspace ensembles; multiple classifier systems; multivariate feature selection; gene expression data analysis; pairwise diversity

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Giuliano Armano (1)
  • Camelia Chira (2)
  • Nima Hatami (1)
  1. DIEE-Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
  2. Department of Computer Science, Babes-Bolyai University, Cluj-Napoca, Romania
