A New Gene Selection Method Based on Random Subspace Ensemble for Microarray Cancer Classification
Gene expression microarray data provide simultaneous activity measurements for thousands of features, which makes potentially effective and reliable cancer diagnosis possible. An important and challenging task in microarray analysis is selecting the most relevant and significant genes for data (cancer) classification. A random subspace ensemble based method is proposed to address feature selection in gene expression cancer diagnosis. The introduced Diverse Accurate Feature Selection method relies on multiple individual classifiers, each built on a random feature subspace. Each feature is assigned a score computed from the pairwise diversity among the individual classifiers and the ratio between individual and ensemble accuracies. The scores yield a ranked list of features, to which a final classifier is applied with the aim of increasing performance while using as few genes as possible. Experiments address gene expression cancer diagnosis on publicly available microarray datasets. Numerical results show that the proposed method is competitive with related models from the literature.
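The scoring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything specific here is an assumption made for illustration: the nearest-centroid base classifier, the disagreement rate as the pairwise diversity measure, the product of diversity and accuracy ratio as the per-classifier score, and the averaging of that score over the subspaces containing a feature. All function and parameter names are hypothetical, and this is not presented as the authors' Diverse Accurate Feature Selection method.

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assumed base learner: assign each test sample to the closest class centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def majority_vote(preds, classes):
    """Combine predictions of shape (n_classifiers, n_samples) by plurality vote."""
    counts = np.stack([(preds == c).sum(axis=0) for c in classes])
    return classes[counts.argmax(axis=0)]


def subspace_feature_scores(X_train, y_train, X_val, y_val,
                            n_classifiers=50, subspace_size=10, seed=0):
    """Score features from a random subspace ensemble (illustrative formula only)."""
    if n_classifiers < 2:
        raise ValueError("pairwise diversity needs at least two classifiers")
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    classes = np.unique(y_train)

    # One random feature subspace per individual classifier.
    subspaces = [rng.choice(n_features, size=subspace_size, replace=False)
                 for _ in range(n_classifiers)]
    preds = np.stack([
        nearest_centroid_predict(X_train[:, s], y_train, X_val[:, s])
        for s in subspaces
    ])

    # Individual and ensemble accuracies on the validation split.
    acc = (preds == y_val).mean(axis=1)
    ens_acc = (majority_vote(preds, classes) == y_val).mean()

    # Assumed diversity measure: mean disagreement rate with every other classifier.
    disagree = (preds[:, None, :] != preds[None, :, :]).mean(axis=2)
    diversity = disagree.sum(axis=1) / (n_classifiers - 1)

    # Assumed combination: diversity times the individual/ensemble accuracy ratio.
    clf_score = diversity * acc / max(ens_acc, 1e-12)

    # A feature's score is the mean score of the classifiers whose subspace holds it.
    totals = np.zeros(n_features)
    counts = np.zeros(n_features)
    for s, value in zip(subspaces, clf_score):
        totals[s] += value
        counts[s] += 1
    return totals / np.maximum(counts, 1)


def rank_features(scores):
    """Return feature indices ordered from highest to lowest score."""
    return np.argsort(-scores, kind="stable")
```

Under these assumptions, a final classifier would be trained on the top-ranked indices returned by `rank_features`, increasing the number of genes until validation accuracy stops improving. Features that never fall into any sampled subspace receive a score of zero, so `n_classifiers` and `subspace_size` should be large enough to cover the gene set.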
Keywords: random subspace ensembles, multiple classifier systems, multivariate feature selection, gene expression data analysis, pairwise diversity