Microarray Data Feature Selection Using Hybrid GA-IBPSO
DNA microarray samples are generated by hybridizing mRNA from sample tissue or blood to cDNA probes (in the case of a spotted array) or to DNA oligonucleotides on the chip surface (in the case of Affymetrix chips). DNA microarray technology allows the simultaneous monitoring and measurement of thousands of gene expression levels in a single experiment. Class memberships are characterized by protein production; that is, gene expression refers to the production level of the protein specific to a gene. Thus, microarray data can provide valuable results for a variety of gene expression profiling problems and contribute to advances in clinical medicine. The application of microarray data to cancer type classification has recently gained popularity. Coupled with statistical techniques, gene expression patterns have been used to screen potential tumor markers. Differential gene expression is analyzed statistically and genes are assigned to various classes, which may (or may not) enhance understanding of the underlying biological processes.
In our study, we used a combination of a genetic algorithm (GA) and improved binary particle swarm optimization (IBPSO) to implement feature selection. IBPSO was embedded in the GA to serve as a local optimizer in each generation. The K-nearest neighbor method (K-NN) with leave-one-out cross-validation (LOOCV), based on Euclidean distance, served as the evaluator of the GA and IBPSO on five classification problems taken from the literature. This procedure improves population performance by letting a chromosome approach a local optimum, reducing the number of selected features while preventing the GA from becoming trapped in a local optimum.
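The hybrid scheme described above can be sketched in a few components: a binary chromosome marks which features (genes) are selected, a 1-NN classifier evaluated with LOOCV on Euclidean distance supplies the fitness, a binary-PSO pass locally refines the best chromosome of each generation, and a GA loop handles selection, crossover, and mutation. The following is a minimal illustrative sketch, not the authors' implementation; the function names, parameter values (population size, swarm iterations, mutation rate), and operator choices (uniform crossover, elitism) are assumptions made for illustration only.

```python
import math
import random

def knn_loocv_accuracy(X, y, mask):
    """Fitness: leave-one-out 1-NN accuracy, Euclidean distance on selected features."""
    feats = [j for j, bit in enumerate(mask) if bit]
    if not feats:  # empty feature subsets are assigned zero fitness
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_d, best_lab = float("inf"), None
        for m in range(len(X)):  # nearest neighbor among all other samples
            if m == i:
                continue
            d = math.sqrt(sum((X[i][j] - X[m][j]) ** 2 for j in feats))
            if d < best_d:
                best_d, best_lab = d, y[m]
        if best_lab == y[i]:
            correct += 1
    return correct / len(X)

def ibpso_refine(mask, fitness, iters=5, w=0.9, c1=2.0, c2=2.0):
    """Toy binary-PSO local search around one chromosome (single-particle sketch)."""
    n = len(mask)
    pos, vel = list(mask), [0.0] * n
    best, best_fit = list(mask), fitness(mask)
    pbest, pbest_fit = list(mask), best_fit
    for _ in range(iters):
        for j in range(n):
            vel[j] = (w * vel[j]
                      + c1 * random.random() * (pbest[j] - pos[j])
                      + c2 * random.random() * (best[j] - pos[j]))
            # sigmoid maps velocity to the probability that the bit is set
            pos[j] = 1 if random.random() < 1.0 / (1.0 + math.exp(-vel[j])) else 0
        f = fitness(pos)
        if f > pbest_fit:
            pbest, pbest_fit = list(pos), f
        if f > best_fit:
            best, best_fit = list(pos), f
    return best, best_fit

def hybrid_ga_ibpso(X, y, pop_size=10, gens=10, pm=0.05):
    """GA over binary feature masks; IBPSO refines each generation's best chromosome."""
    n = len(X[0])
    fit = lambda m: knn_loocv_accuracy(X, y, m)
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(gens):
        ranked = sorted(pop, key=fit, reverse=True)
        refined, rf = ibpso_refine(ranked[0], fit)  # local optimization step
        if rf > best_fit:
            best, best_fit = list(refined), rf
        new_pop = [list(refined)]  # elitism: keep the refined best
        while len(new_pop) < pop_size:
            a, b = random.sample(ranked[:max(2, pop_size // 2)], 2)
            child = [a[j] if random.random() < 0.5 else b[j] for j in range(n)]
            child = [1 - g if random.random() < pm else g for g in child]  # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
    return best, best_fit
```

On a small synthetic dataset where one feature cleanly separates the classes, the returned mask tends to retain that feature and the fitness approaches the perfect LOOCV score, illustrating how the fitness function simultaneously rewards accuracy and permits feature reduction.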
Keywords: Genetic Algorithm, Support Vector Machine, Particle Swarm Optimization, Feature Selection, Gene Expression Data