Microarray Data Feature Selection Using Hybrid GA-IBPSO

  • Cheng-San Yang
  • Li-Yeh Chuang
  • Chang-Hsuan Ho
  • Cheng-Hong Yang
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 6)

DNA microarray samples are generated by hybridizing mRNA from tissue or blood to cDNA (in the case of a spotted array) or to DNA oligonucleotides (in the case of Affymetrix chips) on the surface of a chip array. DNA microarray technology allows the expression levels of thousands of genes to be monitored and measured simultaneously in a single experiment. Class memberships are characterized by the production of proteins; gene expression here refers to the production level of the proteins specific to a gene. Microarray data can therefore provide valuable results for a variety of gene expression profiling problems and contribute to advances in clinical medicine. The application of microarray data to cancer type classification has recently gained popularity. Coupled with statistical techniques, gene expression patterns have been used to screen for potential tumor markers. Differential gene expression is analyzed statistically and genes are assigned to various classes, which may (or may not) enhance the understanding of the underlying biological processes.

In our study, we used a combination of a genetic algorithm (GA) and improved binary particle swarm optimization (IBPSO) to implement feature selection. IBPSO was embedded in the GA to serve as a local optimizer in each generation. The K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV), based on Euclidean distance, served as the evaluator for both the GA and IBPSO on five classification problems taken from the literature. This procedure can improve the performance of the population: the local search moves each chromosome toward a local optimum, which reduces the number of selected features and helps keep the GA from becoming trapped in a local optimum.
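The evaluator described above can be illustrated with a minimal sketch. It scores one candidate feature subset, encoded as a binary mask, by the LOOCV accuracy of a K-NN classifier with Euclidean distance. The function name `loocv_knn_accuracy`, the default `k=1`, and the toy data are assumptions made for illustration; the abstract does not state the value of K or any implementation details, and the GA and IBPSO search steps are not shown.

```python
import math


def loocv_knn_accuracy(samples, labels, mask, k=1):
    """Return the LOOCV accuracy of K-NN restricted to features where mask[j] == 1.

    samples: list of equal-length lists of floats (one row per sample)
    labels:  list of class labels, one per sample
    mask:    list of 0/1 flags, one per feature (a chromosome or particle position)
    k:       number of neighbours (assumed value; not given in the abstract)
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty feature subset cannot classify anything

    # Project every sample onto the selected features once.
    reduced = [[row[j] for j in selected] for row in samples]

    correct = 0
    for i, held_out in enumerate(reduced):
        # Euclidean distance from the held-out sample to every other sample.
        distances = [
            (math.dist(held_out, other), labels[t])
            for t, other in enumerate(reduced)
            if t != i
        ]
        distances.sort(key=lambda pair: pair[0])

        # Majority vote among the k nearest neighbours.
        votes = {}
        for _, label in distances[:k]:
            votes[label] = votes.get(label, 0) + 1
        predicted = max(votes, key=votes.get)

        if predicted == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
samples = [[1.0, 5.0], [1.2, 0.0], [0.9, 9.0],
           [3.0, 5.1], [3.1, 0.2], [2.9, 8.8]]
labels = ["A", "A", "A", "B", "B", "B"]

informative_only = loocv_knn_accuracy(samples, labels, [1, 0])
noise_only = loocv_knn_accuracy(samples, labels, [0, 1])
```

In a wrapper approach of this kind, a score like this would be computed for every chromosome and particle, so a mask that keeps the informative feature and drops the noisy one is preferred by the search.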


Keywords: Genetic Algorithm, Support Vector Machine, Particle Swarm Optimization, Feature Selection, Gene Expression Data





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Cheng-San Yang (1)
  • Li-Yeh Chuang (2)
  • Chang-Hsuan Ho
  • Cheng-Hong Yang
  1. Hospital
  2. University, Kaohsiung
