Similarity Measurement and Feature Selection Using Genetic Algorithm

  • Shangfei Wang
  • Shan He
  • Hua Zhu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7368)


This paper proposes a novel approach that uses an evolutionary algorithm to search for the optimal combination of a measure function and feature weights. The search space is formed by the possible combinations of measure functions and feature weights. A Genetic Algorithm (GA) serves as the evolutionary search method for finding candidate solutions, and the classification rate of a K-Nearest Neighbor (KNN) classifier is used as the fitness value. Three experiments are designed to demonstrate the effectiveness of the approach. In the first experiment, an artificial data set is constructed to verify the approach by testing whether it can find the combination of measure function and feature weights that is optimal for that data set. In the second experiment, data sets from the University of California at Irvine (UCI) repository are used to verify the general applicability of the method. Finally, a prostate cancer data set is used to show its effectiveness on high-dimensional data.
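The abstract describes the search only at a high level, so the sketch below is one possible reading of it, not the authors' implementation. It assumes a chromosome made of one gene that selects a measure function from a small hypothetical set (weighted Euclidean, Manhattan and Chebyshev distances) plus one weight gene per feature drawn from assumed discrete levels {0, 0.25, 0.5, 0.75, 1}, where a weight of 0 effectively removes the feature. Fitness is the leave-one-out KNN classification rate. Tournament selection, uniform crossover, per-gene mutation and elitism are generic GA choices assumed here; all names (`MEASURES`, `LEVELS`, `ga_search`, and so on) are hypothetical.

```python
import math
import random

# Hypothetical candidate measure functions; the abstract does not list the
# measures actually used. Each takes two feature vectors and a weight vector.
def weighted_euclidean(a, b, w):
    return math.sqrt(sum(wi * (x - z) ** 2 for x, z, wi in zip(a, b, w)))

def weighted_manhattan(a, b, w):
    return sum(wi * abs(x - z) for x, z, wi in zip(a, b, w))

def weighted_chebyshev(a, b, w):
    return max(wi * abs(x - z) for x, z, wi in zip(a, b, w))

MEASURES = [weighted_euclidean, weighted_manhattan, weighted_chebyshev]
# Assumed discrete weight levels; a weight of 0.0 removes the feature.
LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]

def knn_loo_accuracy(X, y, measure, w, k=3):
    """Leave-one-out KNN classification rate, used here as the GA fitness."""
    correct = 0
    for i in range(len(X)):
        # Sort the other samples by distance to sample i under (measure, w).
        neighbors = sorted(
            (measure(X[i], X[j], w), y[j]) for j in range(len(X)) if j != i
        )
        votes = {}
        for _, label in neighbors[:k]:
            votes[label] = votes.get(label, 0) + 1
        # Majority vote; ties go to the label that appears first among the
        # sorted neighbors.
        if max(votes, key=votes.get) == y[i]:
            correct += 1
    return correct / len(X)

def fitness(ind, X, y, k):
    measure_idx, weights = ind
    return knn_loo_accuracy(X, y, MEASURES[measure_idx], weights, k)

def random_individual(n_features, rng):
    # Chromosome = (index of measure function, one weight gene per feature).
    return (rng.randrange(len(MEASURES)),
            [rng.choice(LEVELS) for _ in range(n_features)])

def crossover(a, b, rng):
    # Uniform crossover applied to the measure gene and every weight gene.
    m = a[0] if rng.random() < 0.5 else b[0]
    w = [wa if rng.random() < 0.5 else wb for wa, wb in zip(a[1], b[1])]
    return (m, w)

def mutate(ind, rng, rate):
    # Each gene is independently resampled with probability `rate`.
    m, w = ind
    if rng.random() < rate:
        m = rng.randrange(len(MEASURES))
    return (m, [rng.choice(LEVELS) if rng.random() < rate else wi for wi in w])

def tournament(pop, scores, rng, size=2):
    # Pick `size` random individuals and return the fittest of them.
    picks = rng.sample(range(len(pop)), size)
    return pop[max(picks, key=lambda i: scores[i])]

def ga_search(X, y, k=3, pop_size=20, generations=30, rate=0.1, seed=0):
    """Return the best (measure_idx, weights) evaluated and its fitness."""
    rng = random.Random(seed)
    pop = [random_individual(len(X[0]), rng) for _ in range(pop_size)]
    best, best_score = None, -1.0
    for _ in range(generations):
        scores = [fitness(ind, X, y, k) for ind in pop]
        for ind, s in zip(pop, scores):
            if s > best_score:
                best, best_score = ind, s
        # Elitism: the best individual found so far survives unchanged.
        children = [best]
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng, rate))
        pop = children
    return best, best_score
```

As a toy analogue of the first experiment, take eight samples whose column 0 separates the classes (0.0 to 0.3 for class 0, 1.0 to 1.3 for class 1) while column 1 alternates between 0.0 and 9.0 regardless of class. The weight vector [1.0, 0.0] then gives a leave-one-out rate of 1.0, so `ga_search` should tend to favor combinations that drive the noise weight toward 0. Encoding the measure function as a gene lets the GA choose the measure and the feature weights jointly rather than fixing the distance in advance.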


Keywords: feature selection, measure function, genetic algorithm





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Shangfei Wang (1)
  • Shan He (1)
  • Hua Zhu (1)

  1. Key Lab of Computing and Communicating Software of Anhui Province, School of Computer Science and Technology, University of Science and Technology of China, Hefei, P.R. China
