Feature Selection and Transformation

A fundamental problem in pattern classification is to work with a set of features appropriate for the classification requirements. The first step is feature extraction. In image classification, for example, the feature set commonly consists of gradients, salient points, SIFT descriptors, and so on. High-level features can also be extracted: the number of faces and their positions, the detection of walls or surfaces in a structured environment, or text detection are high-level features which are classification problems in their own right.

Once the feature set has been designed, it is convenient to select the most informative features, because the feature extraction process does not necessarily yield the best features for a particular problem. The original feature set usually contains more features than necessary; some may be redundant, and some may introduce noise or be irrelevant. In some problems the number of features is so high that the dimensionality has to be reduced in order to make the problem tractable. In other problems feature selection provides new knowledge about the data classes. In gene selection [146], for example, a set of genes (features) is sought in order to explain which genes cause some disease. Moreover, a properly selected feature set can significantly improve classification performance. Nevertheless, feature selection is a challenging task.





Key References

  1. I. Guyon and A. Elisseeff. "An Introduction to Variable and Feature Selection". Journal of Machine Learning Research 3:1157–1182 (2003)
  2. K. Torkkola. "Feature Extraction by Non-Parametric Mutual Information Maximization". Journal of Machine Learning Research 3:1415–1438 (2003)
  3. H. Peng, F. Long, and C. Ding. "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy". IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8):1226–1238 (2005)
  4. B. Bonev, F. Escolano, and M. Cazorla. "Feature Selection, Mutual Information, and the Classification of High-Dimensional Patterns". Pattern Analysis and Applications (2008)
  5. A. Vicente, P.O. Hoyer, and A. Hyvärinen. "Equivalence of Some Common Linear Feature Extraction Techniques for Appearance-Based Object Recognition Tasks". IEEE Transactions on Pattern Analysis and Machine Intelligence 29(5):896–900 (2007)
  6. N. Vasconcelos and M. Vasconcelos. "Scalable Discriminant Feature Selection for Image Retrieval and Recognition". Computer Vision and Pattern Recognition Conference, Washington, DC (USA) (2004)
  7. D. Koller and M. Sahami. "Toward Optimal Feature Selection". ICML-96: Proceedings of the Thirteenth International Conference on Machine Learning, pp. 284–292, San Francisco, CA: Morgan Kaufmann, Bari (Italy) (1996)
  8. M. Law, M. Figueiredo, and A.K. Jain. "Simultaneous Feature Selection and Clustering Using a Mixture Model". IEEE Transactions on Pattern Analysis and Machine Intelligence 26(9):1154–1166 (2004)
  9. S.C. Zhu, Y.N. Wu, and D.B. Mumford. "FRAME: Filters, Random Fields and Maximum Entropy: Towards a Unified Theory for Texture Modeling". International Journal of Computer Vision 27(2):1–20 (1998)
  10. A. Hyvärinen and E. Oja. "Independent Component Analysis: Algorithms and Applications". Neural Networks 13(4–5):411–430 (2000)
  11. T. Bell and T. Sejnowski. "An Information-Maximization Approach to Blind Separation and Blind Deconvolution". Neural Computation 7:1129–1159 (1995)
  12. D. Erdogmus, K.E. Hild II, Y.N. Rao, and J.C. Príncipe. "Minimax Mutual Information Approach for Independent Component Analysis". Neural Computation 16:1235–1252 (2004)
  13. Y. Ma, A.Y. Yang, H. Derksen, and R. Fossum. "Estimation of Subspace Arrangements with Applications in Modeling and Segmenting Mixed Data". SIAM Review 50(3):413–458 (2008)

© Springer Verlag London Limited 2009