A Minority Class Feature Selection Method
In many classification problems, and in medical domains in particular, the class distribution is often unbalanced. This poses problems for classifiers, which tend to perform poorly on the minority class, often the class of interest. One commonly used strategy to improve classification performance is to select a subset of relevant features. Feature selection algorithms, however, have not been designed to favour the classification performance of the minority class. In this paper, we present a novel filter feature selection algorithm, called FSMC, for unbalanced data sets. FSMC selects attributes whose minority class distributions differ significantly from their majority class distributions. FSMC is fast and simple, selects a small number of features, and in most cases outperforms other feature selection algorithms in terms of global accuracy and of minority class performance measures such as precision, recall, F-measure and ROC values.
Keywords: feature selection, unbalanced data set, medical domain
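
The abstract states only that FSMC keeps attributes whose minority class distribution differs significantly from the majority class distribution; it does not give the exact criterion. The following is a minimal sketch of a filter in that spirit, not the authors' algorithm. It assumes numeric features and uses Welch's t statistic as a stand-in measure of difference. The names `welch_t` and `select_features`, the `threshold` value of 2.0, and the toy data are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a, b):
    """Absolute Welch t statistic for two samples (each needs >= 2 values)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    if se == 0.0:
        # Both samples are constant: identical means -> no difference at all.
        return 0.0 if mean(a) == mean(b) else float("inf")
    return abs(mean(a) - mean(b)) / se


def select_features(rows, labels, minority_label, threshold=2.0):
    """Return indices of features whose minority and majority values differ.

    Assumption: "significantly different" is approximated by a Welch t
    statistic above `threshold`; the paper's actual test may differ.
    """
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority = [r for r, y in zip(rows, labels) if y != minority_label]
    selected = []
    for j in range(len(rows[0])):
        t = welch_t([r[j] for r in minority], [r[j] for r in majority])
        if t > threshold:
            selected.append(j)
    return selected


# Toy data: feature 0 separates the classes, feature 1 does not.
rows = [
    [1.0, 5.0], [1.2, 4.0], [0.9, 6.0], [1.1, 5.5], [1.0, 4.5], [0.8, 5.2],
    [3.0, 5.1], [3.2, 4.8], [2.9, 5.3],
]
labels = [0] * 6 + [1] * 3  # class 1 is the minority class

print(select_features(rows, labels, minority_label=1))  # [0]
```

Because this is a filter, each feature is scored independently of any classifier, which is what keeps this kind of method fast; the threshold controls how many features survive.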