A Minority Class Feature Selection Method

  • German Cuaya
  • Angélica Muñoz-Meléndez
  • Eduardo F. Morales
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7042)

Abstract

In many classification problems, and in particular in medical domains, it is common to have an unbalanced class distribution. This poses problems for classifiers, as they tend to perform poorly on the minority class, which is often the class of interest. One commonly used strategy to improve classification performance is to select a subset of relevant features. Feature selection algorithms, however, have not been designed to favour the classification performance of the minority class. In this paper, we present a novel filter feature selection algorithm, called FSMC, for unbalanced data sets. FSMC selects attributes whose minority class distributions are significantly different from their majority class distributions. FSMC is fast and simple, selects a small number of features, and in most cases outperforms other feature selection algorithms in terms of global accuracy and of performance measures for the minority class such as precision, recall, F-measure and ROC values.
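The abstract does not state which test FSMC uses to decide that two distributions are "significantly different", so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes numeric features and a hypothetical criterion: a feature is kept when its minority class mean lies more than `k` majority class standard deviations away from the majority class mean. The function name `select_features`, the threshold `k` and the toy data are all illustrative assumptions.

```python
from statistics import mean, stdev


def select_features(rows, labels, minority_label, k=2.0):
    """Return indices of features whose minority-class mean lies more than
    k majority-class standard deviations from the majority-class mean.

    NOTE: this criterion is an assumption made for illustration; the
    abstract does not specify the test used by FSMC.
    """
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority = [r for r, y in zip(rows, labels) if y != minority_label]
    if not minority or len(majority) < 2:
        raise ValueError("need at least 1 minority and 2 majority samples")

    selected = []
    for j in range(len(rows[0])):
        maj_values = [r[j] for r in majority]
        min_values = [r[j] for r in minority]
        mu, sigma = mean(maj_values), stdev(maj_values)
        gap = abs(mean(min_values) - mu)
        # If the majority values are constant (sigma == 0), any nonzero
        # gap passes the test; a zero gap never does.
        if gap > k * sigma:
            selected.append(j)
    return selected


# Toy unbalanced data set: six majority samples (label 0), two minority (label 1).
# Feature 0 separates the classes; feature 1 does not.
rows = [
    [1.0, 5.0], [1.1, 5.2], [0.9, 4.8], [1.0, 5.1], [1.2, 4.9], [0.8, 5.0],
    [3.0, 5.0], [3.2, 5.1],
]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
print(select_features(rows, labels, minority_label=1))  # prints [0]
```

Because each feature is scored independently and no classifier is trained, a filter of this kind runs in time linear in the number of samples and features, which is consistent with the "fast, simple" description above.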

Keywords

feature selection, unbalanced data set, medical domain


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • German Cuaya (1)
  • Angélica Muñoz-Meléndez (1)
  • Eduardo F. Morales (1)
  1. Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Tonantzintla, México
