RF-SEA-Based Feature Selection for Data Classification in Medical Domain

  • S. Sasikala
  • S. Appavu alias Balamurugan
  • S. Geetha
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 243)

Abstract

Dimensionality reduction is an essential problem in data analysis that has received significant attention from several disciplines. It includes two types of methods: feature extraction and feature selection. In this paper, we introduce a simple method for supervised feature selection in data classification tasks. The proposed hybrid feature selection (HFS) mechanism, RF-SEA (ReliefF-Shapley ensemble analysis), combines the filter and wrapper models for dimension reduction. Principal component analysis (PCA) is carried out as a preprocessing step before both stages. In the first stage, we use the filter model to rank the features by their ReliefF (RF) scores between classes and then select the features most relevant to the classes using a threshold. In the second stage, we use the Shapley ensemble algorithm to evaluate the contribution of each feature in the ranked subset to the classification task. Experiments with several medical datasets show that the proposed approach can detect completely irrelevant features and remove redundant features without significantly hurting the performance of the classification algorithm. The results also show that RF-SEA obtains better classification performance than a method based on the Shapley value alone or on the ReliefF (RF) algorithm alone.
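
The two stages described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified two-class Relief score in place of full ReliefF, an exact Shapley computation whose value function is leave-one-out 1-nearest-neighbour accuracy, and it omits the PCA preprocessing step. The toy data, the threshold value, and all function names are hypothetical.

```python
from itertools import permutations


def relief_weights(X, y):
    """Simplified two-class Relief (not full ReliefF): reward features that
    differ on the nearest miss and agree on the nearest hit.
    Assumes features are already scaled to [0, 1]."""
    n, d = len(X), len(X[0])

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            w[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return w


def loo_1nn_accuracy(X, y, feats):
    """Value function v(S): leave-one-out 1-NN accuracy using only feats.
    This choice of classifier is an assumption made for the sketch."""
    if not feats:
        # With no features, fall back to the majority-class baseline.
        return max(y.count(c) for c in set(y)) / len(y)
    correct = 0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        nn = min(others, key=lambda j: sum(abs(X[i][f] - X[j][f]) for f in feats))
        correct += y[nn] == y[i]
    return correct / len(X)


def shapley_values(X, y, feats):
    """Exact Shapley value of each feature: its marginal gain in v(S),
    averaged over every ordering of feats (feasible only for small subsets)."""
    phi = {f: 0.0 for f in feats}
    orders = list(permutations(feats))
    for order in orders:
        chosen = []
        prev = loo_1nn_accuracy(X, y, chosen)
        for f in order:
            chosen.append(f)
            cur = loo_1nn_accuracy(X, y, chosen)
            phi[f] += (cur - prev) / len(orders)
            prev = cur
    return phi


# Hypothetical toy data: columns 0 and 1 track the class, column 2 is noise.
X = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 1.0], [0.2, 0.2, 0.0], [0.1, 0.2, 1.0],
    [0.9, 1.0, 0.0], [1.0, 0.9, 1.0], [0.8, 0.8, 0.0], [0.9, 0.8, 1.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
THRESHOLD = 0.1  # hypothetical cut-off for the filter stage

# Stage 1 (filter): rank features and keep those above the threshold.
weights = relief_weights(X, y)
selected = [f for f, w in enumerate(weights) if w > THRESHOLD]

# Stage 2 (wrapper): score each surviving feature by its Shapley contribution.
contributions = shapley_values(X, y, selected)
print("Relief weights:", weights)
print("Selected features:", selected)
print("Shapley contributions:", contributions)
```

On this toy data the noise column is dropped by the filter stage, and the two class-tracking columns share the accuracy gain over the majority-class baseline equally, which is the behaviour expected of redundant features under a Shapley-style score. Exact enumeration grows factorially with the number of features, so a larger subset would need a sampled approximation.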

Keywords

Data classification; Feature selection; Feature extraction; ReliefF-Shapley ensemble analysis; Dimensionality reduction; Medical disease diagnosis


Copyright information

© Springer India 2014

Authors and Affiliations

  • S. Sasikala (1)
  • S. Appavu alias Balamurugan (2)
  • S. Geetha (3)

  1. Research Scholar, Anna University, Chennai, India
  2. Department of Electronics and Communication Engineering, K.L.N. College of Information Technology, Madurai, India
  3. Thiagarajar College of Engineering, Madurai, India
