Performance Tuning of PCA by CFS-Shapley Ensemble and Its Application to Medical Diagnosis

  • S. Sasikala
  • S. Appavu Alias Balamurugan
  • S. Geetha
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8251)


Selecting optimal features is an important area of research in medical data mining systems. Principal component analysis (PCA) is one of the most popular feature selection methods. However, PCA has a drawback: the measurements from all of the original features are used in the projection to the lower-dimensional space. This work therefore aims to tune the performance of PCA and to classify medical profiles. The proposed method is realized as an ensemble procedure with three steps: (i) feature selection using PCA, (ii) feature ranking with correlation-based feature selection (CFS), and (iii) dimension reduction using Shapley value analysis. The variance coverage parameter of PCA is adjusted to yield maximum accuracy, which is measured with specificity, sensitivity, precision, and recall. This facilitates the selection of a compact set of superior features with uncompromised detection rates at a low cost. To appraise the proposed method, experiments were conducted on six medical data sets using the J48 decision tree classifier; they showed that the proposed procedure improves classification efficiency and accuracy compared with using each method individually.
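
The three ingredients named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the steps are wired together, so the function names, the toy eigenvalues and correlations, and the coalition value function `toy_value` are all hypothetical. The sketch only shows the standard definitions: choosing the number of principal components from a variance coverage threshold, the usual CFS merit heuristic, and exact Shapley values over feature subsets.

```python
from itertools import combinations
from math import factorial, sqrt


def components_for_coverage(eigenvalues, coverage):
    """Smallest number of leading components whose variance share reaches `coverage`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= coverage:
            return k
    return len(ordered)


def cfs_merit(feature_class_corr, mean_feature_feature_corr):
    """CFS merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    r_cf = sum(feature_class_corr) / k  # mean feature-class correlation
    return k * r_cf / sqrt(k + k * (k - 1) * mean_feature_feature_corr)


def shapley_values(players, value):
    """Exact Shapley value of each player for a coalition value function."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size that does not contain p.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value(s | {p}) - value(s))
        result[p] = phi
    return result


# Hypothetical coalition value: an accuracy gain per feature plus a bonus
# when f1 and f2 are used together. These numbers are invented for illustration.
TOY_GAIN = {"f1": 0.20, "f2": 0.10, "f3": 0.00}


def toy_value(subset):
    bonus = 0.06 if {"f1", "f2"} <= subset else 0.0
    return sum(TOY_GAIN[f] for f in subset) + bonus


if __name__ == "__main__":
    print(components_for_coverage([4.0, 2.0, 1.0, 1.0], 0.75))
    print(cfs_merit([0.5, 0.5], 0.0), cfs_merit([0.5, 0.5], 1.0))
    print(shapley_values(["f1", "f2", "f3"], toy_value))
```

In this toy setting, a feature such as `f3` that never changes the coalition value receives a Shapley value of zero, which is the kind of signal that could justify dropping it. Exact enumeration grows exponentially with the number of features, so it is practical only for small feature sets.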


Data mining; Dimensionality reduction; Feature extraction; Feature selection; Principal component analysis; Shapley value analysis; Classification



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • S. Sasikala (1)
  • S. Appavu Alias Balamurugan (2)
  • S. Geetha (3)
  1. Anna University, India
  2. K.L.N. College of Information Technology, India
  3. Thiagarajar College of Engineering, India
