Evaluating the Effectiveness of Wrapper Feature Selection Methods with Artificial Neural Network Classifier for Diabetes Prediction

  • M. A. Fahmiin
  • T. H. Lim
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 309)


Feature selection is an important preprocessing technique used to determine the features that contribute most to the classification of a dataset, and it is typically performed on high-dimensional datasets. Various feature selection algorithms have been proposed for diabetes prediction; however, their effectiveness has not been thoroughly evaluated statistically. In this paper, three feature selection methods classified under the wrapper approach (Sequential Forward Selection, Sequential Backward Selection and Recursive Feature Elimination) are used to identify the optimal subset of features for classifying the Pima Indians Diabetes dataset, with an Artificial Neural Network (ANN) as the classifier. All three methods identify the important features of the dataset (Plasma Glucose Concentration and BMI reading), indicating their effectiveness for feature selection, with Sequential Forward Selection obtaining the feature subset that most improves the ANN. However, there is little to no improvement in the classifier evaluation metrics (accuracy and precision) when the ANN is trained on the optimal subset from each method compared with training on the original dataset, showing that feature selection is ineffective on the low-dimensional Pima Indians Diabetes dataset.
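The three wrapper strategies named in the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: it assumes a fixed target subset size `k`, and the `score` and `importance` callables stand in for training the ANN on a feature subset and returning, for example, a validation metric (for forward and backward selection) or per-feature importances (for RFE). How importances would be obtained from an ANN is not specified here and is left as an assumption. `TOY_WEIGHTS`, `toy_score` and `toy_importance` are invented for illustration and are not results from the paper.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Assumption: a wrapper "score" trains the classifier on a feature subset and
# returns a validation metric; here it is just an abstract callable.
Score = Callable[[FrozenSet[int]], float]
# Assumption: for RFE, the fitted model exposes per-feature importances.
Importance = Callable[[FrozenSet[int]], Dict[int, float]]


def sequential_forward_selection(n_features: int, score: Score, k: int) -> Tuple[List[int], float]:
    """Start empty; repeatedly add the feature whose inclusion scores best."""
    selected: List[int] = []
    best = float("-inf")
    while len(selected) < k:
        candidates = [f for f in range(n_features) if f not in selected]
        # One wrapper evaluation (i.e. one training run) per candidate subset.
        scores = {f: score(frozenset(selected + [f])) for f in candidates}
        # max() keeps the first candidate on ties, so results are deterministic.
        chosen = max(candidates, key=scores.__getitem__)
        selected.append(chosen)
        best = scores[chosen]
    return selected, best


def sequential_backward_selection(n_features: int, score: Score, k: int) -> Tuple[List[int], float]:
    """Start with all features; repeatedly drop the one whose removal scores best."""
    remaining = list(range(n_features))
    best = score(frozenset(remaining))
    while len(remaining) > k:
        scores = {f: score(frozenset(remaining) - {f}) for f in remaining}
        dropped = max(remaining, key=scores.__getitem__)
        remaining.remove(dropped)
        best = scores[dropped]
    return remaining, best


def recursive_feature_elimination(n_features: int, importance: Importance, k: int) -> List[int]:
    """Refit on the current subset and discard the least important feature each round."""
    remaining = set(range(n_features))
    while len(remaining) > k:
        weights = importance(frozenset(remaining))
        remaining.remove(min(remaining, key=weights.__getitem__))
    return sorted(remaining)


# Hypothetical per-feature usefulness, invented purely for illustration.
TOY_WEIGHTS = [0.05, 0.30, 0.02, 0.20, 0.01]


def toy_score(subset: FrozenSet[int]) -> float:
    # Stand-in for ANN validation accuracy: reward useful features, penalise size.
    return sum(TOY_WEIGHTS[f] for f in subset) - 0.03 * len(subset)


def toy_importance(subset: FrozenSet[int]) -> Dict[int, float]:
    # Stand-in for importances read from a model fitted on `subset`.
    return {f: TOY_WEIGHTS[f] for f in subset}


if __name__ == "__main__":
    print(sequential_forward_selection(5, toy_score, 2))
    print(sequential_backward_selection(5, toy_score, 2))
    print(recursive_feature_elimination(5, toy_importance, 2))
```

With these invented weights, all three functions settle on features 1 and 3. In a real wrapper setting every `score` call is a full training run of the classifier, so growing a forward selection to all n features evaluates n(n+1)/2 subsets; that cost is manageable for a dataset with only eight input attributes such as Pima, which is also why the search space offers little room for improvement.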


Keywords: Feature selection · Wrapper methods · Diabetes classification


Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2020

Authors and Affiliations

  1. Universiti Teknologi Brunei, Bandar Seri Begawan, Brunei Darussalam
