Comparing Machine Learning Algorithms to Predict Diabetes in Women and Visualize Factors Affecting It the Most—A Step Toward Better Health Care for Women

  • Arushi Agarwal
  • Ankur Saxena
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1087)


Diabetes affects millions of people throughout the world, and more than half of those suffering from it are women. A better tool for diagnosis and study would be a step toward better healthcare. We use sklearn to build models on the Pima Indians Diabetes Dataset, with the main goal of comparing different algorithms to find the one with the best accuracy. Predicting diabetes in women is crucial: it not only enables an early start to treatment but also helps with prevention when the probability of the disease occurring is high. We have focused not only on detection but also on studying and visualizing the factors most strongly correlated with being diabetic. By studying the most common algorithms, we can identify which areas need work in order to develop better approaches to healthcare. Machine learning has been actively used in healthcare, and applying it to conditions like diabetes is worthwhile because the disease affects a large share of the world's population, including almost 100 million Americans and more than 62 million Indians. The dataset was chosen for its parameters and features, which are determined not by geography or region but by the overall physiology of women, so that most women throughout the world can benefit. The algorithms compared are decision trees, logistic regression, Naïve Bayes, SVM, and KNN. The final result was an accuracy of 81.1%, obtained with k-fold cross-validation.
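The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the abstract does not give hyperparameters, preprocessing steps, or the number of folds, so the 10-fold stratified split, the feature scaling, and the default model settings below are all assumptions. To keep the example self-contained, a synthetic dataset with the same shape as the Pima data (768 rows, 8 features) stands in for the real file. The helper names `compare_models` and `rank_features_by_correlation` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in with the same shape as the Pima data
# (768 rows, 8 features). Replace X and y with the real dataset to reproduce
# anything resembling the paper's numbers.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, n_redundant=1, random_state=0
)

# The five algorithm families named in the abstract, with default settings.
# Scaling is added (an assumption) for the models that are sensitive to it.
models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Naive Bayes": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# ASSUMPTION: 10 stratified folds; the abstract only says "K-Fold".
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)


def compare_models(models, X, y, cv):
    """Return the mean cross-validated accuracy for each model."""
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in models.items()
    }


def rank_features_by_correlation(X, y):
    """Rank feature indices by absolute Pearson correlation with the label."""
    corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(corr))
    return [(int(j), float(corr[j])) for j in order]


scores = compare_models(models, X, y, cv)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} mean accuracy = {acc:.3f}")

ranking = rank_features_by_correlation(X, y)
for index, corr in ranking:
    print(f"feature {index}: correlation with outcome = {corr:+.3f}")
```

Using the same `cv` object for every model means each algorithm is scored on identical folds, which makes the accuracies directly comparable. The correlation ranking is only a first look at which factors move with the outcome; with the real data, the feature indices would map to the dataset's column names.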


Diabetes, Sklearn, Pima Indians, Decision trees, Logistic regression, KNN, Naïve Bayes, SVM, Diagnosis



This paper would not have been accomplished without the support of Dr. Ankur Saxena, who was a guide and mentor throughout the process.



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Amity Institute of Biotechnology, AUUP, Noida, India
