Abstract
Data mining and machine learning are two areas of computer science that go hand in hand in identifying hidden patterns and extracting valuable information from data. Data mining covers the entire process of data analysis, including machine learning, which aims at constructing programs that learn automatically from experience. The main purpose of this paper is to compare four well-known classification algorithms, namely Naive Bayes, neural networks, support vector machines, and decision trees, for categorizing female patients into two groups: those with diabetes and those without. After adopting evaluation criteria based on the confusion matrix, we run the selected algorithms in two data mining tools, Weka and Orange. The results show that the support vector machine, implemented in the Weka toolkit as SMO, is the best technique in terms of accuracy, sensitivity, and precision on the women's diabetes dataset.
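As an illustration of the confusion-matrix criteria named above, the following minimal Python sketch computes accuracy, sensitivity, and precision for a binary classifier. It is not the authors' implementation, which relies on Weka and Orange. The names `ConfusionMatrix` and `build_confusion_matrix` and the label vectors are hypothetical and chosen only for this example, with label 1 assumed to mean "has diabetes".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the paper)."""
    tp: int  # diabetic patients predicted as diabetic
    fp: int  # non-diabetic patients predicted as diabetic
    fn: int  # diabetic patients predicted as non-diabetic
    tn: int  # non-diabetic patients predicted as non-diabetic

    @staticmethod
    def _ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    def accuracy(self) -> float:
        # Share of all patients that were classified correctly.
        total = self.tp + self.fp + self.fn + self.tn
        return self._ratio(self.tp + self.tn, total)

    def sensitivity(self) -> float:
        # Share of truly diabetic patients that were detected (recall).
        return self._ratio(self.tp, self.tp + self.fn)

    def precision(self) -> float:
        # Share of positive predictions that were truly diabetic.
        return self._ratio(self.tp, self.tp + self.fp)


def build_confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> ConfusionMatrix:
    """Tally the four cells from true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=tn)


# Made-up labels for illustration only; these are not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
cm = build_confusion_matrix(y_true, y_pred)
print(cm.accuracy(), cm.sensitivity(), cm.precision())
```

Sensitivity and precision differ only in their denominators: the first divides by all truly positive patients, the second by all patients predicted positive, which is why a classifier can rank differently on each criterion.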
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sossi Alaoui, S., Aksasse, B., Farhaoui, Y. (2020). Data Mining and Machine Learning Approaches and Technologies for Diagnosing Diabetes in Women. In: Farhaoui, Y. (eds) Big Data and Networks Technologies. BDNT 2019. Lecture Notes in Networks and Systems, vol 81. Springer, Cham. https://doi.org/10.1007/978-3-030-23672-4_6
DOI: https://doi.org/10.1007/978-3-030-23672-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23671-7
Online ISBN: 978-3-030-23672-4
eBook Packages: Intelligent Technologies and Robotics (R0)