Abstract
Big data refers to large, complex collections of data that come from diverse sources such as social media, online transactions, and sensors. Such voluminous data is hard to analyze with traditional data-processing applications. In healthcare, doctors prescribe insulin to diabetic patients based on each patient's previous records and on blood sugar levels measured at regular intervals. The aim of this paper is to analyze a medical database of diabetic patients using two data mining algorithms, decision tree and naïve Bayes. The analysis uses the diabetes dataset from the UCI Machine Learning Repository, with four features in the training phase. The results show that the decision tree algorithm achieves higher accuracy, precision, recall, and F-measure than naïve Bayes.
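The abstract compares the two classifiers on accuracy, precision, recall, and F-measure but does not define these metrics. The following is a minimal sketch of how they can be computed from the binary confusion matrix of a classifier's predictions. It assumes the diabetic class is encoded as 1; the function name `binary_metrics` and the labels in the usage lines are hypothetical illustrations, not the paper's code, data, or results.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure for a binary classifier.

    `positive` marks the class treated as positive (assumed here to be the
    diabetic class, encoded as 1). Ratios with a zero denominator return 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # true positive
        elif predicted == positive:
            fp += 1  # false positive
        elif actual == positive:
            fn += 1  # false negative
        else:
            tn += 1  # true negative

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    # Hypothetical labels for eight patients (1 = diabetic, 0 = not diabetic).
    actual = [1, 0, 1, 1, 0, 0, 1, 0]
    predicted = [1, 1, 0, 1, 0, 1, 1, 0]
    print(binary_metrics(actual, predicted))
```

Running the same function on the predictions of each classifier over a common test set gives the side-by-side comparison the abstract reports. Precision and recall diverge when false positives and false negatives are unbalanced, which is why the F-measure is reported alongside accuracy.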
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kumar, S., Singh, M. (2019). Diabetes Data Analysis Using MapReduce with Hadoop. In: Ray, K., Sharan, S., Rawat, S., Jain, S., Srivastava, S., Bandyopadhyay, A. (eds) Engineering Vibration, Communication and Information Processing. Lecture Notes in Electrical Engineering, vol 478. Springer, Singapore. https://doi.org/10.1007/978-981-13-1642-5_15
DOI: https://doi.org/10.1007/978-981-13-1642-5_15
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1641-8
Online ISBN: 978-981-13-1642-5
eBook Packages: Engineering, Engineering (R0)