
Diabetes Data Analysis Using MapReduce with Hadoop

  • Conference paper
Engineering Vibration, Communication and Information Processing

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 478))

Abstract

Big data refers to large, complex collections of data drawn from sources such as social media, online transaction records, and sensors. Data at this volume is hard to analyze with traditional processing applications. In healthcare, doctors prescribe insulin to diabetic patients based on each patient’s previous records and on sugar levels measured at regular intervals. This paper analyzes a medical database of diabetes patients using two data mining algorithms, decision tree and naïve Bayes. The analysis uses the UCI machine learning diabetes datasets, with four features in the training phase. The results show that the decision tree algorithm achieves higher accuracy, precision, recall, and F-measure than naïve Bayes.
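The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a binary classification setting and the standard confusion-matrix definitions; the function name `classification_metrics` and the counts in the usage example are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F-measure from
    confusion-matrix counts for a binary classifier.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives (illustrative inputs,
    not results from the paper).
    """
    total = tp + fp + fn + tn
    # Fraction of all predictions that are correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Fraction of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall (F1).
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
```

Comparing two classifiers, as the abstract does, would amount to computing these values from each model's predictions on the same test set.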

Corresponding author

Correspondence to Sunil Kumar.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kumar, S., Singh, M. (2019). Diabetes Data Analysis Using MapReduce with Hadoop. In: Ray, K., Sharan, S., Rawat, S., Jain, S., Srivastava, S., Bandyopadhyay, A. (eds) Engineering Vibration, Communication and Information Processing. Lecture Notes in Electrical Engineering, vol 478. Springer, Singapore. https://doi.org/10.1007/978-981-13-1642-5_15

  • DOI: https://doi.org/10.1007/978-981-13-1642-5_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1641-8

  • Online ISBN: 978-981-13-1642-5

  • eBook Packages: Engineering (R0)
