MapReduce based integration of health hubs: a healthcare design approach

  • Ramesh Dharavath
  • Samuel Nyakotey
  • Damodar Reddy Edla
Original Paper


The growing population in Asia creates a need to integrate healthcare services so that different diseases can be treated efficiently and in a timely manner. Healthcare is one of the most important and challenging domains in terms of data collection and analysis, and it offers many opportunities to uncover hidden knowledge in health records. The rapid growth of large volumes of unstructured data motivates the use of NoSQL data management tools to handle it. This paper proposes a MapReduce Approach (MRA) for data management in the healthcare industry, using a join-based expectation maximization algorithm within a NoSQL data management solution that scales to large data volumes while maintaining accuracy. The approach also simplifies the integration of healthcare data held in different data models across distributed health hubs. Experimental results show that the proposed approach scales well when integrating and matching unstructured data from different health data sources. The methodology is illustrated with examples, and directions for further research are identified.
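To make the idea of a join-based MapReduce integration concrete, the following is a minimal in-memory sketch of a reduce-side join over records from two health hubs. It is not the authors' implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `integrate`), the record fields, the sample data, and the choice of a normalized patient name as the join key are all assumptions made for illustration, and the expectation maximization matching step described in the abstract is omitted.

```python
from collections import defaultdict

def map_phase(hub_name, records):
    """Emit (join_key, (hub_name, record)) pairs.
    Assumption: the join key is the patient name, trimmed and lower-cased."""
    for record in records:
        key = record["name"].strip().lower()
        yield key, (hub_name, record)

def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Merge all records that share a key into one integrated record."""
    merged = {"patient": key, "sources": []}
    for hub_name, record in values:
        merged["sources"].append(hub_name)
        for field, val in record.items():
            if field != "name":
                # Keep the first value seen for each field (a simplification).
                merged.setdefault(field, val)
    merged["sources"].sort()
    return merged

def integrate(hubs):
    """Run map, shuffle, and reduce over every hub's records."""
    pairs = []
    for hub_name, records in hubs.items():
        pairs.extend(map_phase(hub_name, records))
    return [reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items())]

# Made-up sample data, for illustration only.
hubs = {
    "hub_a": [{"name": "Asha Rao", "blood_group": "O+"}],
    "hub_b": [
        {"name": " asha rao ", "diagnosis": "asthma"},
        {"name": "Li Wei", "diagnosis": "diabetes"},
    ],
}
integrated = integrate(hubs)
```

In this sketch, exact matching on a normalized key stands in for the probabilistic record matching that an expectation maximization step would provide; on a real cluster the shuffle would be performed by the framework rather than by a dictionary in memory.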


Keywords: NoSQL database; MapReduce; Expectation maximization; HDFS; Health data


Compliance with ethical standards

Conflict of interest

The author(s) declare(s) that there is no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© IUPESM and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, India
  2. Department of Computer Science and Engineering, National Institute of Technology, Farmagudi, India
