Medical Big Data Warehouse: Architecture and System Design, a Case Study: Improving Healthcare Resources Distribution

  • Abderrazak Sebaa
  • Fatima Chikh
  • Amina Nouicer
  • AbdelKamel Tari
Part of the following topical collections:
  1. Transactional Processing Systems


The rapid growth in medical devices and clinical applications, which generate enormous volumes of data, has made managing, processing, and mining these data a major challenge. Traditional data warehousing frameworks cannot cope effectively with the volume, variety, and velocity of data from current medical applications. As a result, many data warehouses struggle with medical data, and many challenges remain to be addressed. New solutions have emerged; Hadoop is one of the best examples and can be used to process these streams of medical data. However, without an efficient system design and architecture, its performance gains will not be significant or valuable for medical managers. In this paper, we provide a short review of the literature on research issues of traditional data warehouses and present some important Hadoop-based data warehouses. In addition, we give a Hadoop-based architecture and a conceptual data model for designing a medical Big Data warehouse. In our case study, we provide implementation details of a big data warehouse, based on the proposed architecture and data model, on the Apache Hadoop platform to ensure an optimal allocation of health resources.


Data warehouse; Hadoop; Big data; Decision support; Medical resources allocation



This work was partially supported by the Ministry of Higher Education and Scientific Research of Algeria and the University of Bejaia, under the project CNEPRU (Ref. B*00620140066/2015-2018).

Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. LIMED Laboratory, Faculty of Exact Sciences, University of Bejaia, Bejaia, Algeria
  2. Department of Computer Science, Faculty of Exact Sciences, University of Bejaia, Bejaia, Algeria
