Relational Databases and Biomedical Big Data

  • N. H. Nisansa D. de Silva
Part of the Methods in Molecular Biology book series (MIMB, volume 1617)


In biomedical applications that collect, handle, and manipulate data, the volume of data tends to grow until it reaches the range identified as big data. At that point, a design decision has to be made about what type of database will be used to handle the data. According to past research, the default and classical solution in the biomedical domain has most often been a relational database. While this was the norm for a long time, there is now an evident trend away from relational databases in favor of other database types and paradigms. It nevertheless remains important to understand the relationship between biomedical big data and relational databases. This chapter reviews the pros and cons of using relational databases to store biomedical big data, as discussed and applied in previous research.

Key words

Relational databases, Big data, Biomedical big data, Data mining


Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Department of Computer and Information Science, University of Oregon, Eugene, USA
