An Emerging Role for Polystores in Precision Medicine

  • Conference paper
Data Management and Analytics for Medicine and Healthcare (DMAH 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10494)

Abstract

Medical data is inherently heterogeneous, and it usually varies significantly in both size and composition. Yet this data is also key to the recent and promising field of precision medicine, which focuses on identifying appropriate medical treatments and tailoring them to the needs of individual patients, based on their specific conditions, medical history, lifestyle, genetics, and other individual factors. As we, and the database community at large, recognize that a “one size does not fit all” approach is required to work with such data, we present observations based on our experiences and on applications in the field of precision medicine. We make the case for the polystore architecture and show how it applies to precision medicine; we discuss the reference architecture, describe some of its critical components (array database), and discuss the specific types of analysis that directly benefit from this database architecture and the ways it serves the data.

Acknowledgments

This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy, and under a joint program (MVP CHAMPION) between the U.S. Department of Energy and the U.S. Department of Veterans Affairs.

The authors would like to thank the Intel Science and Technology Center (ISTC) for Big Data and the BigDAWG contributors (https://bigdawg.mit.edu/contributors) for their role in developing the BigDAWG system.

Author information

Corresponding author

Correspondence to Edmon Begoli.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Begoli, E., Christian, J.B., Gadepally, V., Papadopoulos, S. (2017). An Emerging Role for Polystores in Precision Medicine. In: Begoli, E., Wang, F., Luo, G. (eds) Data Management and Analytics for Medicine and Healthcare. DMAH 2017. Lecture Notes in Computer Science, vol 10494. Springer, Cham. https://doi.org/10.1007/978-3-319-67186-4_5

  • DOI: https://doi.org/10.1007/978-3-319-67186-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67185-7

  • Online ISBN: 978-3-319-67186-4

  • eBook Packages: Computer Science, Computer Science (R0)
