
Analytical and Bioanalytical Chemistry, Volume 411, Issue 26, pp 6791–6800

Challenges of big data integration in the life sciences

  • Sven Fillinger
  • Luis de la Garza
  • Alexander Peltzer
  • Oliver Kohlbacher
  • Sven Nahnsen (corresponding author)
Feature Article

Abstract

Big data has been reported to be revolutionizing many areas of life, including science. The term describes data that is unprecedentedly large, rapidly generated, heterogeneous, and hard to interpret accurately. The availability of such data has also brought new challenges: How should data be annotated to make it searchable? What are the legal and ethical hurdles when sharing data? How can data be stored securely, preventing loss and corruption? The life sciences are not the only discipline that must align itself with big data requirements to keep up with the latest developments. The Large Hadron Collider, for instance, generates research data at a pace beyond that of any current biomedical research center. Three recent, coinciding developments explain the emergence of big data in research: the technological revolution in data generation, the development of tools for data analysis, and a conceptual shift towards open science and open data. The true potential of big data lies in pattern discovery in large datasets and in the formulation of new models and hypotheses. Confirmation of the existence of the Higgs boson, for instance, is one of the most recent triumphs of big data analysis in physics. Digital representations of biological systems have become more comprehensive, which, in combination with advances in machine learning, creates exciting new research possibilities. In this paper, we review the state of big data in bioanalytical research and provide an overview of guidelines for its proper usage.

Keywords

Big data, Bioanalytics, Data integration, Bioinformatics, Scalability

Notes

Funding information

This work was carried out with the support of the German Research Foundation (DFG) within project INF, SFB/TR 209 “Liver Cancer.”

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Research involving human participants and/or animals

Not applicable.

Informed consent

Not applicable.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Sven Fillinger (1)
  • Luis de la Garza (1)
  • Alexander Peltzer (1)
  • Oliver Kohlbacher (2, 3, 4, 5)
  • Sven Nahnsen (1, corresponding author)

  1. Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
  2. Center for Bioinformatics, University of Tübingen, Tübingen, Germany
  3. Applied Bioinformatics, Department of Computer Science, Tübingen, Germany
  4. Institute for Translational Bioinformatics, University Hospital of Tübingen, Tübingen, Germany
  5. Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany
