Abstract
Data-intensive disciplines, such as the life sciences and medicine, are driving vigorous research activity in data science. Modern technologies, such as high-throughput mass spectrometry and sequencing, microarrays, and high-resolution imaging, produce enormous and continuously increasing amounts of data. Huge public databases provide access to aggregated and consolidated data on genome and protein sequences, biological pathways, diseases, anatomy atlases, and scientific literature. Never before has so much data been potentially available for studying biomedical systems, ranging from single cells to complete organisms. However, transforming this vast amount of biomedical data into actionable, useful, and usable information that triggers scientific progress and supports patient management is a non-trivial task.
Acknowledgments
We are grateful for the great support of the ITBAM program committee, and particularly for the excellent work of Gabriela Wagner.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Holzinger, A., Bursa, M., Khuri, S., Renda, M.E. (2017). IT in Biology & Medical Informatics: On the Challenge of Understanding the Data Ecosystem. In: Bursa, M., Holzinger, A., Renda, M., Khuri, S. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2017. Lecture Notes in Computer Science, vol 10443. Springer, Cham. https://doi.org/10.1007/978-3-319-64265-9_1
DOI: https://doi.org/10.1007/978-3-319-64265-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64264-2
Online ISBN: 978-3-319-64265-9
eBook Packages: Computer Science (R0)