Real-time Analysis of Patient Cohorts

Schüler, Ricarda

doi:10.1007/978-3-319-03035-7_6

Part of the book series: In-Memory Data Management Research (IMDM)

Abstract

In today’s medicine, many complex diseases, such as cancer, are treated with a standardized therapy plan. However, such a standard plan usually does not work equally well for every patient. Researchers assume that this may be due to certain variants in patients’ genes. Currently, researchers analyze large groups of patients with the statistical language R to investigate such cases further. However, R is a statistical environment rather than a data store, so the data must first be loaded and often converted into a suitable format. In-memory databases could provide an alternative, since they can store and process large amounts of data and allow analyzing them with the Structured Query Language (SQL). In this work, I propose integrating statistical analysis into the database for use in a cohort analysis environment. I show how an in-memory database can be used to analyze patient groups on the basis of k-means and hierarchical clustering. To this end, I compare the performance of clustering algorithms executed within an in-memory database and within the statistics environment R. To make the algorithms accessible to researchers, I developed a prototype that visualizes the clustering results and supports parallel clustering with several genes. In the future, statistical analysis of patient cohorts could be performed with an in-memory database, saving the time otherwise spent loading and preparing data. Patient cohorts could thus be analyzed directly in the database without switching systems.
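
To illustrate the kind of cohort clustering the abstract refers to, the following is a minimal sketch of k-means (Lloyd's algorithm) applied to the expression values of a single gene. It is not the chapter's implementation and does not use the in-memory database or R. The function name `kmeans_1d`, the patient identifiers, the expression values, and the deterministic centroid initialisation are all assumptions made for this example.

```python
from typing import Dict, List


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> List[int]:
    """Cluster one-dimensional values with Lloyd's algorithm.

    Returns one cluster label (0 .. k-1) per input value.
    """
    distinct = sorted(set(values))
    if not 1 <= k <= len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")

    # Assumed initialisation: spread the starting centroids evenly over the
    # sorted distinct values so the result is deterministic.
    if k == 1:
        centroids = [distinct[0]]
    else:
        centroids = [distinct[i * (len(distinct) - 1) // (k - 1)] for i in range(k)]

    labels = [-1] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
        if new_labels == labels:  # assignments are stable, so we have converged
            break
        labels = new_labels
    return labels


# Hypothetical expression values of one gene for six patients.
expression: Dict[str, float] = {
    "P1": 2.1, "P2": 2.4, "P3": 7.9, "P4": 8.3, "P5": 2.0, "P6": 8.0,
}
patient_labels = kmeans_1d(list(expression.values()), k=2)

# Group patient identifiers by cluster label to obtain the cohorts.
cohorts: Dict[int, List[str]] = {}
for patient, label in zip(expression, patient_labels):
    cohorts.setdefault(label, []).append(patient)
print(cohorts)
```

With these made-up values, the patients with low expression end up in one group and those with high expression in the other. A real analysis would typically cluster on several genes at once and would need a distance measure over vectors rather than single values.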

Author information

Authors and Affiliations

Potsdam, Germany
Ricarda Schüler

Corresponding author

Correspondence to Ricarda Schüler.

Editor information

Editors and Affiliations

Enterprise Platform and Integration Concepts, Hasso-Plattner-Institute, Potsdam, Germany
Hasso Plattner
Enterprise Platform and Integration Concepts Chair, Hasso Plattner Institute, Potsdam, Germany
Matthieu-P. Schapranow

About this chapter

Cite this chapter

Schüler, R. (2014). Real-time Analysis of Patient Cohorts. In: Plattner, H., Schapranow, MP. (eds) High-Performance In-Memory Genome Data Analysis. In-Memory Data Management Research. Springer, Cham. https://doi.org/10.1007/978-3-319-03035-7_6

DOI: https://doi.org/10.1007/978-3-319-03035-7_6
Published: 19 November 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03034-0
Online ISBN: 978-3-319-03035-7
eBook Packages: Biomedical and Life Sciences (R0)
