Abstract
In today’s medicine, many complex diseases, such as cancer, are treated with a standardized therapy plan. However, a standard plan usually does not work equally well for every patient. Researchers assume that this might be due to certain variants in patients’ genes. Currently, researchers investigate such cases by analyzing large groups of patients with the statistical language R. However, R is a language and environment for analysis rather than a data store, so the data has to be loaded first and often needs to be converted into the right format. In-memory databases could provide an alternative, since they can store and handle huge amounts of data and allow analyzing it using the Structured Query Language (SQL). In this work, I propose integrating statistical analysis into the database for use in a cohort analysis environment. I show how an in-memory database can be used to analyze patient groups on the basis of k-means and hierarchical clustering. To this end, I compare the performance of clustering algorithms executed within an in-memory database with their performance in the statistics environment R. To let researchers use the algorithms, I developed a prototype that visualizes the clustering results and supports parallel clustering with several genes. In the future, it could be possible to perform statistical analyses of patient cohorts in an in-memory database and save the time otherwise spent loading and preparing data. Patient cohorts could thus be analyzed directly in the database without switching systems.
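To make the k-means step mentioned in the abstract concrete, the following is a minimal sketch of the standard iterative (Lloyd-style) procedure in plain Python. It is not the chapter's implementation, which runs inside an in-memory database and in R. The function name `kmeans`, the sample expression values, and the choice of the first `k` patients as initial centroids are assumptions made purely for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Cluster points into k groups with Lloyd-style iterations.

    Assumption: the first k points serve as initial centroids. This is
    a simple deterministic choice for the sketch, not a recommendation.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignment, centroids


# Hypothetical expression values of two genes for six patients.
patients = [
    (1.0, 1.2), (5.0, 5.2), (0.9, 1.1),
    (5.1, 4.9), (1.1, 0.9), (4.9, 5.0),
]
labels, centers = kmeans(patients, k=2)
print(labels)
```

Each tuple stands for one patient and each position for one gene, so clustering with several genes simply means longer tuples.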
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Schüler, R. (2014). Real-time Analysis of Patient Cohorts. In: Plattner, H., Schapranow, MP. (eds) High-Performance In-Memory Genome Data Analysis. In-Memory Data Management Research. Springer, Cham. https://doi.org/10.1007/978-3-319-03035-7_6
DOI: https://doi.org/10.1007/978-3-319-03035-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03034-0
Online ISBN: 978-3-319-03035-7
eBook Packages: Biomedical and Life Sciences (R0)