Real-time Analysis of Patient Cohorts

  • Chapter
High-Performance In-Memory Genome Data Analysis

Part of the book series: In-Memory Data Management Research (IMDM)

Abstract

In today’s medicine, many complex diseases, such as cancer, are treated with a standardized therapy plan. However, a standard plan usually does not work equally well for every patient. Researchers assume that this may be due to certain variants in patients’ genes. Currently, researchers investigate such cases by analyzing large groups of patients with the statistical language R. However, R is a language and not a data store, so the data must first be loaded and often converted into the right format. In-memory databases could provide an alternative, since they can store and process huge amounts of data and allow analyzing it with the Structured Query Language (SQL). In this work, I propose integrating statistical analysis into the database for cohort analysis. I show how an in-memory database can be used to analyze patient groups on the basis of k-means and hierarchical clustering, and I compare the performance of clustering algorithms executed within an in-memory database with their execution in the statistics environment R. To give researchers access to these algorithms, I developed a prototype that visualizes the clustering results and supports parallel clustering with several genes. In the future, it could be possible to perform statistical analysis of patient cohorts using an in-memory database, saving the time needed for loading and preparing data. Patient cohorts could then be analyzed directly in the database without switching systems.
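
For readers unfamiliar with k-means, the following is a minimal sketch of the general technique (Lloyd's algorithm) applied to a toy patient cohort. It is not the chapter's implementation and does not use any database or R function; the function name `kmeans`, the patient identifiers, and the two-gene expression values are hypothetical and chosen only for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples.

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct input rows as starting centroids
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to the centroid with the
        # smallest squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical cohort: expression values of two genes per patient.
patients = {
    "P1": (1.0, 1.2), "P2": (0.8, 1.0), "P3": (1.1, 0.9),
    "P4": (5.0, 5.2), "P5": (5.3, 4.9), "P6": (4.8, 5.1),
}
labels, centroids = kmeans(list(patients.values()), k=2)
for patient_id, label in zip(patients, labels):
    print(patient_id, "-> cluster", label)
```

On this toy data the low-expression and high-expression patients end up in separate clusters; which cluster receives index 0 depends on the random initialization.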




Correspondence to Ricarda Schüler.


Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Schüler, R. (2014). Real-time Analysis of Patient Cohorts. In: Plattner, H., Schapranow, MP. (eds) High-Performance In-Memory Genome Data Analysis. In-Memory Data Management Research. Springer, Cham. https://doi.org/10.1007/978-3-319-03035-7_6
