Gene Set Approaches and Prognostic Subgroup Prediction

Genome Data Analysis

Part of the book series: Learning Materials in Biosciences (LMB)

Abstract

In this chapter, we apply Gene Set Enrichment Analysis (GSEA) to microarray data. We perform Kaplan-Meier survival analysis on the clusters obtained by clustering the microarray data and test whether the difference in prognosis between clusters is statistically significant. The chapter also shows how to interpret the clustered genes biologically, using GO and pathway analysis as well as GSEA.

Notes

  1. Theoretically, once the number of genes is fixed, the number of possible gene sets is determined by combinatorics, i.e., by “Stirling numbers.” This number is so large, however, that testing every combination is computationally infeasible. In addition, not all combinations are biologically meaningful. In practice, therefore, only the small, biologically meaningful fraction of these combinations is tested.

  2. ASCII (American Standard Code for Information Interchange) is a 7-bit character code for representing English text in a computer. It has 128 codes. Codes 0–31 are control codes, originally used to control peripherals such as printers; codes 32–47 are the space and punctuation symbols; 48–57 are the digits 0–9; 65–90 are the uppercase letters A–Z; and 97–122 are the lowercase letters a–z.

  3. Excel's auto-formatting function can introduce errors (“auto-errors”) when gene names are entered in a spreadsheet (Zeeberg et al. 2004).

  4. The line-ending convention is OS-dependent: Windows uses CR/LF, whereas Unix-like systems, including macOS, use LF alone.

  5. Do not use a hyphen (“-”) in the file name. Because of some Java libraries, such a name is not recognized in the GSEA input window.

  6. Excel warns that the tab-delimited format cannot support some features of the workbook. Select “Yes” to save anyway.
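
To give a sense of the scale mentioned in Note 1, the following minimal sketch computes Stirling numbers of the second kind, S(n, k), which count the ways of partitioning n labelled items into k non-empty groups. Reading the “Stirling numbers” of Note 1 as the second kind is an assumption, and the function name `stirling2` is hypothetical; the sketch uses only the standard recurrence and is not taken from the chapter.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Ways to partition n labelled items into k non-empty groups.

    Assumption: Note 1 refers to Stirling numbers of the second kind.
    """
    if n == k:
        return 1  # covers S(0, 0) = 1 and S(n, n) = 1
    if k == 0 or k > n:
        return 0
    # Item n either forms a new group on its own (first term)
    # or joins one of the k groups already formed (second term).
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


if __name__ == "__main__":
    # Print how quickly the count grows with the number of genes n
    # for a fixed number of groups (k = 3 is an arbitrary choice).
    for n in (5, 10, 20, 50):
        print(n, stirling2(n, 3))
```

Even for three groups the count grows roughly like 3^n / 6, so `stirling2(10, 3)` is already 9330; this is why exhaustive testing of all groupings is not practical and curated gene sets are used instead.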
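
The file-preparation pitfalls in Notes 2 to 5 can be checked before a file is loaded into GSEA. The sketch below is one possible set of pre-flight checks, not part of GSEA or of the chapter's own code: all function names are hypothetical, and the patterns for spotting spreadsheet-mangled gene names (date-like strings such as "2-Sep" and scientific notation such as "2.31E+13") are illustrative assumptions rather than an exhaustive list.

```python
from __future__ import annotations

import re
from pathlib import Path

# Illustrative patterns (assumptions): values a spreadsheet may produce
# when it auto-formats a gene name as a date or as a number.
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
_DATE_LIKE = re.compile(rf"^\d{{1,2}}-(?:{_MONTHS})$", re.IGNORECASE)
_SCI_LIKE = re.compile(r"^\d+(?:\.\d+)?E\+?\d+$", re.IGNORECASE)


def normalize_newlines(raw: bytes) -> bytes:
    """Convert CR/LF and bare CR line endings to LF (Note 4)."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def non_ascii_positions(raw: bytes) -> list[int]:
    """Byte offsets holding values outside the 7-bit ASCII range (Note 2)."""
    return [i for i, byte in enumerate(raw) if byte > 127]


def has_hyphen(filename: str) -> bool:
    """True if the file name itself, not its directory, has a hyphen (Note 5)."""
    return "-" in Path(filename).name


def suspicious_gene_names(names: list[str]) -> list[str]:
    """Entries that look like dates or scientific notation (Note 3)."""
    return [n for n in names if _DATE_LIKE.match(n) or _SCI_LIKE.match(n)]


if __name__ == "__main__":
    print(has_hyphen("expression-data.gct"))
    print(suspicious_gene_names(["TP53", "2-Sep", "2.31E+13"]))
    print(normalize_newlines(b"NAME\tDESC\r\nTP53\tna\r\n"))
```

A flagged name cannot be repaired automatically, because the original symbol is lost once the spreadsheet has converted it; the check only indicates that the file should be regenerated from its source.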

Bibliography

  1. Cancer Genome Atlas Research Network (2011) Integrated genomic analyses of ovarian carcinoma. Nature 474(7353):609–615. https://doi.org/10.1038/nature10166

  2. Chiaretti S et al (2004) Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood 103(7):2771–2778. Epub 2003 Dec 18

  3. Jacobsen A (2017) cgdsr: R-Based API for Accessing the MSKCC Cancer Genomics Data Server (CGDS). R package version 1.2.6. https://CRAN.R-project.org/package=cgdsr

  4. Liberzon A et al (2011) Molecular signatures database (MSigDB) 3.0. Bioinformatics 27(12):1739–1740. https://doi.org/10.1093/bioinformatics/btr260. Epub 2011 May 5

  5. Mootha VK et al (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34(3):267–273

  6. Subramanian A et al (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102(43):15545–15550

  7. Therneau T (2015) A package for survival analysis in S. Version 2.38. https://CRAN.R-project.org/package=survival

  8. Zeeberg BR et al (2004) Mistaken identifiers: gene name errors can be introduced inadvertently when using excel in bioinformatics. BMC Bioinformatics 5:80


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Kim, J.H. (2019). Gene Set Approaches and Prognostic Subgroup Prediction. In: Genome Data Analysis. Learning Materials in Biosciences. Springer, Singapore. https://doi.org/10.1007/978-981-13-1942-6_8
