Abstract
In this chapter, we implement Gene Set Enrichment Analysis (GSEA) to analyze microarray data. We perform Kaplan-Meier survival analysis on the clusters obtained by clustering the microarray data and test whether the clusters differ significantly in prognosis. We then relate the clustered genes to their biological interpretation through Gene Ontology (GO) and pathway analysis, and interpret them further with GSEA.
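The enrichment score (ES) at the core of GSEA (Subramanian et al. 2005) can be sketched in a few lines. Walk down a gene list ranked by a statistic r (for example, correlation with phenotype). When a gene belongs to set S, step up by |r_j|^p / N_R, where N_R is the sum of |r_j|^p over the genes in S. Otherwise, step down by 1/(N − N_H), where N is the list length and N_H the number of genes in S. ES is the running sum's maximum deviation from zero. This is a minimal sketch only. The GSEA software also normalizes ES and estimates significance by permutation, which is omitted here. The function name is our own.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score (ES) of a gene set.

    ranked   -- list of (gene, score) pairs, already sorted by score, descending
    gene_set -- set of gene identifiers (the set S)
    p        -- weight exponent; p = 0 gives an unweighted running sum
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n = len(ranked)
    n_hit = sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # N_R: total weight of the ranked genes that belong to S
    n_r = sum(abs(score) ** p for (_, score), hit in zip(ranked, hits) if hit)
    if n_r == 0:
        raise ValueError("all gene-set members have zero weight")
    miss_step = 1.0 / (n - n_hit)  # penalty for each gene not in S
    running, es = 0.0, 0.0
    for (_, score), hit in zip(ranked, hits):
        running += abs(score) ** p / n_r if hit else -miss_step
        if abs(running) > abs(es):
            es = running  # keep the signed maximum deviation from zero
    return es
```

For an invented list `[("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]`, the set `{"A", "B"}` sits at the top and gets a positive ES, whereas `{"D", "E"}` sits at the bottom and gets a negative one.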
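The chapter's survival analysis relies on the R survival package (Therneau 2015). The pure-Python sketch below makes the two computations concrete. It is our own illustration, not the chapter's code. It covers the Kaplan-Meier product-limit estimator, S(t) = ∏_{t_i ≤ t} (1 − d_i/n_i), and the two-group log-rank test, which is the default test of `survdiff()` in that package. Event indicators follow the usual convention: 1 means an event such as death, and 0 means censored. The function names are hypothetical, and the test shown compares only two clusters.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns [(t, S(t))] at each distinct event time."""
    n_at_time = Counter(times)                               # subjects leaving at t
    deaths = Counter(t for t, e in zip(times, events) if e)  # events at t
    at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(n_at_time):
        d = deaths[t]
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= n_at_time[t]  # subjects censored at t still count as at risk at t
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic with 1 df, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for u, _, _ in data if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)          # events, both groups
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n                          # observed minus expected in A
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("variance is zero; the test is undefined")
    chi2 = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value
```

The quadratic loops keep the logic readable. For real cohorts, the R survival package used in the chapter is the appropriate tool.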
Notes
- 1.
Theoretically, once the number of genes is fixed, the number of possible ways to group them into gene sets is determined by combinatorics (e.g., the "Stirling numbers" of the second kind, which count the ways to partition n genes into k non-empty sets). This number grows so quickly that testing every combination is computationally infeasible. Moreover, most combinations have no biological meaning. In practice, therefore, only a small, biologically meaningful fraction of all possible gene sets is tested.
- 2.
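ASCII (American Standard Code for Information Interchange) is a 7-bit character encoding for representing English text in computers. It defines 128 codes (0–127). Codes 0–31 and 127 are control characters, originally used to control peripherals such as printers. Code 32 is the space. Codes 33–47, 58–64, 91–96, and 123–126 are punctuation marks and symbols. Codes 48–57 are the digits 0–9, codes 65–90 are the uppercase letters A–Z, and codes 97–122 are the lowercase letters a–z.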
- 3.
Excel's auto-formatting can silently corrupt gene names entered into a spreadsheet. For example, the gene symbol SEPT2 is converted to the date "2-Sep" (Zeeberg et al. 2004).
- 4.
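Line-ending conventions are OS-dependent. Windows ends each line with CR/LF (carriage return followed by line feed), whereas Unix-like systems, including Linux and macOS, use LF alone.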
- 5.
Do not use a hyphen ("-") in file names. Such files may not be recognized in the GSEA input window because of the Java libraries GSEA relies on.
- 6.
When saving, Excel warns that the workbook contains features that are not supported by the tab-delimited format. Select "Yes" to save anyway.
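To see how fast the count in Note 1 explodes, the minimal sketch below computes Stirling numbers of the second kind with the standard recurrence S(n, k) = k·S(n−1, k) + S(n−1, k−1). It also computes their row sum, the Bell number B(n), which counts every way to partition n labelled genes into non-empty sets. The function names are our own.

```python
def stirling2_row(n):
    """Return [S(n, 0), ..., S(n, n)], Stirling numbers of the second kind."""
    row = [1]  # S(0, 0) = 1
    for i in range(1, n + 1):
        new = [0] * (i + 1)  # S(i, 0) = 0 for i >= 1
        for k in range(1, i + 1):
            # S(i, k) = k * S(i-1, k) + S(i-1, k-1), with S(i-1, i) = 0
            prev_k = row[k] if k < len(row) else 0
            new[k] = k * prev_k + row[k - 1]
        row = new
    return row


def bell(n):
    """Bell number B(n): all partitions of n labelled genes into non-empty sets."""
    return sum(stirling2_row(n))
```

Already `bell(10)` exceeds one hundred thousand. Even `len(str(bell(100)))`, the number of decimal digits of B(100), runs well past 100. Enumerating every grouping of a genome-scale list of thousands of genes is therefore out of the question.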
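The code ranges in Note 2 can be checked mechanically. The sketch below classifies a 7-bit code and flags any character outside 0–127, which by definition is not ASCII. One place such a check can help is spotting stray characters, such as a non-breaking space pasted from a web page, in a gene-name file. That use case is our own assumption, and both function names are hypothetical.

```python
def ascii_class(code):
    """Classify a 7-bit ASCII code according to the standard ranges."""
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit ASCII code")
    if code < 32 or code == 127:
        return "control"
    if code == 32:
        return "space"
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "uppercase"
    if 97 <= code <= 122:
        return "lowercase"
    return "punctuation/symbol"  # 33-47, 58-64, 91-96, 123-126


def non_ascii_positions(text):
    """Return (index, character) pairs for characters outside the ASCII range."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]
```

For example, `ascii_class(ord("G"))` returns `"uppercase"`, and `non_ascii_positions("TP53\u00a0")` reports the trailing non-breaking space at index 4.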
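The corruption described in Note 3 leaves recognizable traces. Zeeberg et al. (2004) report gene symbols turned into dates (SEPT2 to "2-Sep") and RIKEN-style identifiers turned into floating-point numbers (2310009E13 to "2.31E+13"). The sketch below flags identifiers matching those two shapes. The patterns are heuristics of our own, not an exhaustive list, and the function name is hypothetical.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# "2-Sep" or "Sep-02": a gene symbol that Excel has turned into a date
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$|^({MONTHS})-\d{{1,2}}$", re.IGNORECASE)
# "2.31E+13": an identifier that Excel has turned into scientific notation
SCI_LIKE = re.compile(r"^\d+(\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def suspicious_gene_names(names):
    """Return the identifiers that look like Excel date or number conversions."""
    return [n for n in names if DATE_LIKE.match(n) or SCI_LIKE.match(n)]
```

Running it on a gene column before GSEA is cheap. Any hit should be traced back to the original identifier rather than guessed.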
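For Note 4, the sketch below counts each line-ending style in a file's raw bytes and converts all of them to a single convention. Mixed or unexpected endings then become visible before a file is loaded. Which convention a given tool expects is not stated in this chapter, so the target is left as a parameter. The function names are our own.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR/LF (Windows), LF (Unix-like) and lone CR (classic Mac OS) endings."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
    }


def normalize_line_endings(data: bytes, newline: bytes = b"\n") -> bytes:
    """Convert every CR/LF and lone CR to LF, then to the requested newline."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified if newline == b"\n" else unified.replace(b"\n", newline)
```

Reading the file in binary mode (`open(path, "rb").read()`) keeps the original endings intact, so the counts reflect what is actually on disk.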
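Notes 5 and 6 concern preparing GSEA input files through Excel. One way to sidestep both the save warning and the auto-formatting problem is to write the tab-delimited file directly. The sketch below assumes GSEA's GCT expression format, version 1.2. That format consists of a `#1.2` line, a line with the numbers of genes and samples, a `NAME`/`Description` header, and then one row per gene. Please check the official file-format documentation before relying on it. The function names, the placeholder descriptions, and the choice of ASCII encoding with LF line endings are our own assumptions. The file-name check simply enforces the advice of Note 5.

```python
import os


def check_gsea_filename(path):
    """Reject file names containing a hyphen (see Note 5); return the base name."""
    name = os.path.basename(path)
    if "-" in name:
        raise ValueError(f"hyphen in file name: {name!r}; use '_' instead")
    return name


def gct_lines(genes, descriptions, samples, matrix):
    """Build the lines of a GCT 1.2 file (assumed layout) as tab-delimited strings."""
    if not (len(genes) == len(descriptions) == len(matrix)):
        raise ValueError("genes, descriptions and matrix rows must align")
    lines = ["#1.2", f"{len(genes)}\t{len(samples)}",
             "\t".join(["NAME", "Description", *samples])]
    for gene, desc, row in zip(genes, descriptions, matrix):
        if len(row) != len(samples):
            raise ValueError(f"row for {gene} has {len(row)} values, expected {len(samples)}")
        lines.append("\t".join([gene, desc, *map(str, row)]))
    return lines


def write_gct(path, genes, descriptions, samples, matrix):
    """Write a GCT file with LF endings; non-ASCII text raises UnicodeEncodeError."""
    check_gsea_filename(path)
    with open(path, "w", newline="\n", encoding="ascii") as fh:
        fh.write("\n".join(gct_lines(genes, descriptions, samples, matrix)) + "\n")
```

A hypothetical call is `write_gct("ovary_expr.gct", ["TP53", "MYC"], ["na", "na"], ["s1", "s2"], [[1.5, 2.0], [0.3, 0.1]])`. Because no spreadsheet is involved, gene names reach the file exactly as they appear in the script.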
Bibliography
Cancer Genome Atlas Research Network (2011) Integrated genomic analyses of ovarian carcinoma. Nature 474(7353):609–615. https://doi.org/10.1038/nature10166
Chiaretti S et al (2004) Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood 103(7):2771–2778. Epub 2003 Dec 18
Jacobsen A (2017) cgdsr: R-Based API for Accessing the MSKCC Cancer Genomics Data Server (CGDS). R package version 1.2.6. https://CRAN.R-project.org/package=cgdsr
Liberzon A et al (2011) Molecular signatures database (MSigDB) 3.0. Bioinformatics 27(12):1739–1740. https://doi.org/10.1093/bioinformatics/btr260. Epub 2011 May 5
Mootha VK et al (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34(3):267–273
Subramanian A et al (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102(43):15545–15550
Therneau T (2015) A package for survival analysis in S. R package version 2.38. https://CRAN.R-project.org/package=survival
Zeeberg BR et al (2004) Mistaken identifiers: gene name errors can be introduced inadvertently when using excel in bioinformatics. BMC Bioinformatics 5:80
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Kim, J.H. (2019). Gene Set Approaches and Prognostic Subgroup Prediction. In: Genome Data Analysis. Learning Materials in Biosciences. Springer, Singapore. https://doi.org/10.1007/978-981-13-1942-6_8
DOI: https://doi.org/10.1007/978-981-13-1942-6_8
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1941-9
Online ISBN: 978-981-13-1942-6
eBook Packages: Biomedical and Life Sciences; Biomedical and Life Sciences (R0)