Abstract
Two-way clustering techniques (such as hierarchical clustering, K-means clustering, tree-structured vector quantization, self-organizing maps, and principal components analysis) have been used to organize genes into groups or “clusters” with similar behavior across relevant tissue samples or cell lines. However, these procedures seek a single global reordering of the samples or cell lines for all genes. Although they are effective in uncovering gross global structure, they are much less effective when applied to more complex clustering patterns (for example, where there are overlapping gene clusters). This chapter describes gene shaving (Hastie et al., 2000), a simple but effective method for identifying subsets of genes with coherent expression patterns and large variations across samples or conditions. After summarizing the gene-shaving methodology, we describe two software packages implementing the method: a small package written in S (usable in either S-Plus or R) and a considerably faster, mixed-language implementation with a graphical user interface intended for more applied use. The package can perform unsupervised, fully supervised, or partially supervised gene shaving, and the user can specify various parameters pertinent to the algorithm. The package outputs graphical representations of the extracted clusters (as colored heat maps) together with diagnostic statistics. We then demonstrate how the latter tool can be used to analyze two published datasets (the Alon colon data and the NCI60 data).
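For readers unfamiliar with the method, the core shaving loop of Hastie et al. (2000) can be sketched roughly as follows. This is a minimal illustration of the unsupervised case only, not the code of either package described in the chapter. The function name `shave_sequence`, the default `alpha=0.1`, and the toy matrix are assumptions made for the example. Choosing the final cluster size (for instance with the gap statistic), orthogonalizing the data before extracting further clusters, and the supervised variants are all omitted.

```python
import numpy as np


def shave_sequence(X, alpha=0.1):
    """Return the nested gene-index sets produced by repeated shaving.

    X     : 2-D array, rows = genes, columns = samples (hypothetical layout).
    alpha : fraction of the remaining genes dropped per iteration (assumed default).
    """
    X = np.asarray(X, dtype=float)
    # Center each gene (row) to mean zero across samples.
    X = X - X.mean(axis=1, keepdims=True)

    active = np.arange(X.shape[0])  # indices of genes still in the cluster
    nested = [active.copy()]
    while active.size > 1:
        sub = X[active]
        # Leading principal component of the rows = first right singular vector.
        _, _, vt = np.linalg.svd(sub, full_matrices=False)
        # Absolute inner product of each remaining gene with that component.
        scores = np.abs(sub @ vt[0])
        # Drop at least one gene per pass, but always leave at least one.
        n_drop = min(max(1, int(alpha * active.size)), active.size - 1)
        keep = np.sort(np.argsort(scores, kind="stable")[n_drop:])
        active = active[keep]
        nested.append(active.copy())
    return nested


# Toy data: genes 0-2 share one strong pattern (gene 2 with opposite sign),
# genes 3-5 vary only slightly.
X = np.array([
    [ 2.0,  2.0, -2.0, -2.0],
    [ 2.2,  2.2, -2.2, -2.2],
    [-2.0, -2.0,  2.0,  2.0],
    [ 0.1, -0.1,  0.1, -0.1],
    [ 0.1,  0.1, -0.1, -0.1],
    [ 0.0,  0.1,  0.0, -0.1],
])
nested = shave_sequence(X)
print([len(s) for s in nested])
print(nested[3])
```

On this toy matrix the low-variance genes are shaved first, so the three-gene set in the nested sequence is the coherent, high-variance group. Because scores use the absolute inner product, the anti-correlated gene stays with the group.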
Keywords
- Data Frame
- Gene Expression Matrix
- Super Gene
- Color Insert
- Principal Component Weight
References
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences USA, 96:6745–6750.
Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z (2000). Tissue classification with gene expression profiles. Journal of Computational Biology, 7:559–584.
Dudoit S, Fridlyand J, Speed TP (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97:77–87.
Getz G, Levine E, Domany E (2000). Coupled two-way clustering analysis of gene microarray data. Proceedings of the National Academy of Sciences USA, 97:12079–12084.
Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan W, Botstein D, Brown P (2000). ‘Gene shaving’ as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology, 1:research0003.1–0003.21.
McLachlan GJ, Bean RW, Peel D (2002). EMMIX-GENE: A mixture-model based approach to the clustering of microarray expression data. Bioinformatics, 18:413–422.
Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, Spellman P, Iyer V, Jeffrey SS, van de Rijn M, Waltham M, Pergamenschikov A, Lee JCF, Lashkari D, Shalon D, Myers TG, Weinstein JN, Botstein D, Brown PO (2000). Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics, 24:227–235.
Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN (2000). A gene expression database for the molecular pharmacology of cancer. Nature Genetics, 24:236–244.
Tibshirani R, Walther G, Hastie T (2001). Estimating the number of clusters in a dataset via the gap statistic. Journal of the Royal Statistical Society B, 63:411–423.
Tusher VG, Tibshirani R, Chu G (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences USA, 98:5116–5121.
Copyright information
© 2003 Springer-Verlag New York, Inc.
About this chapter
Cite this chapter
Do, KA., Broom, B., Wen, S. (2003). GeneClust. In: Parmigiani, G., Garrett, E.S., Irizarry, R.A., Zeger, S.L. (eds) The Analysis of Gene Expression Data. Statistics for Biology and Health. Springer, New York, NY. https://doi.org/10.1007/0-387-21679-0_15
DOI: https://doi.org/10.1007/0-387-21679-0_15
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-95577-3
Online ISBN: 978-0-387-21679-9
eBook Packages: Springer Book Archive