GeneClust

Do, Kim-Anh; Broom, Bradley; Wen, Sijin

doi:10.1007/0-387-21679-0_15

Part of the book series: Statistics for Biology and Health ((SBH))

Abstract

Two-way clustering techniques (such as hierarchical clustering, K-means clustering, tree-structured vector quantization, self-organizing maps, and principal components analysis) have been used to organize genes into groups or “clusters” with similar behavior across relevant tissue samples or cell lines. However, these procedures seek a single global reordering of the samples or cell lines for all genes. Although they are effective in uncovering gross global structure, they are much less effective when applied to more complex clustering patterns (for example, where there are overlapping gene clusters). This chapter describes gene shaving (Hastie et al., 2000), a simple but effective method for identifying subsets of genes with coherent expression patterns and large variations across samples or conditions. After summarizing the gene-shaving methodology, we describe two software packages implementing the method: a small package written in S (usable in either S-Plus or R) and a considerably faster, mixed-language implementation with a graphical user interface intended for more applied use. The package can perform unsupervised, fully supervised, or partially supervised gene shaving, and the user is able to specify various parameters pertinent to the algorithm. The package outputs graphical representations of the extracted clusters (as colored heat maps) and diagnostic statistics. We then demonstrate how the latter tool can be used to analyze two published datasets (the Alon colon data and the NCI60 data).
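
As a rough illustration of the core idea, the shaving step of unsupervised gene shaving (following the published description in Hastie et al., 2000) can be sketched as below. This is a minimal sketch under stated assumptions, not the code of either package described in the chapter: the function names `shave_once` and `shaving_sequence`, the default 10% shaving fraction, and the use of NumPy are all choices made here for illustration. Selecting the final cluster size with the gap statistic, orthogonalizing the data to find further clusters, and the supervised variants are not shown.

```python
import numpy as np


def shave_once(X, idx, frac=0.1):
    """Drop the genes least aligned with the leading principal component.

    X    : centered expression matrix, genes in rows, samples in columns
    idx  : indices of the genes currently in the candidate cluster
    frac : fraction of genes to shave off (hypothetical default of 10%)
    """
    sub = X[idx]
    # First right singular vector = leading principal component across samples
    _, _, vt = np.linalg.svd(sub, full_matrices=False)
    pc1 = vt[0]
    # Absolute inner product of each gene with the leading component
    scores = np.abs(sub @ pc1)
    # Keep roughly (1 - frac) of the genes, always removing at least one
    n_keep = max(1, min(len(idx) - 1, int(np.floor(len(idx) * (1 - frac)))))
    keep = np.argsort(scores)[::-1][:n_keep]
    return idx[np.sort(keep)]


def shaving_sequence(X, frac=0.1):
    """Return the nested sequence of gene index sets, from all genes down to one."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)  # center each gene (row)
    idx = np.arange(X.shape[0])
    seq = [idx]
    while len(idx) > 1:
        idx = shave_once(X, idx, frac)
        seq.append(idx)
    return seq
```

Calling `shaving_sequence(X)` on a genes-by-samples matrix yields nested candidate clusters of decreasing size. In the published method, one member of this sequence is then chosen by comparing a cluster-quality measure against its value on permuted data.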

References

Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences USA, 96:6745–6750.
Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z (2000). Tissue classification with gene expression profiles. Journal of Computational Biology, 7:559–584.
Dudoit S, Fridlyand J, Speed TP (2001). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(1):77–87.
Getz G, Levine E, Domany E (2000). Coupled two-way clustering analysis of gene microarray data. Proceedings of the National Academy of Sciences USA, 97:12079–12084.
Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan W, Botstein D, Brown P (2000). ‘Gene shaving’ as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology, 1:research0003.1–0003.21.
McLachlan GJ, Bean RW, Peel D (2002). EMMIX-GENE: A mixture-model based approach to the clustering of microarray expression data. Bioinformatics, 18:413–422.
Ross DT, Scherf U, Eisen MB, Perou CM, Reese C, Spellman P, Iyer V, Jeffrey SS, de Rijn MV, Waltham M, Pergamenschikov A, Lee JCF, Lashkari D, Shalon D, Myers TG, Weinstein JN, Botstein D, Brown PO (2000). Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics, 24:227–235.
Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN (2000). A gene expression database for the molecular pharmacology of cancer. Nature Genetics, 24:236–244.
Tibshirani R, Walther G, Hastie T (2001). Estimating the number of clusters in a dataset via the gap statistic. Journal of the Royal Statistical Society B, 63:411–423.
Tusher VG, Tibshirani R, Chu G (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences USA, 98(9):5116–5121.

Editor information

Editors and Affiliations

Departments of Oncology, Biostatistics, and Pathology, Johns Hopkins University, Baltimore, MD, 21205-2013, USA
Giovanni Parmigiani
Departments of Oncology and Biostatistics, Johns Hopkins University, Baltimore, MD, 21205-2013, USA
Elizabeth S. Garrett
Department of Biostatistics, Johns Hopkins University, Baltimore, MD, 21205-2013, USA
Rafael A. Irizarry
Departments of Biostatistics and Epidemiology, Johns Hopkins University, Baltimore, MD, 21205-2013, USA
Scott L. Zeger

About this chapter

Cite this chapter

Do, KA., Broom, B., Wen, S. (2003). GeneClust. In: Parmigiani, G., Garrett, E.S., Irizarry, R.A., Zeger, S.L. (eds) The Analysis of Gene Expression Data. Statistics for Biology and Health. Springer, New York, NY. https://doi.org/10.1007/0-387-21679-0_15

DOI: https://doi.org/10.1007/0-387-21679-0_15
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-95577-3
Online ISBN: 978-0-387-21679-9
eBook Packages: Springer Book Archive