Part of the book series: Statistics for Biology and Health ((SBH))

Abstract

Two-way clustering techniques (such as hierarchical clustering, K-means clustering, tree-structured vector quantization, self-organizing maps, and principal components analysis) have been used to organize genes into groups or “clusters” with similar behavior across relevant tissue samples or cell lines. However, these procedures seek a single global reordering of the samples or cell lines for all genes. Although they are effective in uncovering gross global structure, they are much less effective when applied to more complex clustering patterns (for example, where there are overlapping gene clusters). This chapter describes gene shaving (Hastie et al., 2000), a simple but effective method for identifying subsets of genes with coherent expression patterns and large variation across samples or conditions. After summarizing the gene-shaving methodology, we describe two software packages implementing the method: a small package written in S (usable in either S-Plus or R) and a considerably faster, mixed-language implementation with a graphical user interface intended for more applied use. The package can perform unsupervised, fully supervised, or partially supervised gene shaving, and the user can specify various parameters pertinent to the algorithm. The package outputs graphical representations of the extracted clusters (as colored heat maps) and diagnostic statistics. We then demonstrate how the latter tool can be used to analyze two published datasets (the Alon colon data and the NCI60 data).
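To make the idea of shaving concrete, the following is a minimal sketch of the unsupervised shaving loop as it is described in Hastie et al. (2000): center each gene (row), find the leading principal component of the remaining genes, discard the fraction of genes least aligned with it, and repeat until one gene is left. It is not the code of either package described in this chapter. The function names, the pure-Python power iteration, the shaving fraction `alpha=0.1`, and the toy matrix are all assumptions made for illustration. The sketch also omits the later steps of the published method (choosing a cluster size with the gap statistic and orthogonalizing the data before extracting the next cluster).

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def center_rows(matrix):
    """Subtract each row's mean so every gene has mean zero across samples."""
    return [[x - sum(row) / len(row) for x in row] for row in matrix]


def leading_component(rows, iters=100):
    """Approximate the leading principal component of the rows by power iteration.

    Starts from the largest-norm row, which lies in the row space, so the
    iteration cannot begin orthogonal to every row.
    """
    v = max(rows, key=lambda r: dot(r, r))
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:  # all rows are zero; no direction to find
        return list(v)
    v = [x / norm for x in v]
    for _ in range(iters):
        proj = [dot(r, v) for r in rows]  # X v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(len(v))]  # X^T (X v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def shave_sequence(matrix, alpha=0.1):
    """Return the nested sequence of gene-index clusters produced by shaving.

    At each step, the genes with the smallest absolute inner product with the
    leading principal component are removed (at least one gene per step).
    """
    rows = center_rows(matrix)
    active = list(range(len(rows)))
    sequence = [active[:]]
    while len(active) > 1:
        v = leading_component([rows[i] for i in active])
        scores = {i: abs(dot(rows[i], v)) for i in active}
        n_drop = max(1, int(alpha * len(active)))
        ranked = sorted(active, key=lambda i: scores[i], reverse=True)
        active = sorted(ranked[: len(active) - n_drop])
        sequence.append(active[:])
    return sequence


# Hypothetical toy data: 4 genes (rows) by 4 samples (columns).
toy = [
    [5.0, -5.0, 5.0, -5.0],   # gene 0: large, coherent variation
    [4.0, -4.0, 4.0, -4.0],   # gene 1: same pattern, slightly weaker
    [0.1, 0.2, -0.1, -0.2],   # gene 2: small, unrelated variation
    [2.0, -1.0, 0.0, -1.0],   # gene 3: partly aligned with genes 0 and 1
]
nested = shave_sequence(toy)
for cluster in nested:
    print(len(cluster), cluster)
```

Each element of `nested` is a candidate cluster, and the list runs from all genes down to a single gene. In the published method, one cluster from this nested sequence is then selected using the gap statistic; that selection step is deliberately left out here.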

References

  • Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences USA, 96:6745–6750.

  • Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z (2000). Tissue classification with gene expression profiles. Journal of Computational Biology, 7:559–584.

  • Dudoit S, Fridlyand J, Speed TP (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457):77–87.

  • Getz G, Levine E, Domany E (2000). Coupled two-way clustering analysis of gene microarray data. Proceedings of the National Academy of Sciences USA, 97:12079–12084.

  • Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan W, Botstein D, Brown P (2000). ‘Gene shaving’ as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology, 1:research0003.1–0003.21.

  • McLachlan GJ, Bean RW, Peel D (2002). EMMIX-GENE: A mixture-model based approach to the clustering of microarray expression data. Bioinformatics, 18:413–422.

  • Ross DT, Scherf U, Eisen MB, Perou CM, Reese C, Spellman P, Iyer V, Jeffrey SS, de Rijn MV, Waltham M, Pergamenschikov A, Lee JCF, Lashkari D, Shalon D, Myers TG, Weinstein JN, Botstein D, Brown PO (2000). Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics, 24:227–235.

  • Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN (2000). A gene expression database for the molecular pharmacology of cancer. Nature Genetics, 24:236–244.

  • Tibshirani R, Walther G, Hastie T (2001). Estimating the number of clusters in a dataset via the gap statistic. Journal of the Royal Statistical Society B, 63:411–423.

  • Tusher VG, Tibshirani R, Chu G (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences USA, 98(9):5116–5121.

Copyright information

© 2003 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

Do, KA., Broom, B., Wen, S. (2003). GeneClust. In: Parmigiani, G., Garrett, E.S., Irizarry, R.A., Zeger, S.L. (eds) The Analysis of Gene Expression Data. Statistics for Biology and Health. Springer, New York, NY. https://doi.org/10.1007/0-387-21679-0_15

  • DOI: https://doi.org/10.1007/0-387-21679-0_15

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-95577-3

  • Online ISBN: 978-0-387-21679-9

  • eBook Packages: Springer Book Archive
