Part of the Statistics for Biology and Health book series (SBH)


Two-way clustering techniques (such as hierarchical clustering, K-means clustering, tree-structured vector quantization, self-organizing maps, and principal components analysis) have been used to organize genes into groups or “clusters” with similar behavior across relevant tissue samples or cell lines. However, these procedures seek a single global reordering of the samples or cell lines for all genes. Although they are effective in uncovering gross global structure, they are much less effective when applied to more complex clustering patterns (for example, where there are overlapping gene clusters). This chapter describes gene shaving (Hastie et al., 2000), a simple but effective method for identifying subsets of genes with coherent expression patterns and large variation across samples or conditions. After summarizing the gene-shaving methodology, we describe two software packages implementing the method: a small package written in S (usable in either S-Plus or R) and a considerably faster, mixed-language implementation with a graphical user interface intended for more applied use. The latter package can perform unsupervised, fully supervised, or partially supervised gene shaving, and the user can specify various parameters pertinent to the algorithm. It outputs graphical representations of the extracted clusters (as colored heat maps) together with diagnostic statistics. We then demonstrate how this tool can be used to analyze two published datasets: the Alon colon data (Alon et al., 1999) and the NCI60 data (Ross et al., 2000; Scherf et al., 2000).
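To make the idea of "shaving" concrete, the following is a minimal sketch of the unsupervised shaving loop as described by Hastie et al. (2000): center each gene, find the leading principal component of the current gene set, discard the fraction of genes least aligned with it, and repeat until one gene remains. It is not the code of either package described in this chapter. The function names, the pure-Python power iteration, and the default shaving fraction `alpha=0.1` are assumptions made for illustration, and the sketch omits the gap-statistic step for choosing the cluster size and the orthogonalization step used to find further clusters.

```python
import math
import random


def center_rows(X):
    """Subtract each gene's mean so every row has zero mean across samples."""
    centered = []
    for row in X:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered


def leading_component(rows, iters=200):
    """Approximate the first principal component over samples by power
    iteration on rows^T rows (all-zero input leaves the start vector as is)."""
    n = len(rows[0])
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(r, v)) for r in rows]
        w = [sum(r[j] * s for r, s in zip(rows, scores)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def shave_sequence(X, alpha=0.1):
    """Return nested clusters of gene indices, largest first.

    X is a list of rows (genes) by columns (samples). alpha is the assumed
    fraction of genes shaved off at each step (at least one gene per step).
    """
    Xc = center_rows(X)
    genes = list(range(len(Xc)))
    nested = [sorted(genes)]
    while len(genes) > 1:
        pc = leading_component([Xc[g] for g in genes])
        # Absolute inner product with the leading component: genes that vary
        # strongly along it (in either direction) score highest.
        score = {g: abs(sum(a * b for a, b in zip(Xc[g], pc))) for g in genes}
        n_drop = max(1, int(alpha * len(genes)))
        ranked = sorted(genes, key=lambda g: score[g], reverse=True)
        genes = ranked[:len(genes) - n_drop]
        nested.append(sorted(genes))
    return nested
```

Calling `shave_sequence(X)` on a genes-by-samples matrix returns the whole nested sequence of candidate clusters. In the full method, one member of that sequence would then be selected, for example with the gap statistic (Tibshirani et al., 2001).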


  • Data Frame
  • Gene Expression Matrix
  • Super Gene
  • Color Insert
  • Principal Component Weight

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • DOI: 10.1007/0-387-21679-0_15
  • Chapter length: 20 pages

  • Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences USA, 96:6745–6750.

  • Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z (2000). Tissue classification with gene expression profiles. Journal of Computational Biology, 7:559–584.

  • Dudoit S, Fridlyand J, Speed TP (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457):77–87.

  • Getz G, Levine E, Domany E (2000). Coupled two-way clustering analysis of gene microarray data. Proceedings of the National Academy of Sciences USA, 97:12079–12084.

  • Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan W, Botstein D, Brown P (2000). ‘Gene shaving’ as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology, 1:research0003.1–0003.21.

  • McLachlan GJ, Bean RW, Peel D (2002). EMMIX-GENE: A mixture-model based approach to the clustering of microarray expression data. Bioinformatics, 18:413–422.

  • Ross DT, Scherf U, Eisen MB, Perou CM, Reese C, Spellman P, Iyer V, Jeffrey SS, de Rijn MV, Waltham M, Pergamenschikov A, Lee JCF, Lashkari D, Shalon D, Myers TG, Weinstein JN, Botstein D, Brown PO (2000). Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics, 24:227–235.

  • Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN (2000). A gene expression database for the molecular pharmacology of cancer. Nature Genetics, 24:236–244.

  • Tibshirani R, Walther G, Hastie T (2001). Estimating the number of clusters in a dataset via the gap statistic. Journal of the Royal Statistical Society B, 63:411–423.

  • Tusher VG, Tibshirani R, Chu G (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences USA, 98(9):5116–5121.

Copyright information

© 2003 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

Do, KA., Broom, B., Wen, S. (2003). GeneClust. In: Parmigiani, G., Garrett, E.S., Irizarry, R.A., Zeger, S.L. (eds) The Analysis of Gene Expression Data. Statistics for Biology and Health. Springer, New York, NY.

  • DOI: 10.1007/0-387-21679-0_15

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-95577-3

  • Online ISBN: 978-0-387-21679-9

  • eBook Packages: Springer Book Archive