Clustering Analysis of SAGE Transcription Profiles Using a Poisson Approach

  • Protocol
Serial Analysis of Gene Expression (SAGE)

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 387)

Summary

Clustering genes by their serial analysis of gene expression (SAGE) transcription profiles is an essential way to gain insight into their biological function and relevance. A successful clustering analysis depends on an effective distance or similarity measure. Taking the specific properties of SAGE technology into account, we modeled SAGE data with Poisson statistics and developed two Poisson-based measures to assess the similarity of gene expression profiles. By incorporating these two distances into a K-means clustering procedure, we developed a software package for clustering analysis of SAGE data. The software implementing our Poisson-based algorithms can be downloaded from http://genome.dfci.harvard.edu/sager. The algorithm is guaranteed to converge to a local maximum of the likelihood when the Poisson likelihood-based measure is used. Results from simulated data and experimental mouse retina data demonstrate that the Poisson-based distances are more appropriate and reliable for analyzing SAGE data than other commonly used distance or similarity measures.
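To make the idea of a Poisson likelihood-based K-means concrete, here is a minimal Python sketch. The summary does not give the formulas, so the model used here is an assumption for illustration only: the expected count of a tag in each library is taken to be the tag's total count multiplied by a pooled cluster profile, and each tag is assigned to the cluster with the smallest negative Poisson log-likelihood. The function names (`cluster_profile`, `poisson_neg_loglik`, `poisson_kmeans`) and the `seeds` parameter are hypothetical and are not taken from the authors' software.

```python
import math
import random


def cluster_profile(rows):
    """Pooled profile of a cluster: per-library sums divided by the grand total."""
    totals = [sum(col) for col in zip(*rows)]
    grand = sum(totals)
    if grand == 0:  # all-zero cluster: fall back to a flat profile
        return [1.0 / len(totals)] * len(totals)
    return [t / grand for t in totals]


def poisson_neg_loglik(counts, profile, eps=1e-12):
    """Negative Poisson log-likelihood of one tag's counts under a profile.

    Assumed model: expected count in library t = (tag total) * profile[t].
    """
    total = sum(counts)
    nll = 0.0
    for y, p in zip(counts, profile):
        mu = max(total * p, eps)  # avoid log(0) for empty libraries
        nll += mu - y * math.log(mu) + math.lgamma(y + 1)
    return nll


def poisson_kmeans(data, k, seeds=None, n_iter=50, rng_seed=0):
    """K-means-style clustering with a Poisson likelihood-based distance."""
    rng = random.Random(rng_seed)
    if seeds is None:
        seeds = rng.sample(range(len(data)), k)
    # Start each cluster profile from a single seed tag.
    profiles = [cluster_profile([data[i]]) for i in seeds]
    labels = None
    for _ in range(n_iter):
        # Assignment step: pick the cluster with the highest likelihood.
        new_labels = [
            min(range(k), key=lambda c: poisson_neg_loglik(row, profiles[c]))
            for row in data
        ]
        if new_labels == labels:  # assignments stable: stop
            break
        labels = new_labels
        # Update step: re-estimate each profile from its current members.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:
                profiles[c] = cluster_profile(members)
    return labels


# Toy counts: tags x libraries, two profile shapes at very different magnitudes.
toy = [[30, 3, 2], [3, 28, 4], [300, 25, 31],
       [25, 310, 30], [9, 1, 0], [1, 12, 1]]
labels = poisson_kmeans(toy, k=2, seeds=[0, 1])
```

Because the expected counts are scaled by each tag's own total, tags with the same profile shape but very different abundance end up in the same cluster, which is the property a SAGE-specific measure is meant to provide. Under this assumed model, neither the assignment step nor the update step can lower the total likelihood, which is in keeping with the convergence statement in the summary.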

References

  1. Blackshaw, S., Fraioli, R. E., Furukawa, T., and Cepko, C. L. (2001) Comprehensive analysis of photoreceptor gene expression and the identification of candidate retinal disease genes. Cell 107, 579–589.

  2. Zhang, L., Zhou, W., Velculescu, V. E., et al. (1997) Gene expression profiles in normal and cancer cells. Science 276, 1268–1272.

  3. Velculescu, V. E., Zhang, L., Vogelstein, B., and Kinzler, K. W. (1995) Serial analysis of gene expression. Science 270, 484–487.

  4. Buckhaults, P., Zhang, Z., Chen, Y. C., et al. (2003) Identifying tumor origin using a gene expression-based classification map. Cancer Res. 63, 4144–4149.

  5. Porter, D., Weremowicz, S., Chin, K., et al. (2003) A neural survival factor is a candidate oncogene in breast cancer. Proc. Natl. Acad. Sci. USA 100, 10,931–10,936.

  6. Margulies, E. H. and Innis, J. W. (2000) eSAGE: managing and analysing data generated with serial analysis of gene expression (SAGE). Bioinformatics 16, 650–651.

  7. van Ruissen, F., Jansen, B. J., de Jongh, G. J., van Vlijmen-Willems, I. M., and Schalkwijk, J. (2002) Differential gene expression in premalignant human epidermis revealed by cluster analysis of serial analysis of gene expression (SAGE) libraries. FASEB J. 16, 246–248.

  8. Audic, S. and Claverie, J. M. (1997) The significance of digital gene expression profiles. Genome Res. 7, 986–995.

  9. Madden, S. L., Galella, E. A., Zhu, J., Bertelsen, A. H., and Beaudry, G. A. (1997) SAGE transcript profiles for p53-dependent growth regulation. Oncogene 15, 1079–1085.

  10. Man, M. Z., Wang, X., and Wang, Y. (2000) POWER_SAGE: comparing statistical tests for SAGE experiments. Bioinformatics 16, 953–959.

  11. Blackshaw, S., Kuo, W. P., Park, P. J., et al. (2003) MicroSAGE is highly representative and reproducible but reveals major differences in gene expression among samples obtained from similar tissues. Genome Biol. 4, R17.

  12. Quackenbush, J. (2001) Computational analysis of microarray data. Nat. Rev. Genet. 2, 418–427.

  13. Fraley, C. (1998) Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20, 270–281.

  14. Yeung, K. Y., Fraley, C., Murua, A., Raftery, A. E., and Ruzzo, W. L. (2001) Model-based clustering and data transformation for gene expression data. Bioinformatics 17, 977–987.

  15. Fraley, C. and Raftery, A. E. (2002) Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association 97, 611–631.

  16. Cai, L., Huang, H., Blackshaw, S., Liu, J. S., Cepko, C. L., and Wong, W. H. (2004) Clustering analysis of SAGE data using a Poisson approach. Genome Biol. 5, R51.

  17. Ewens, W. J. and Grant, G. R. (2001) Statistical Methods in Bioinformatics. Springer Verlag, Germany.

  18. Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14,863–14,868.

  19. Hartigan, J. (1975) Clustering Algorithms. Wiley, New York.

  20. Celeux, G. and Govaert, G. (1992) A classification EM algorithm for clustering and two stochastic versions. Computational Statistics & Data Analysis 14, 315–332.

  21. de Hoon, M. J. L., Imoto, S., Nolan, J., and Miyano, S. (2004) Open source clustering software. Bioinformatics 20, 1453–1454.

  22. Blackshaw, S., Harpavat, S., Trimarchi, J., et al. (2004) Genomic analysis of mouse retinal development. PLoS Biology 2, E247.

  23. Tseng, G. C. and Wong, W. H. (2004) A resampling method for tight clustering: with an application to microarray analysis. Biometrics 61, 10–16.

  24. Lash, A. E., Tolstoshev, C. M., Wagner, L., et al. (2000) SAGEmap: a public gene expression resource. Genome Res. 10, 1051–1060.

  25. Beissbarth, T., Hyde, L., Smyth, G. K., et al. (2004) Statistical modeling of sequencing errors in SAGE libraries. Bioinformatics 20(Suppl 1), i31–i39.

Acknowledgements

The method described in this chapter is based on the original research paper published in Genome Biology (16). We thank Kyungpil Kim for help in generating the figure and tables.

Copyright information

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Huang, H., Cai, L., Wong, W.H. (2008). Clustering Analysis of SAGE Transcription Profiles Using a Poisson Approach. In: Nielsen, K.L. (ed.) Serial Analysis of Gene Expression (SAGE). Methods in Molecular Biology™, vol 387. Humana Press. https://doi.org/10.1007/978-1-59745-454-4_14

  • DOI: https://doi.org/10.1007/978-1-59745-454-4_14

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-676-4

  • Online ISBN: 978-1-59745-454-4

  • eBook Packages: Springer Protocols
