Clustering Analysis of SAGE Transcription Profiles Using a Poisson Approach

Huang, Haiyan; Cai, Li; Wong, Wing H.

doi:10.1007/978-1-59745-454-4_14

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 387)

Summary

Clustering genes by their serial analysis of gene expression (SAGE) transcription profiles is an essential way to gain insight into their biological function and relevance. A successful clustering analysis depends on an effective distance or similarity measure. Taking the specific properties of SAGE technology into account, we modeled SAGE data with Poisson statistics and developed two Poisson-based measures of similarity between gene expression profiles. We incorporated these two distances into a K-means clustering procedure and developed a software package that performs clustering analysis on SAGE data. The software implementing our Poisson-based algorithms can be downloaded from http://genome.dfci.harvard.edu/sager. The algorithm is guaranteed to converge to a local maximum of the likelihood when the Poisson likelihood-based measure is used. Results from simulated data and experimental mouse retina data demonstrate that the Poisson-based distances are more appropriate and reliable for analyzing SAGE data than other commonly used distance or similarity measures.
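
The idea of K-means with a Poisson likelihood-based measure can be sketched roughly as follows. This is an illustrative sketch, not the authors' software: the summary does not spell out the model, so the code assumes that a tag's count in library t is Poisson with mean (tag total) × (cluster profile at t), where the cluster profile sums to 1. The function names `poisson_neg_loglik`, `cluster_profile`, and `poisson_kmeans`, and the round-robin initialization, are hypothetical choices made for this example.

```python
import math


def poisson_neg_loglik(counts, shape):
    """Negative Poisson log-likelihood of one tag under a cluster profile.

    Assumed model: the expected count in library t is sum(counts) * shape[t],
    where shape is non-negative and sums to 1.
    """
    total = sum(counts)
    nll = 0.0
    for y, p in zip(counts, shape):
        mu = max(total * p, 1e-12)  # guard against log(0)
        nll += mu - y * math.log(mu) + math.lgamma(y + 1)
    return nll


def cluster_profile(rows, n_libs):
    """Pooled profile estimate: column sums divided by the grand total."""
    col = [sum(r[t] for r in rows) for t in range(n_libs)]
    grand = sum(col)
    if grand == 0:  # empty cluster or all-zero counts: fall back to uniform
        return [1.0 / n_libs] * n_libs
    return [c / grand for c in col]


def poisson_kmeans(data, k, n_iter=100):
    """K-means-style loop using the negative log-likelihood as the distance."""
    n_libs = len(data[0])
    # Round-robin start: a simple deterministic choice made for this sketch.
    labels = [i % k for i in range(len(data))]
    profiles = []
    for _ in range(n_iter):
        # Re-estimate each cluster profile from its current members.
        profiles = [
            cluster_profile(
                [row for row, lab in zip(data, labels) if lab == c], n_libs
            )
            for c in range(k)
        ]
        # Reassign each tag to the profile under which it is most likely.
        new_labels = [
            min(range(k), key=lambda c: poisson_neg_loglik(row, profiles[c]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, profiles


# Toy counts: three tags peak in library 1 and three in library 3,
# each group spanning very different total abundances.
data = [
    [100, 10, 10], [10, 1, 1], [50, 5, 5],
    [10, 10, 100], [1, 1, 10], [5, 5, 50],
]
labels, profiles = poisson_kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy example the low-abundance tag `[10, 1, 1]` is grouped with `[100, 10, 10]` because the two share a profile shape, whereas plain Euclidean distance would place it closer to `[1, 1, 10]`. Under the assumed model, neither the profile update nor the reassignment step can decrease the likelihood, which is the kind of monotone behavior behind a local-maximum convergence guarantee.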

References

1. Blackshaw, S., Fraioli, R. E., Furukawa, T., and Cepko, C. L. (2001) Comprehensive analysis of photoreceptor gene expression and the identification of candidate retinal disease genes. Cell 107, 579–589.
2. Zhang, L., Zhou, W., Velculescu, V. E., et al. (1997) Gene expression profiles in normal and cancer cells. Science 276, 1268–1272.
3. Velculescu, V. E., Zhang, L., Vogelstein, B., and Kinzler, K. W. (1995) Serial analysis of gene expression. Science 270, 484–487.
4. Buckhaults, P., Zhang, Z., Chen, Y. C., et al. (2003) Identifying tumor origin using a gene expression-based classification map. Cancer Res. 63, 4144–4149.
5. Porter, D., Weremowicz, S., Chin, K., et al. (2003) A neural survival factor is a candidate oncogene in breast cancer. Proc. Natl. Acad. Sci. USA 100, 10,931–10,936.
6. Margulies, E. H. and Innis, J. W. (2000) eSAGE: managing and analysing data generated with serial analysis of gene expression (SAGE). Bioinformatics 16, 650–651.
7. van Ruissen, F., Jansen, B. J., de Jongh, G. J., van Vlijmen-Willems, I. M., and Schalkwijk, J. (2002) Differential gene expression in premalignant human epidermis revealed by cluster analysis of serial analysis of gene expression (SAGE) libraries. FASEB J. 16, 246–248.
8. Audic, S. and Claverie, J. M. (1997) The significance of digital gene expression profiles. Genome Res. 7, 986–995.
9. Madden, S. L., Galella, E. A., Zhu, J., Bertelsen, A. H., and Beaudry, G. A. (1997) SAGE transcript profiles for p53-dependent growth regulation. Oncogene 15, 1079–1085.
10. Man, M. Z., Wang, X., and Wang, Y. (2000) POWER_SAGE: comparing statistical tests for SAGE experiments. Bioinformatics 16, 953–959.
11. Blackshaw, S., Kuo, W. P., Park, P. J., et al. (2003) MicroSAGE is highly representative and reproducible but reveals major differences in gene expression among samples obtained from similar tissues. Genome Biol. 4, R17.
12. Quackenbush, J. (2001) Computational analysis of microarray data. Nat. Rev. Genet. 2, 418–427.
13. Fraley, C. (1998) Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20, 270–281.
14. Yeung, K. Y., Fraley, C., Murua, A., Raftery, A. E., and Ruzzo, W. L. (2001) Model-based clustering and data transformation for gene expression data. Bioinformatics 17, 977–987.
15. Fraley, C. and Raftery, A. E. (2002) Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association 97, 611–631.
16. Cai, L., Huang, H., Blackshaw, S., Liu, J. S., Cepko, C. L., and Wong, W. H. (2004) Clustering analysis of SAGE data using a Poisson approach. Genome Biol. 5, R51.
17. Ewens, W. J. and Grant, G. R. (2001) Statistical Methods in Bioinformatics. Springer Verlag, Germany.
18. Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14,863–14,868.
19. Hartigan, J. (1975) Clustering Algorithms. Wiley, New York.
20. Celeux, G. and Govaert, G. (1992) A classification EM algorithm for clustering and two stochastic versions. Computational Statistics & Data Analysis 14, 315–332.
21. de Hoon, M. J. L., Imoto, S., Nolan, J., and Miyano, S. (2004) Open source clustering software. Bioinformatics 20, 1453–1454.
22. Blackshaw, S., Harpavat, S., Trimarchi, J., et al. (2004) Genomic analysis of mouse retinal development. PLoS Biology 2, E247.
23. Tseng, G. C. and Wong, W. H. (2004) A resampling method for tight clustering: with an application to microarray analysis. Biometrics 61, 10–16.
24. Lash, A. E., Tolstoshev, C. M., Wagner, L., et al. (2000) SAGEmap: a public gene expression resource. Genome Res. 10, 1051–1060.
25. Beissbarth, T., Hyde, L., Smyth, G. K., et al. (2004) Statistical modeling of sequencing errors in SAGE libraries. Bioinformatics 20(Suppl 1), i31–i39.

Acknowledgements

The method described in this chapter is based on the original research paper published in Genome Biology (16). We thank Kyungpil Kim for help in generating the figure and tables.

Author information

Authors and Affiliations

Department of Statistics, University of California at Berkeley, Berkeley, CA
Haiyan Huang
University of California at Berkeley, Berkeley, CA
Li Cai
University of California at Berkeley, Berkeley, CA
Wing H. Wong

Editor information

Editors and Affiliations

Department of Life Sciences, Aalborg University, Aalborg, Denmark
Kåre Lehmann Nielsen

About this protocol

Cite this protocol

Huang, H., Cai, L., Wong, W.H. (2008). Clustering Analysis of SAGE Transcription Profiles Using a Poisson Approach. In: Nielsen, K.L. (eds) Serial Analysis of Gene Expression (SAGE). Methods in Molecular Biology™, vol 387. Humana Press. https://doi.org/10.1007/978-1-59745-454-4_14

DOI: https://doi.org/10.1007/978-1-59745-454-4_14
Publisher Name: Humana Press
Print ISBN: 978-1-58829-676-4
Online ISBN: 978-1-59745-454-4
eBook Packages: Springer Protocols
