Summary
Clustering genes is an essential method for gaining insight into their biological function and relevance from serial analysis of gene expression (SAGE) transcription profiles. A successful clustering analysis depends on an effective distance or similarity measure. Taking the specific properties of SAGE technology into account, we modeled SAGE data with Poisson statistics and developed two Poisson-based measures to assess the similarity of gene expression profiles. We incorporated these two distances into a K-means clustering procedure and developed a software package that performs clustering analysis on SAGE data. The software implementing our Poisson-based algorithms can be downloaded from http://genome.dfci.harvard.edu/sager. When the Poisson likelihood-based measure is used, the algorithm is guaranteed to converge to a local maximum of the likelihood. Results from simulated data and experimental mouse retina data demonstrate that the Poisson-based distances are more appropriate and reliable for analyzing SAGE data than other commonly used distance or similarity measures.
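The summary does not give the formulas, so the following is only a minimal sketch of how a Poisson likelihood-based K-means could look, not the authors' software. It assumes that the count of gene i in library t is Poisson with mean equal to the gene's total count times a cluster profile lam[c, t] that sums to 1 over libraries. Under that assumption the part of the log-likelihood that depends on the cluster is sum_t Y[i, t] * log(lam[c, t]), and each gene is assigned to the cluster that maximises it. The function name `poisson_kmeans` and its parameters are hypothetical.

```python
import numpy as np

def poisson_kmeans(counts, k, n_iter=50, init_labels=None, seed=0):
    """Illustrative K-means-style clustering with a Poisson log-likelihood criterion.

    counts: (genes x libraries) array of non-negative tag counts.
    Returns (labels, lam), where lam[c] is the profile of cluster c.
    """
    counts = np.asarray(counts, dtype=float)
    n_genes, n_libs = counts.shape
    rng = np.random.default_rng(seed)
    if init_labels is None:
        labels = rng.integers(0, k, size=n_genes)   # random initial assignment
    else:
        labels = np.asarray(init_labels)
    eps = 1e-12                                     # guards log(0) and division by zero

    lam = np.zeros((k, n_libs))
    for _ in range(n_iter):
        # Update step: pool the counts of each cluster's members and
        # normalise so that the profile sums to 1 across libraries.
        for c in range(k):
            members = counts[labels == c]
            if members.shape[0] == 0:
                # Assumption: an empty cluster is re-seeded with a random gene.
                members = counts[rng.integers(0, n_genes)][None, :]
            pooled = members.sum(axis=0)
            lam[c] = pooled / max(pooled.sum(), eps)

        # Assignment step: cluster-dependent part of the Poisson log-likelihood.
        # The terms involving only the gene total are constant across clusters
        # because each profile sums to 1, so they are omitted.
        loglik = counts @ np.log(lam + eps).T       # shape (genes, k)
        new_labels = loglik.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break                                   # assignments are stable
        labels = new_labels
    return labels, lam
```

Because the criterion compares profile shapes rather than absolute magnitudes, two genes with proportional counts (for example 1, 5, 20 and 2, 10, 40) score identically against every cluster profile. As with any K-means variant, the result depends on the starting assignment, so several restarts are advisable.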
References
1. Blackshaw, S., Fraioli, R. E., Furukawa, T., and Cepko, C. L. (2001) Comprehensive analysis of photoreceptor gene expression and the identification of candidate retinal disease genes. Cell 107, 579–589.
2. Zhang, L., Zhou, W., Velculescu, V. E., et al. (1997) Gene expression profiles in normal and cancer cells. Science 276, 1268–1272.
3. Velculescu, V. E., Zhang, L., Vogelstein, B., and Kinzler, K. W. (1995) Serial analysis of gene expression. Science 270, 484–487.
4. Buckhaults, P., Zhang, Z., Chen, Y. C., et al. (2003) Identifying tumor origin using a gene expression-based classification map. Cancer Res. 63, 4144–4149.
5. Porter, D., Weremowicz, S., Chin, K., et al. (2003) A neural survival factor is a candidate oncogene in breast cancer. Proc. Natl. Acad. Sci. USA 100, 10,931–10,936.
6. Margulies, E. H. and Innis, J. W. (2000) eSAGE: managing and analysing data generated with serial analysis of gene expression (SAGE). Bioinformatics 16, 650–651.
7. van Ruissen, F., Jansen, B. J., de Jongh, G. J., van Vlijmen-Willems, I. M., and Schalkwijk, J. (2002) Differential gene expression in premalignant human epidermis revealed by cluster analysis of serial analysis of gene expression (SAGE) libraries. FASEB J. 16, 246–248.
8. Audic, S. and Claverie, J. M. (1997) The significance of digital gene expression profiles. Genome Res. 7, 986–995.
9. Madden, S. L., Galella, E. A., Zhu, J., Bertelsen, A. H., and Beaudry, G. A. (1997) SAGE transcript profiles for p53-dependent growth regulation. Oncogene 15, 1079–1085.
10. Man, M. Z., Wang, X., and Wang, Y. (2000) POWER_SAGE: comparing statistical tests for SAGE experiments. Bioinformatics 16, 953–959.
11. Blackshaw, S., Kuo, W. P., Park, P. J., et al. (2003) MicroSAGE is highly representative and reproducible but reveals major differences in gene expression among samples obtained from similar tissues. Genome Biol. 4, R17.
12. Quackenbush, J. (2001) Computational analysis of microarray data. Nat. Rev. Genet. 2, 418–427.
13. Fraley, C. (1998) Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20, 270–281.
14. Yeung, K. Y., Fraley, C., Murua, A., Raftery, A. E., and Ruzzo, W. L. (2001) Model-based clustering and data transformation for gene expression data. Bioinformatics 17, 977–987.
15. Fraley, C. and Raftery, A. E. (2002) Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association 97, 611–631.
16. Cai, L., Huang, H., Blackshaw, S., Liu, J. S., Cepko, C. L., and Wong, W. H. (2004) Clustering analysis of SAGE data using a Poisson approach. Genome Biol. 5, R51.
17. Ewens, W. J. and Grant, G. R. (2001) Statistical Methods in Bioinformatics. Springer Verlag, Germany.
18. Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14,863–14,868.
19. Hartigan, J. (1975) Clustering Algorithms. Wiley, New York.
20. Celeux, G. and Govaert, G. (1992) A classification EM algorithm for clustering and two stochastic versions. Computational Statistics & Data Analysis 14, 315–332.
21. de Hoon, M. J. L., Imoto, S., Nolan, J., and Miyano, S. (2004) Open source clustering software. Bioinformatics 20, 1453–1454.
22. Blackshaw, S., Harpavat, S., Trimarchi, J., et al. (2004) Genomic analysis of mouse retinal development. PLoS Biology 2, E247.
23. Tseng, G. C. and Wong, W. H. (2004) A resampling method for tight clustering: with an application to microarray analysis. Biometrics 61, 10–16.
24. Lash, A. E., Tolstoshev, C. M., Wagner, L., et al. (2000) SAGEmap: a public gene expression resource. Genome Res. 10, 1051–1060.
25. Beissbarth, T., Hyde, L., Smyth, G. K., et al. (2004) Statistical modeling of sequencing errors in SAGE libraries. Bioinformatics 20 (Suppl 1), i31–i39.
Acknowledgements
The method described in this chapter is based on the original research paper published in Genome Biology (16). We thank Kyungpil Kim for help in generating the figure and tables.
Copyright information
© 2008 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Huang, H., Cai, L., Wong, W.H. (2008). Clustering Analysis of SAGE Transcription Profiles Using a Poisson Approach. In: Nielsen, K.L. (eds) Serial Analysis of Gene Expression (SAGE). Methods in Molecular Biology™, vol 387. Humana Press. https://doi.org/10.1007/978-1-59745-454-4_14
DOI: https://doi.org/10.1007/978-1-59745-454-4_14
Publisher Name: Humana Press
Print ISBN: 978-1-58829-676-4
Online ISBN: 978-1-59745-454-4
eBook Packages: Springer Protocols