Abstract
We propose a method for the global validation of gene clusterings. The method selects a set of informative, non-redundant GO terms by exploring the Gene Ontology structure, guided by mutual information. This approach yields a global assessment of clustering quality and a higher-level interpretation of the clusters, because it relates GO terms to specific clusters. On two gene expression data sets, our method improves on previous approaches.
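To make the central quantity concrete, the following is a minimal sketch of how mutual information between a clustering and a single GO term annotation could be computed and used to rank terms. It is not the procedure from the chapter: it ignores the Gene Ontology graph structure and does no redundancy filtering. The function names (`mutual_information`, `rank_terms`), the gene names, and the term labels `TERM_A` and `TERM_B` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two label sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))   # joint counts of (x, y) pairs
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_terms(clusters, annotations):
    """Rank terms by mutual information with the cluster assignment.

    clusters:    dict mapping gene -> cluster id (hypothetical input format)
    annotations: dict mapping term -> set of annotated genes (hypothetical)
    """
    genes = sorted(clusters)
    labels = [clusters[g] for g in genes]
    scores = {}
    for term, annotated in annotations.items():
        # Binary indicator: is each gene annotated with this term?
        indicator = [g in annotated for g in genes]
        scores[term] = mutual_information(labels, indicator)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
toy_clusters = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}
toy_annotations = {
    "TERM_A": {"g1", "g2"},  # coincides with cluster 0
    "TERM_B": {"g1", "g3"},  # spread evenly across both clusters
}
ranked = rank_terms(toy_clusters, toy_annotations)
```

In this toy case `TERM_A` separates the two clusters perfectly and scores one bit, while `TERM_B` is independent of the clustering and scores zero. A term with high mutual information is informative about cluster membership, which is the intuition behind using it as a selection criterion.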
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Costa, I.G., de Souto, M.C.P., Schliep, A. (2007). Validating Gene Clusterings by Selecting Informative Gene Ontology Terms with Mutual Information. In: Sagot, MF., Walter, M.E.M.T. (eds) Advances in Bioinformatics and Computational Biology. BSB 2007. Lecture Notes in Computer Science, vol 4643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73731-5_8
DOI: https://doi.org/10.1007/978-3-540-73731-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73730-8
Online ISBN: 978-3-540-73731-5
eBook Packages: Computer Science, Computer Science (R0)