Abstract
With the advent of high-throughput biotechnological methods such as DNA microarrays, and the need to analyze the huge amounts of biological data they produce, clustering algorithms have gained new popularity. In practice, the question arises of which clustering algorithm, and which parameter set, generates the most promising results. Little work has addressed the evaluation and comparison of clustering results, especially with respect to their biological relevance, or the task of distinguishing biologically interesting clusters from less interesting ones. This paper presents two cluster validity indices intended to evaluate clusterings of gene expression data in a biologically meaningful way.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Speer, N., Spieth, C., Zell, A. (2005). Biological Cluster Validity Indices Based on the Gene Ontology. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_39
DOI: https://doi.org/10.1007/11552253_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28795-7
Online ISBN: 978-3-540-31926-9
eBook Packages: Computer Science (R0)