Abstract
Discovering groups of genes that share common expression profiles is an important problem in DNA microarray analysis. Unfortunately, standard bi-clustering algorithms often fail to retrieve common expression groups because (1) genes exhibit similar behaviors only over a subset of conditions, and (2) genes may participate in more than one functional process and therefore belong to multiple groups. Many algorithms have been proposed over the past decade to address these problems; however, beyond the above challenges, most such algorithms are unable to discover linear coherent bi-clusters, a strict generalization of the additive and multiplicative bi-clustering models. In this paper, we propose a novel bi-clustering algorithm that discovers linear coherent bi-clusters by first detecting linear correlations between pairs of gene expression profiles and then identifying groups by sample majority voting. Our experimental results on synthetic data and two real datasets, Saccharomyces cerevisiae and Arabidopsis thaliana, show significant performance improvements over previous methods. One intriguing aspect of our approach is that it can easily be extended to identify bi-clusters of more complex gene-gene correlations.
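To make the terminology concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a linear coherent relation between two gene profiles of the form y = slope * x + intercept, so that the additive model (slope 1) and the multiplicative model (intercept 0) appear as special cases. The function names, the tolerance `tol`, and the toy profiles are all hypothetical, and the exhaustive search over sample pairs is a simple stand-in for line detection followed by counting the samples that vote for each candidate line.

```python
from itertools import combinations

import numpy as np


def fit_line(x, y):
    """Least-squares fit of y ~ slope * x + intercept over all samples."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def supporting_samples(x, y, slope, intercept, tol=1e-6):
    """Indices of samples (conditions) lying within tol of the given line."""
    residual = np.abs(y - (slope * x + intercept))
    return np.flatnonzero(residual <= tol)


def best_line_by_votes(x, y, tol=1e-6):
    """Try the line through every pair of samples and keep the one that
    the largest number of samples agree with (illustrative voting only)."""
    best = (None, None, np.array([], dtype=int))
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line: slope undefined, skip this pair
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        support = supporting_samples(x, y, slope, intercept, tol)
        if len(support) > len(best[2]):
            best = (slope, intercept, support)
    return best


# Hypothetical reference gene profile over six conditions.
g = np.array([1.0, 2.0, 4.0, 3.0, 5.0, 7.0])

additive = g + 2.0          # additive model: slope 1, intercept 2
multiplicative = 3.0 * g    # multiplicative model: slope 3, intercept 0
linear = -0.5 * g + 4.0     # general linear relation

# A gene that follows the linear relation only on the first four conditions.
partial = linear.copy()
partial[4:] = [9.0, -3.0]

slope, intercept, support = best_line_by_votes(g, partial)
```

In this toy setup, a least-squares fit over all six conditions would be pulled away by the two incoherent samples, whereas the voting search recovers the line shared by the coherent subset and reports which conditions support it. That subset of conditions is what makes the pair a candidate for a bi-cluster rather than a global cluster.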
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shi, Y., Cai, Z., Lin, G., Schuurmans, D. (2009). Linear Coherent Bi-cluster Discovery via Line Detection and Sample Majority Voting. In: Du, DZ., Hu, X., Pardalos, P.M. (eds) Combinatorial Optimization and Applications. COCOA 2009. Lecture Notes in Computer Science, vol 5573. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02026-1_7
DOI: https://doi.org/10.1007/978-3-642-02026-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02025-4
Online ISBN: 978-3-642-02026-1
eBook Packages: Computer Science, Computer Science (R0)