Linear Coherent Bi-cluster Discovery via Line Detection and Sample Majority Voting

Shi, Yi; Cai, Zhipeng; Lin, Guohui; Schuurmans, Dale

doi:10.1007/978-3-642-02026-1_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5573)

Included in the following conference series:

International Conference on Combinatorial Optimization and Applications

Abstract

Discovering groups of genes that share common expression profiles is an important problem in DNA microarray analysis. Unfortunately, standard clustering algorithms often fail to retrieve common expression groups because (1) genes only exhibit similar behaviors over a subset of conditions, and (2) genes may participate in more than one functional process and therefore belong to multiple groups. Many bi-clustering algorithms have been proposed to address these problems in the past decade; however, in addition to the above challenges, most such algorithms are unable to discover linear coherent bi-clusters, a strict generalization of the additive and multiplicative bi-clustering models. In this paper, we propose a novel bi-clustering algorithm that discovers linear coherent bi-clusters by first detecting linear correlations between pairs of gene expression profiles, then identifying groups by sample majority voting. Our experimental results on synthetic data and on two real datasets, Saccharomyces cerevisiae and Arabidopsis thaliana, show significant performance improvements over previous methods. One intriguing aspect of our approach is that it can easily be extended to identify bi-clusters of more complex gene-gene correlations.
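
To make the two stages named in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes a linear coherent bi-cluster is one in which each gene's profile over the selected samples is a linear function y = a*x + b of a reference profile (a = 1 would give the additive model, b = 0 the multiplicative one). The line detector is a simple exhaustive two-point search in the spirit of RANSAC, swapped in for whatever detector the paper actually uses; the function names, the `tol` parameter, and the "more than half" voting rule are all hypothetical.

```python
from itertools import combinations


def fit_line_inliers(x, y, tol=1e-6):
    """Find the line y = a*x + b through two samples that the most samples lie on.

    Exhaustive two-point search (an assumption, not the paper's detector).
    Returns (a, b, frozenset of inlier sample indices).
    """
    best = (0.0, 0.0, frozenset())
    n = len(x)
    for i, j in combinations(range(n), 2):
        if x[i] == x[j]:
            continue  # vertical line has no slope/intercept form; skip it
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = frozenset(
            k for k in range(n) if abs(y[k] - (a * x[k] + b)) <= tol
        )
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best


def majority_vote_samples(matrix, seed_gene, genes, tol=1e-6):
    """Keep the samples that are inliers for more than half of the
    (seed gene, other gene) lines. The threshold is a hypothetical choice."""
    others = [g for g in genes if g != seed_gene]
    votes = [0] * len(matrix[seed_gene])
    for g in others:
        _, _, inliers = fit_line_inliers(matrix[seed_gene], matrix[g], tol)
        for k in inliers:
            votes[k] += 1
    return [k for k, v in enumerate(votes) if 2 * v > len(others)]


# Toy expression matrix (made-up numbers): g1 and g2 are linear in the seed
# profile on samples 0..3 only; g3 is unrelated.
matrix = {
    "seed": [1, 2, 3, 4, 5, 6],
    "g1": [3, 5, 7, 9, 0, 20],    # 2*x + 1 on samples 0..3
    "g2": [9, 8, 7, 6, 50, -3],   # -x + 10 on samples 0..3
    "g3": [4, 1, 9, 2, 7, 0],     # no line through more than two samples
}
shared_samples = majority_vote_samples(matrix, "seed", list(matrix))
```

On this toy input the vote retains samples 0 to 3, the columns on which g1 and g2 are both linear in the seed profile. Real expression data is noisy, so a practical version would need a looser tolerance and a more robust line detector than this exact-fit search.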

Author information

Authors and Affiliations

Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8, Canada
Yi Shi, Zhipeng Cai, Guohui Lin & Dale Schuurmans

Editor information

Editors and Affiliations

Department of Computer Science, University of Texas at Dallas, 800 West Campbell Road, Richardson, TX 75080-3021, USA
Ding-Zhu Du
Institute of Applied Mathematics, Chinese Academy of Sciences, Zhong Guan Cun Dong Lu 55, Beijing 100190, P. R. China
Xiaodong Hu
Department of Industrial and Systems Engineering, University of Florida, 303 Weil Hall, P.O. Box 116595, Gainesville, FL 32611-6595, USA
Panos M. Pardalos

About this paper

Cite this paper

Shi, Y., Cai, Z., Lin, G., Schuurmans, D. (2009). Linear Coherent Bi-cluster Discovery via Line Detection and Sample Majority Voting. In: Du, DZ., Hu, X., Pardalos, P.M. (eds) Combinatorial Optimization and Applications. COCOA 2009. Lecture Notes in Computer Science, vol 5573. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02026-1_7

DOI: https://doi.org/10.1007/978-3-642-02026-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02025-4
Online ISBN: 978-3-642-02026-1
eBook Packages: Computer Science, Computer Science (R0)
