Finding Additive Biclusters with Random Background

Xiao, Jing; Wang, Lusheng; Liu, Xiaowen; Jiang, Tao

doi:10.1007/978-3-540-69068-9_25

(Extended Abstract)

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5029)

Abstract

The biclustering problem has been extensively studied in many areas, including e-commerce, data mining, machine learning, pattern recognition, statistics, and, more recently, computational biology. Given an n × m matrix A (n ≥ m), the main goal of biclustering is to identify a subset of rows (called objects) and a subset of columns (called properties) such that an objective function measuring the quality of the resulting bicluster (the submatrix of A formed by the chosen rows and columns) is optimized. The problem has been proved or conjectured to be NP-hard under various mathematical models. In this paper, we study a probabilistic model of the implanted additive bicluster problem, where each element in the n × m background matrix is a random number from [0, L − 1], and a k × k implanted additive bicluster is obtained from an error-free additive bicluster by randomly changing each element to a number in [0, L − 1] with probability θ. We propose an O(n²m) time voting algorithm to solve the problem. We show that for any constant δ such that \((1-\delta)(1-\theta)^2 -\frac 1 L >0\), when \(k \ge \max \left\{\frac 8 \alpha \sqrt{n\log n},~ \frac {8 \log n} c + \log (2L)\right\}\), where c is a constant, the voting algorithm correctly finds the implanted bicluster with probability at least \(1 - \frac{9}{n^{2}}\). We also implement our algorithm as a software tool, called VOTE, for finding novel biclusters in microarray gene expression data. The implementation incorporates several nontrivial ideas for estimating the size of an implanted bicluster, adjusting the threshold in voting, dealing with small biclusters, and dealing with multiple (and overlapping) implanted biclusters. Our experimental results on both simulated and real datasets show that VOTE can find biclusters with high accuracy and speed.
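The probabilistic model in the abstract can be illustrated with a small simulation. The sketch below is not the authors' VOTE implementation. It assumes that an error-free additive bicluster has entries of the form row offset plus column offset, reduced modulo L so that values stay in [0, L − 1] (the modular reduction is an assumption made here, not stated in the abstract). The function names `implant_additive_bicluster` and `column_votes` are hypothetical. The second function shows only the basic intuition behind voting: two rows of the implanted bicluster share the same difference on roughly (1 − θ)²k of the bicluster columns, whereas two unrelated rows agree on any given difference in only about m/L columns.

```python
import numpy as np


def implant_additive_bicluster(n, m, k, L, theta, rng):
    """Build an n x m background with one noisy k x k additive bicluster.

    Returns (A, rows, cols), where rows and cols are the sorted index sets
    of the implanted bicluster.
    """
    # Background: every entry drawn uniformly from {0, ..., L-1}.
    A = rng.integers(0, L, size=(n, m))

    rows = rng.choice(n, size=k, replace=False)
    cols = rng.choice(m, size=k, replace=False)

    # Error-free additive pattern: row offset + column offset.
    # Assumption: reduce mod L so that values stay in [0, L-1].
    row_off = rng.integers(0, L, size=k)
    col_off = rng.integers(0, L, size=k)
    clean = (row_off[:, None] + col_off[None, :]) % L

    # With probability theta, replace an entry by a uniform random value.
    noise = rng.integers(0, L, size=(k, k))
    corrupted = rng.random((k, k)) < theta
    A[np.ix_(rows, cols)] = np.where(corrupted, noise, clean)

    return A, np.sort(rows), np.sort(cols)


def column_votes(A, i, j, L):
    """Number of columns that agree on the most common value of
    (A[i, c] - A[j, c]) mod L, taken over all columns c."""
    diff = (A[i] - A[j]) % L
    return int(np.bincount(diff, minlength=L).max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k, L, theta = 200, 100, 40, 20, 0.1
    A, rows, cols = implant_additive_bicluster(n, m, k, L, theta, rng)

    outside = np.setdiff1d(np.arange(n), rows)
    print("votes, two implanted rows: ", column_votes(A, rows[0], rows[1], L))
    print("votes, two background rows:", column_votes(A, outside[0], outside[1], L))
```

With these illustrative parameters, a pair of implanted rows should collect noticeably more votes than a pair of background rows, which is the gap that the condition \((1-\delta)(1-\theta)^2 - \frac 1 L > 0\) in the abstract appears to quantify.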

Author information

Authors and Affiliations

Jing Xiao: Department of Computer Science and Technology, Tsinghua University
Lusheng Wang: Department of Computer Science, City University of Hong Kong, Hong Kong
Xiaowen Liu: Department of Computer Science, University of Western Ontario, London, Ontario, Canada, N6A 5B7
Tao Jiang: Department of Computer Science and Engineering, University of California, Riverside

Editor information

Paolo Ferragina, Gad M. Landau

Cite this paper

Xiao, J., Wang, L., Liu, X., Jiang, T. (2008). Finding Additive Biclusters with Random Background. In: Ferragina, P., Landau, G.M. (eds) Combinatorial Pattern Matching. CPM 2008. Lecture Notes in Computer Science, vol 5029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69068-9_25

DOI: https://doi.org/10.1007/978-3-540-69068-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69066-5
Online ISBN: 978-3-540-69068-9
eBook Packages: Computer Science, Computer Science (R0)
