Finding Additive Biclusters with Random Background

(Extended Abstract)

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5029)

Abstract

The biclustering problem has been extensively studied in many areas, including e-commerce, data mining, machine learning, pattern recognition, statistics and, more recently, computational biology. Given an \(n \times m\) matrix \(A\) (\(n \ge m\)), the main goal of biclustering is to identify a subset of rows (called objects) and a subset of columns (called properties) such that an objective function measuring the quality of the resulting bicluster (the submatrix of \(A\) formed by the selected rows and columns) is optimized. The problem has been proved or conjectured to be NP-hard under various mathematical models. In this paper, we study a probabilistic model of the implanted additive bicluster problem, where each element of the \(n \times m\) background matrix is a random number from \([0, L-1]\), and a \(k \times k\) implanted additive bicluster is obtained from an error-free additive bicluster by randomly changing each element to a number in \([0, L-1]\) with probability \(\theta\). We propose an \(O(n^2 m)\)-time voting algorithm to solve the problem. We show that for any constant \(\delta\) such that \((1-\delta)(1-\theta)^2 - \frac{1}{L} > 0\), when \(k \ge \max\left\{\frac{8}{\alpha}\sqrt{n\log n},\ \frac{8\log n}{c} + \log(2L)\right\}\), where \(c\) is a constant, the voting algorithm can correctly find the implanted bicluster with probability at least \(1 - \frac{9}{n^2}\). We also implement our algorithm as a software tool, called VOTE, for finding novel biclusters in microarray gene expression data. The implementation incorporates several nontrivial ideas for estimating the size of an implanted bicluster, adjusting the voting threshold, handling small biclusters, and dealing with multiple (possibly overlapping) implanted biclusters. Our experimental results on both simulated and real datasets show that VOTE can find biclusters with high accuracy and speed.
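To make the probabilistic model above concrete, here is a minimal Python sketch that samples an instance of the implanted additive bicluster problem and evaluates the two quantities in the main result. It is an illustration only, not the authors' VOTE tool or their voting algorithm. Several choices are assumptions not stated in the abstract: the bicluster is planted in the first \(k\) rows and columns; the error-free additive bicluster uses the usual form \(b_{ij} = r_i + c_j\), with \(r_i, c_j\) drawn from \(0..\lfloor (L-1)/2 \rfloor\) so every entry stays in \([0, L-1]\); logarithms are natural; and \(\alpha\), which the abstract does not define, is simply passed in as a parameter together with \(c\). All function names are hypothetical.

```python
import math
import random


def is_additive(block):
    """True if b[i][j] - b[i][0] == b[0][j] - b[0][0] for all i, j,
    i.e. the block can be written as b[i][j] = r_i + c_j."""
    return all(
        block[i][j] - block[i][0] == block[0][j] - block[0][0]
        for i in range(len(block))
        for j in range(len(block[0]))
    )


def planted_instance(n, m, k, L, theta, seed=None):
    """Hypothetical sampler: n x m background with a k x k additive bicluster
    implanted in rows 0..k-1 and columns 0..k-1 (planted position assumed)."""
    rng = random.Random(seed)
    # Background: every entry is uniform on the integers 0..L-1.
    A = [[rng.randrange(L) for _ in range(m)] for _ in range(n)]
    # Error-free additive bicluster b_ij = r_i + c_j (assumed construction);
    # r_i, c_j <= (L-1)//2 keeps every sum within 0..L-1.
    half = (L - 1) // 2
    r = [rng.randint(0, half) for _ in range(k)]
    c = [rng.randint(0, half) for _ in range(k)]
    for i in range(k):
        for j in range(k):
            # With probability theta, replace the entry by a random value in 0..L-1.
            A[i][j] = rng.randrange(L) if rng.random() < theta else r[i] + c[j]
    return A


def model_condition(delta, theta, L):
    """Check (1 - delta)(1 - theta)^2 - 1/L > 0 from the abstract."""
    return (1 - delta) * (1 - theta) ** 2 - 1 / L > 0


def min_bicluster_size(n, L, alpha, c):
    """Right-hand side of k >= max{(8/alpha) sqrt(n log n), 8 log n / c + log(2L)}.
    Natural logarithms are an assumption; alpha and c are inputs."""
    return max(8 / alpha * math.sqrt(n * math.log(n)),
               8 * math.log(n) / c + math.log(2 * L))


def success_probability_bound(n):
    """Lower bound 1 - 9/n^2 on the probability of recovering the bicluster."""
    return 1 - 9 / n ** 2
```

For example, `model_condition(0.1, 0.2, 10)` returns `True`, since \(0.9 \times 0.64 - 0.1 = 0.476 > 0\). With \(\theta = 0\), the top-left \(k \times k\) block returned by `planted_instance` passes `is_additive`; raising \(\theta\) corrupts entries and generally breaks exact additivity, which is the noise the voting algorithm must tolerate.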

Editor information

Paolo Ferragina, Gad M. Landau

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xiao, J., Wang, L., Liu, X., Jiang, T. (2008). Finding Additive Biclusters with Random Background. In: Ferragina, P., Landau, G.M. (eds) Combinatorial Pattern Matching. CPM 2008. Lecture Notes in Computer Science, vol 5029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69068-9_25

  • DOI: https://doi.org/10.1007/978-3-540-69068-9_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69066-5

  • Online ISBN: 978-3-540-69068-9

  • eBook Packages: Computer Science, Computer Science (R0)
