Optimal Set Cover Formulation for Exclusive Row Biclustering of Gene Expression

  • Regular Paper
Journal of Computer Science and Technology

Abstract

The availability of large microarray data sets has led to growing interest in biclustering methods over the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper, we focus on the exclusive row biclustering problem (also known as projected clustering) for gene expression data, in which each row can be a member of only a single bicluster while columns can participate in multiple biclusters. This type of biclustering may be adequate, for example, for clustering groups of cancer patients, where each patient (row) is expected to carry only a single type of cancer while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method to identify these exclusive row biclusters in the spirit of the optimal set cover problem. Our algorithmic solution combines existing biclustering algorithms with combinatorial auction techniques. Furthermore, we devise an approach for tuning the threshold of our algorithm based on comparison with a null model, inspired by the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large-span submatrices with non-overlapping rows, while considering their unique nature.
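
To make the exclusive row constraint concrete, the selection step can be illustrated as a small winner determination problem: each candidate bicluster acts as a bid on its rows, and the goal is to choose a row-disjoint subset of candidates with maximum total score. The abstract does not state the objective or the solver, so the following is only a minimal sketch under assumptions. The function name `select_exclusive`, the candidate scores, and the exhaustive search are all hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List, Sequence, Tuple

# A candidate bicluster reduced to (row indices, score).
# The score is a hypothetical quality measure, not the paper's.
Candidate = Tuple[FrozenSet[int], float]


def select_exclusive(candidates: Sequence[Candidate]) -> Tuple[float, List[int]]:
    """Choose row-disjoint candidates with maximum total score.

    Exhaustive search, exponential in the number of candidates;
    meant only to illustrate the constraint on tiny inputs.
    """
    best_score = 0.0
    best_pick: List[int] = []

    def search(i: int, used: FrozenSet[int], score: float, pick: List[int]) -> None:
        nonlocal best_score, best_pick
        if i == len(candidates):
            # All candidates decided: keep the best total seen so far.
            if score > best_score:
                best_score, best_pick = score, list(pick)
            return
        rows, value = candidates[i]
        # Option 1: accept candidate i if none of its rows is already taken.
        if not (rows & used):
            pick.append(i)
            search(i + 1, used | rows, score + value, pick)
            pick.pop()
        # Option 2: skip candidate i.
        search(i + 1, used, score, pick)

    search(0, frozenset(), 0.0, [])
    return best_score, best_pick


# Four candidate biclusters over patients (rows) 0..4, with made-up scores.
candidates = [
    (frozenset({0, 1, 2}), 3.0),
    (frozenset({2, 3}), 2.5),
    (frozenset({3, 4}), 2.0),
    (frozenset({0, 1}), 1.5),
]
total, chosen = select_exclusive(candidates)
print(total, chosen)
```

Only the row sets enter the disjointness check, so the columns (genes) of the chosen biclusters remain free to overlap, which matches the problem statement above. Realistic instances would need a dedicated winner determination solver rather than enumeration.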

Author information

Corresponding author

Correspondence to Amichai Painsky.

Additional information

This research was funded in part by the Israel Science Foundation under Grant No. 1227/09 and by a grant to Amichai Painsky from the Israeli Center for Absorption in Science.

A preliminary version of the paper was published in the Proceedings of ICDM 2012.

About this article

Cite this article

Painsky, A., Rosset, S. Optimal Set Cover Formulation for Exclusive Row Biclustering of Gene Expression. J. Comput. Sci. Technol. 29, 423–435 (2014). https://doi.org/10.1007/s11390-014-1440-y

  • DOI: https://doi.org/10.1007/s11390-014-1440-y
