SUBic: A Scalable Unsupervised Framework for Discovering High Quality Biclusters

Lee, Jooil; Jin, Yanhua; Lee, Won Suk

doi:10.1007/s11390-013-1364-y

Regular Paper
Published: 05 July 2013

Journal of Computer Science and Technology, Volume 28, pages 636–646 (2013)


Abstract

A biclustering algorithm extends conventional clustering techniques to extract all of the meaningful subgroups of genes and conditions in the expression matrix of a microarray dataset. However, such algorithms are very sensitive to their input parameters and scale poorly. This paper proposes SUBic, a scalable unsupervised biclustering framework that effectively finds high-quality constant-row biclusters in an expression matrix. A one-dimensional clustering algorithm is proposed to partition the attributes, that is, the columns of an expression matrix, into disjoint groups based on the similarity of their expression values. These groups form a set of short transactions, which are used to discover a set of frequent itemsets, each of which corresponds to a bicluster. However, a bicluster may include an attribute whose expression value is not similar enough to those of the others, so a bicluster refinement step enhances the quality of each bicluster by removing such attributes based on the distribution of its expression values. The performance of the proposed method is comparatively analyzed through a series of experiments on synthetic and real datasets.
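The pipeline outlined in the abstract (one-dimensional clustering of attributes, short transactions, frequent itemset mining, refinement) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical: the gap-threshold rule in `cluster_row` (parameter `delta`), the naive Apriori-style level-wise search in `frequent_itemsets` (parameters `min_rows`, `min_cols`), and the deviation rule in `refine` (parameter `tol`) are stand-ins chosen for brevity. The sketch also assumes that each row (gene) is clustered separately, so every column group of a row becomes one transaction. The rows supporting a frequent column itemset then have similar values across those columns, which gives a constant-row bicluster.

```python
from itertools import combinations

# Hypothetical sketch of a SUBic-style pipeline; not the authors' code.

def cluster_row(values, delta):
    """Partition one row's column indices into disjoint groups of similar values.

    Assumed rule: sort columns by value and start a new group whenever the gap
    between consecutive sorted values exceeds `delta`.
    """
    order = sorted(range(len(values)), key=lambda j: values[j])
    groups, current = [], [order[0]]
    for prev, j in zip(order, order[1:]):
        if values[j] - values[prev] > delta:
            groups.append(frozenset(current))
            current = []
        current.append(j)
    groups.append(frozenset(current))
    return groups


def build_transactions(matrix, delta):
    """Turn every column group of every row into a short (row_id, columns) transaction."""
    return [(i, g) for i, row in enumerate(matrix) for g in cluster_row(row, delta)]


def frequent_itemsets(transactions, min_rows, min_cols):
    """Apriori-style level-wise search over column itemsets.

    The support of an itemset is the set of rows owning a transaction that
    contains it; each frequent itemset with at least `min_cols` columns is
    reported together with its supporting rows as a candidate bicluster.
    """
    def support(itemset):
        return {i for i, t in transactions if itemset <= t}

    items = sorted({j for _, t in transactions for j in t})
    level = [frozenset([j]) for j in items if len(support(frozenset([j]))) >= min_rows]
    results, k = [], 1
    while level:
        results.extend((s, support(s)) for s in level if len(s) >= min_cols)
        # Join pairs of size-k itemsets that share all but one column.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        level = [c for c in candidates if len(support(c)) >= min_rows]
        k += 1
    return results


def refine(matrix, rows, cols, tol):
    """Assumed refinement rule: repeatedly drop the column that deviates most
    from the row means of the bicluster while that deviation exceeds `tol`."""
    cols = set(cols)
    while len(cols) > 1:
        def deviation(j):
            return max(abs(matrix[i][j] - sum(matrix[i][c] for c in cols) / len(cols))
                       for i in rows)
        worst = max(cols, key=deviation)
        if deviation(worst) <= tol:
            break
        cols.discard(worst)
    return cols


matrix = [[1.0, 1.1, 5.0, 9.0], [2.0, 2.1, 7.0, 0.0], [3.0, 3.05, 3.5, 8.0]]
for cols, rows in frequent_itemsets(build_transactions(matrix, delta=0.3), min_rows=2, min_cols=2):
    print(sorted(rows), sorted(refine(matrix, rows, cols, tol=0.1)))  # prints [0, 1, 2] [0, 1]
```

With `delta=0.3`, all three rows place columns 0 and 1 in the same group, so `{0, 1}` is the only itemset with at least two columns and is supported by every row. Gap chaining can merge values that drift apart step by step, which is the kind of dissimilar attribute the refinement step is meant to remove. The support function rescans every transaction, so this sketch illustrates the idea but is not scalable.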

Author information

Authors and Affiliations

Department of Computer Science, Yonsei University, Seoul, 120-749, Korea
Jooil Lee, Yanhua Jin & Won Suk Lee

Corresponding author

Correspondence to Won Suk Lee.

Additional information

This work was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (MEST) of Korea under Grant No. 2011-0016648.

A preliminary version of this paper was published in the Proceedings of EDB2012.

Electronic Supplementary Material

The electronic supplementary material for this article is provided as a DOC file (36 KB).

About this article

Cite this article

Lee, J., Jin, Y. & Lee, W.S. SUBic: A Scalable Unsupervised Framework for Discovering High Quality Biclusters. J. Comput. Sci. Technol. 28, 636–646 (2013). https://doi.org/10.1007/s11390-013-1364-y

Received: 27 September 2012
Revised: 21 May 2013
Published: 05 July 2013
Issue Date: July 2013
DOI: https://doi.org/10.1007/s11390-013-1364-y