COSCEB: Comprehensive search for column-coherent evolution biclusters and its application to hub gene identification

  • Ankush Maind
  • Shital Raut


Biclustering is an increasingly used data mining technique that searches gene-expression data for groups of genes co-expressed across a subset of experimental conditions. Each such group of co-expressed genes follows a particular pattern and is called a bicluster. Biclusters provide significant insight into the functionality of genes and play an important role in clinical applications such as drug discovery, biomarker discovery, gene network analysis, gene identification, disease diagnosis and pathway analysis. This paper presents a novel unsupervised approach, ‘COmprehensive Search for Column-Coherent Evolution Biclusters (COSCEB)’, for a comprehensive search of biologically significant column-coherent evolution biclusters. It identifies significant biclusters by extracting a column subspace from each gene pair and applying the Longest Common Contiguous Subsequence (LCCS). Experiments were performed on both synthetic and real datasets. COSCEB is evaluated on the following key issues: comprehensive search, Deep OPSM bicluster, bicluster types, bicluster accuracy, bicluster size, noise, overlapping, output nature, computational complexity and biologically significant biclusters. Its performance is compared with six well-known biclustering algorithms: SAMBA, OPSM, xMotif, Bimax, Deep OPSM and UniBic. The results show that the proposed approach performs effectively on most of these issues and extracts all possible biologically significant column-coherent evolution biclusters, far more than the other biclustering algorithms extract. Along with the proposed approach, we also present a case study that shows the application of significant biclusters to hub gene identification.
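To make the LCCS idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each gene's columns are ordered by ascending expression value and that a gene pair's shared column subspace is the longest contiguous run common to both orderings. The function names `columns_by_rank` and `longest_common_contiguous_subsequence` and the toy expression values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def longest_common_contiguous_subsequence(a: Sequence[T], b: Sequence[T]) -> List[T]:
    """Return the longest run of items that appears contiguously in both a and b.

    Standard dynamic programme: curr[j] holds the length of the common run
    ending at a[i-1] and b[j-1]. Only two rows of the table are kept.
    """
    best_len = 0
    best_end = 0  # exclusive end index of the best run within a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return list(a[best_end - best_len:best_end])


def columns_by_rank(expression: Sequence[float]) -> List[int]:
    """Column indices sorted by ascending expression value (ties broken by index).

    Assumption: this ordering stands in for a gene's column pattern.
    """
    return sorted(range(len(expression)), key=lambda c: (expression[c], c))


# Hypothetical expression rows for two genes over five conditions.
gene_x = [0.2, 1.5, 0.9, 3.1, 2.4]
gene_y = [5.0, 2.2, 1.1, 4.0, 3.0]

order_x = columns_by_rank(gene_x)  # [0, 2, 1, 4, 3]
order_y = columns_by_rank(gene_y)  # [2, 1, 4, 3, 0]
shared = longest_common_contiguous_subsequence(order_x, order_y)
print(shared)  # [2, 1, 4, 3]
```

In this toy case both genes rise in the same order across conditions 2, 1, 4 and 3, so those four columns would be the candidate subspace for the pair. The dynamic programme runs in time proportional to the product of the two sequence lengths.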


Biclustering; bioinformatics; coherent evolution bicluster; hub gene; gene-expression data; machine learning



The authors are thankful to the Department of Computer Science and Engineering, VNIT, Nagpur (MS), India, for providing the resources and support during the course of this research. They are also very thankful to the Ministry of Electronics and Information Technology (MeitY), Government of India, for financial assistance.

Supplementary material

Supplementary material 1 (DOCX 32 kb)



Copyright information

© Indian Academy of Sciences 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India
