Qualitative assessment of functional module detectors on microarray and RNASeq data

  • Monica Jha
  • Pietro H. Guzzi
  • Swarup Roy
Review Article


A set of correlated and co-expressed genes, often referred to as a functional module, plays a synergistic role in a disease or other biological activity. Genes participating in a common module may cause clinically similar diseases and share a common genetic origin for their associated disease phenotypes. Identifying such modules can aid a system-level understanding of biological and cellular processes and of the pathophysiological basis of associated diseases. As a result, detecting functional modules is an active research topic in computational biology. Several techniques have been proposed to find functional modules from gene co-regulation or co-expression data. These methods fall broadly into two categories. The first is non-network-based gene expression clustering techniques. The second is network-based methods that extract modules from gene co-expression networks built from expression data. We survey the main approaches for obtaining modules and evaluate how well they find biologically significant gene modules in both microarray and RNASeq data. Apart from independent assessments, no prior effort has been made to evaluate their performance in an integrated way on both microarray and RNASeq data. We assess the significance of the modules using gene ontology and pathway analysis, and we select a few of the best performers to assess their capability to find disease-specific modules. Our comparison reveals that no single algorithm is the best in all respects. Moreover, performance varies widely between microarray and RNASeq data. Biclustering performs relatively well on microarray expression data but poorly on RNASeq data, where network-based techniques work better.
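The network-based category and the annotation-based assessment described above can be illustrated with a toy pipeline. This is a minimal sketch under our own assumptions, not any of the surveyed methods. It builds an unweighted co-expression network by linking two genes when the absolute Pearson correlation of their profiles reaches a cutoff. The cutoff value and the use of |r|, which also links anti-correlated genes, are illustrative choices. It then takes connected components as modules. Finally, it scores a module against a single annotation term, such as a gene ontology term or pathway, with a one-sided hypergeometric test. The helper names `pearson`, `coexpression_modules` and `enrichment_pvalue` are hypothetical. Published detectors use considerably more elaborate network construction and module extraction.

```python
# Toy sketch (assumptions: hypothetical helper names, |r| threshold network,
# connected components as modules, hypergeometric test for term enrichment).
from itertools import combinations
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_modules(expr, threshold=0.8, min_size=2):
    """Link genes with |r| >= threshold; return connected components as modules."""
    genes = list(expr)
    adj = {g: set() for g in genes}
    for g1, g2 in combinations(genes, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    seen, modules = set(), []
    for g in genes:
        if g in seen:
            continue
        # Depth-first search collects every gene reachable from g.
        stack, comp = [g], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        if len(comp) >= min_size:  # drop isolated genes
            modules.append(sorted(comp))
    return modules


def enrichment_pvalue(module, term_genes, universe_size):
    """One-sided hypergeometric P(X >= k) for k annotated genes in the module."""
    n = len(module)                         # module size (draws)
    big_k = len(term_genes)                 # annotated genes in the universe
    k = len(set(module) & set(term_genes))  # annotated genes observed in module
    total = comb(universe_size, n)
    hits = sum(comb(big_k, i) * comb(universe_size - big_k, n - i)
               for i in range(k, min(n, big_k) + 1))
    return hits / total


if __name__ == "__main__":
    expr = {
        "g1": [1.0, 2.0, 3.0, 4.0],
        "g2": [2.0, 4.1, 6.0, 8.2],
        "g3": [4.0, 3.0, 2.0, 1.0],
        "g4": [1.0, 5.0, 1.0, 5.0],
    }
    modules = coexpression_modules(expr, threshold=0.9)
    print(modules)  # [['g1', 'g2', 'g3']]
    print(enrichment_pvalue(modules[0], {"g1", "g2", "g3"}, universe_size=20))
```

In this example g1, g2 and g3 form one module. g2 tracks g1 closely, and g3 is perfectly anti-correlated with g1. g4 correlates weakly with the others, so it is left out as an isolated gene. When many modules are tested against many terms, the resulting p-values would normally be corrected for multiple testing before any module is called significant.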


Keywords: Co-expression, RNA sequence, Clustering, Biclustering, Network module, Gene ontology, Disease pathways



Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Information Technology, North Eastern Hill University, Shillong, India
  2. Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, Italy
  3. Department of Computer Applications, Sikkim University, Gangtok, India
