A Graph-Based Method for Clustering of Gene Expression Data with Detection of Functionally Inactive Genes and Noise

  • Girish Chandra
  • Akshay Deepak
  • Sudhakar Tripathi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 705)


Noise in gene expression data causes trouble for many clustering algorithms, and such data may also contain non-functional (functionally inactive) genes that should not be part of any cluster. One solution to this problem is to remove the functionally inactive genes and noise first and then cluster the remaining genes. Based on this solution, this article proposes a graph-based clustering algorithm that first identifies the functionally inactive genes and noise and then clusters the remaining genes of the gene expression data. The proposed method is applied to yeast cell cycle data, and the results show that it performs well at identifying highly co-expressed gene clusters in the presence of functionally inactive genes and noise.
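The two-stage idea (filter first, then cluster on a graph) can be illustrated with a minimal sketch. The abstract does not say how inactive genes are detected or how the graph is built, so everything specific here is an assumption made for illustration: flat profiles are flagged by a variance threshold, edges join genes whose Pearson correlation exceeds a cutoff, clusters are the connected components, and isolated genes are reported as noise. The function name `cluster_genes` and the parameters `min_var` and `min_corr` are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def cluster_genes(profiles, min_var=0.05, min_corr=0.9):
    """Illustrative filter-then-cluster pipeline (assumed criteria, not the paper's).

    profiles: dict mapping gene name -> list of expression values.
    Returns (clusters, inactive, noise).
    """
    # Stage 1 (assumption): a nearly flat profile marks a functionally inactive gene.
    inactive = [g for g, v in profiles.items() if pvariance(v) < min_var]
    inactive_set = set(inactive)
    active = [g for g in profiles if g not in inactive_set]

    # Stage 2 (assumption): connect genes whose profiles are strongly correlated.
    adj = {g: set() for g in active}
    for i, g in enumerate(active):
        for h in active[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= min_corr:
                adj[g].add(h)
                adj[h].add(g)

    # Stage 3 (assumption): connected components are clusters; isolated genes are noise.
    clusters, noise, seen = [], [], set()
    for g in active:
        if g in seen:
            continue
        seen.add(g)
        stack, component = [g], []
        while stack:
            u = stack.pop()
            component.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(component) > 1:
            clusters.append(sorted(component))
        else:
            noise.append(g)
    return clusters, inactive, noise


# Toy profiles, made up for this example (not data from the paper).
toy = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [8, 6, 4, 2],
    "g5": [1.0, 1.01, 0.99, 1.0],  # nearly flat profile
    "g6": [3, 1, 4, 1],            # correlates with nothing else
}
clusters, inactive, noise = cluster_genes(toy)
```

On this toy input the sketch groups the rising pair and the falling pair separately, flags the flat profile as inactive, and leaves the uncorrelated gene as noise. Connected components are only the simplest graph partition; a real method would likely use a stricter criterion for cluster cohesion.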


Keywords: Clustering; Gene expression data; Data mining



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Girish Chandra (1)
  • Akshay Deepak (1)
  • Sudhakar Tripathi (1)
  1. Department of Computer Science and Engineering, National Institute of Technology Patna, Bihar, India
