Density-Based Clustering of Functionally Similar Genes Using Biological Knowledge

  • Namrata Pant
  • Sushmita Paul
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11942)


Clustering is used to identify natural groups present in data. It has been widely applied to gene expression data to discover gene clusters that might be involved in the same biological processes. This information is important for analyzing data from fatal diseases such as cancers and for identifying potential diagnostic and prognostic markers. Existing clustering methods used for this purpose are computationally efficient but do not always produce biologically meaningful results. They also have one or more of the following shortcomings: they cannot handle arbitrary-shaped clusters, they require the number of clusters to be specified in advance, or they do not deal efficiently with the noise present in biological data. In this study, a new density-based clustering method specific to gene expression data is introduced that overcomes these shortcomings and produces biologically enriched clusters of functionally similar genes by incorporating biological information from Gene Ontology (GO). The proposed method integrates GO semantic similarity information and correlation information between genes to obtain clusters. The clusters are further validated for biological relevance using Disease Ontology enrichment, KEGG pathway enrichment, and protein-protein interaction network analysis.
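The general idea of combining expression correlation with GO semantic similarity before density-based clustering can be sketched as follows. This is only an illustration under stated assumptions, not the method proposed in the paper: the mixing parameter `weight`, the clipping of negative correlations to zero, the toy data, and the use of plain DBSCAN on a precomputed distance matrix are all hypothetical choices made here for clarity. The GO similarity matrix is assumed to be given with values in [0, 1].

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def combined_distance(expr, go_sim, weight=0.5):
    """Distance = 1 - (weight * max(corr, 0) + (1 - weight) * GO similarity).

    `weight` is a hypothetical mixing parameter (an assumption of this sketch).
    """
    n = len(expr)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            corr = max(pearson(expr[i], expr[j]), 0.0)
            sim = weight * corr + (1 - weight) * go_sim[i][j]
            dist[i][j] = dist[j][i] = 1.0 - sim
    return dist

def dbscan(dist, eps, min_pts):
    """Plain DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]
        if len(neighbors) < min_pts:
            labels[p] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # noise reached from a core point: border
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_pts:  # q is a core point, so expand
                queue.extend(q_neighbors)
    return labels

# Toy data (invented for illustration): genes 0-2 are co-expressed and
# share high GO similarity; genes 3 and 4 are unrelated to them.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [1.0, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 5.0, 1.0, 5.0],
]
go_sim = [[0.9 if i < 3 and j < 3 else 0.1 for j in range(5)] for i in range(5)]

labels = dbscan(combined_distance(expr, go_sim), eps=0.3, min_pts=3)
print(labels)  # [0, 0, 0, -1, -1]
```

In this toy run the three co-expressed, functionally similar genes form one cluster and the remaining two are labelled as noise, without the number of clusters being specified in advance.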


Clustering, Gene expression, Cancer biomarkers



This work was partially supported by the Department of Science and Technology, Government of India, New Delhi (grant no. ECR/2016/001917).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Bioscience and Bioengineering, Indian Institute of Technology Jodhpur, India
