A Column-Wise Distance-Based Approach for Clustering of Gene Expression Data with Detection of Functionally Inactive Genes and Noise

  • Girish Chandra
  • Sudhakar Tripathi
Part of the Studies in Computational Intelligence book series (SCI, volume 687)


Due to the uncertainty and inherent noise present in gene expression data, clustering such data is a challenging task. Many clustering algorithms assume that every gene belongs to a cluster. However, a few genes are functionally inactive, i.e. they do not participate in any biological process under the experimental conditions, and should be segregated from the clusters. Based on this observation, this article proposes a clustering method that clusters co-expressed genes and segregates functionally inactive genes and noise. The proposed method forms a cluster around a specified gene if the difference between its expression level and that of other genes is less than a threshold t in each experimental condition; otherwise, the specified gene is marked as functionally inactive or noise. The proposed method is applied to 10 yeast gene expression datasets, and the results show that it performs well relative to the existing method.
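The column-wise threshold rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the specified gene is chosen, how ties are handled, or how the difference is measured, so the sketch assumes a greedy pass in input order, the absolute difference per condition, and that a gene with no match is the one marked as inactive or noise. The function name `columnwise_threshold_clustering` and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def columnwise_threshold_clustering(
    expr: Dict[str, Sequence[float]], t: float
) -> Tuple[List[List[str]], List[str]]:
    """Greedy sketch of a column-wise threshold clustering.

    expr maps a gene name to its expression levels, one per condition.
    Assumption: genes are taken as the "specified gene" in input order.
    """
    unassigned = list(expr)          # dicts preserve insertion order
    clusters: List[List[str]] = []
    segregated: List[str] = []       # genes marked inactive or noise

    while unassigned:
        seed = unassigned.pop(0)
        # A gene joins the seed only if it is within t in EVERY condition.
        members = [
            g for g in unassigned
            if all(abs(a - b) < t for a, b in zip(expr[seed], expr[g]))
        ]
        if members:
            clusters.append([seed] + members)
            taken = set(members)
            unassigned = [g for g in unassigned if g not in taken]
        else:
            # Assumption: a seed with no matching gene is segregated.
            segregated.append(seed)

    return clusters, segregated


# Toy data (invented for illustration): 5 genes, 3 conditions.
expr = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.1, 2.1, 2.9],
    "g3": [5.0, 5.0, 5.0],
    "g4": [1.0, 2.0, 4.0],   # matches g1 in two conditions only
    "g5": [5.2, 4.9, 5.1],
}
clusters, segregated = columnwise_threshold_clustering(expr, t=0.5)
print(clusters)     # [['g1', 'g2'], ['g3', 'g5']]
print(segregated)   # ['g4']
```

In the toy data, g4 agrees with g1 in two of the three conditions but differs by 1.0 in the third, so it is not clustered; this is the sense in which the threshold is applied in each condition rather than to an aggregate distance.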


Gene expression data, Clustering, Data mining



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computer Science & Engineering, National Institute of Technology Patna, Patna, India
