A Column-Wise Distance-Based Approach for Clustering of Gene Expression Data with Detection of Functionally Inactive Genes and Noise
Clustering gene expression data is challenging because of the uncertainty and inherent noise in the data. Many clustering algorithms assume that every gene belongs to a cluster. However, some genes are functionally inactive, i.e., they do not participate in any biological process under the experimental conditions, and should be segregated from the clusters. Based on this observation, this article proposes a clustering method that groups co-expressed genes and segregates functionally inactive genes and noise. The proposed method forms a cluster around a specified gene from those genes whose expression levels differ from that gene's by less than a threshold t in each experimental condition; if no gene meets this criterion, the specified gene is marked as functionally inactive or noise. The proposed method is applied to 10 yeast gene expression data sets, and the results show that it performs well compared with the existing approach.
Keywords: Gene expression data, Clustering, Data mining
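The column-wise criterion described in the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation: the greedy seed order, the function name `columnwise_threshold_clustering`, the `NOISE` label, and the toy data are all assumptions made for illustration. Rows are genes and columns are experimental conditions.

```python
from typing import List, Optional, Sequence

NOISE = -1  # hypothetical label for genes flagged as functionally inactive or noise


def columnwise_threshold_clustering(
    expr: Sequence[Sequence[float]], t: float
) -> List[Optional[int]]:
    """Greedy sketch: each unlabeled gene in turn acts as the 'specified gene'."""
    n_genes = len(expr)
    labels: List[Optional[int]] = [None] * n_genes  # None = not yet visited
    next_label = 0
    for seed in range(n_genes):
        if labels[seed] is not None:
            continue
        # Keep only unlabeled genes that differ from the seed by less than t
        # in every condition (column), not merely on average.
        members = [
            g
            for g in range(n_genes)
            if g != seed
            and labels[g] is None
            and all(abs(a - b) < t for a, b in zip(expr[g], expr[seed]))
        ]
        if members:
            for g in [seed] + members:
                labels[g] = next_label
            next_label += 1
        else:
            # No gene is close to the seed in all conditions.
            labels[seed] = NOISE
    return labels


# Toy data (assumed): 5 genes measured under 3 conditions.
expr = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [5.0, 5.0, 5.0],
    [5.2, 4.9, 5.1],
    [9.0, 0.0, 9.0],
]
labels = columnwise_threshold_clustering(expr, t=0.5)
print(labels)  # [0, 0, 1, 1, -1]
```

In this sketch the last gene has no neighbor within t in every condition, so it is set aside rather than forced into a cluster. Requiring closeness in each column is stricter than thresholding a single aggregate distance such as the Euclidean distance, which is presumably why the abstract states the criterion per experimental condition.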