Abstract
Noise in gene expression data causes trouble for many clustering algorithms. Such data may also contain functionally inactive genes, which should not be part of any cluster. One solution is to remove the functionally inactive genes and noise first and then cluster the remaining genes. Following this approach, this article proposes a graph-based clustering algorithm that first identifies the functionally inactive genes and noise and then clusters the remaining genes. The proposed method is applied to yeast cell cycle data, and the results show that it performs well at identifying highly co-expressed gene clusters in the presence of functionally inactive genes and noise.
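The two-stage idea described in the abstract (filter first, cluster second) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here. It assumes a Pearson-correlation similarity graph, flags genes with too few strong neighbours as noise or inactive, and takes connected components of the remaining graph as clusters. The function name `cluster_with_noise_filter` and the parameters `sim_threshold` and `min_degree` are hypothetical.

```python
import numpy as np


def cluster_with_noise_filter(expr, sim_threshold=0.8, min_degree=1):
    """Illustrative two-stage graph clustering (assumed scheme, not the paper's).

    expr: array of shape (genes, conditions).
    Returns (labels, noise): labels[i] is a cluster id, or -1 for a gene
    flagged as noise / functionally inactive; noise is a boolean mask.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[0]

    # Pearson correlation between gene profiles; a flat profile has an
    # undefined correlation, which is treated here as "no similarity".
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(expr)
    corr = np.nan_to_num(corr, nan=0.0)

    # Similarity graph: an edge joins two genes whose correlation is high.
    adj = corr >= sim_threshold
    np.fill_diagonal(adj, False)

    # Stage 1: genes with too few strong neighbours are flagged as noise.
    noise = adj.sum(axis=1) < min_degree

    # Stage 2: connected components among the remaining genes (iterative DFS).
    labels = np.full(n_genes, -1, dtype=int)
    next_label = 0
    for start in range(n_genes):
        if noise[start] or labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(adj[u]):
                if not noise[v] and labels[v] == -1:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels, noise
```

In this sketch, a gene with a constant expression profile is always flagged, since it has no edges in the similarity graph. Connected components are only the simplest possible second stage; any graph clustering method could take their place.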
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chandra, G., Deepak, A., Tripathi, S. (2018). A Graph-Based Method for Clustering of Gene Expression Data with Detection of Functionally Inactive Genes and Noise. In: Reddy Edla, D., Lingras, P., Venkatanareshbabu K. (eds) Advances in Machine Learning and Data Science. Advances in Intelligent Systems and Computing, vol 705. Springer, Singapore. https://doi.org/10.1007/978-981-10-8569-7_22
DOI: https://doi.org/10.1007/978-981-10-8569-7_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8568-0
Online ISBN: 978-981-10-8569-7
eBook Packages: Engineering, Engineering (R0)