Abstract
A central aim of bioinformatics is to relate microarray data to the underlying biological processes, which motivates the development and application of data mining techniques. A microarray dataset is voluminous and contains a very large number of genes, most of which are irrelevant to cancer classification. These irrelevant genes should be filtered out of the dataset before it is used in a cancer classification system. In this paper, a clustering algorithm is used to group genes whose similar expression profiles suggest that they may be co-regulated. Once the clusters are obtained, the biological knowledge associated with the genes in each cluster is examined. A quality-based partition is determined by co-expressed genes that share similar biological knowledge. Gene Ontology (GO) annotations are linked to the clusters to identify the biologically meaningful genes within them. In the next phase, the fold-change method is used to select the differentially expressed genes from among these biologically meaningful genes. The selected genes are termed informative genes. The effectiveness of the method is evaluated on publicly accessible microarray data using several popular classifiers.
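The fold-change phase described in the abstract can be illustrated with a minimal sketch. The abstract does not give the threshold, the data scale, or the function names, so everything below is an assumption: expression values are taken to be on a log2 scale, the cutoff of 1.0 (a two-fold change) is arbitrary, and `log2_fold_change`, `select_informative`, and the toy matrix are hypothetical. The `candidates` list stands in for the genes already retained by the clustering and GO steps, which are not reproduced here.

```python
import numpy as np

def log2_fold_change(expr, is_tumor):
    """Per-gene log2 fold change between two sample groups.

    expr: (n_genes, n_samples) array, assumed to be log2-scale expression.
    is_tumor: boolean array of length n_samples marking the second group.
    """
    expr = np.asarray(expr, dtype=float)
    is_tumor = np.asarray(is_tumor, dtype=bool)
    # On log2-scale data, the log fold change is a difference of group means.
    return expr[:, is_tumor].mean(axis=1) - expr[:, ~is_tumor].mean(axis=1)

def select_informative(expr, is_tumor, candidate_idx, threshold=1.0):
    """Keep candidate genes whose absolute log2 fold change meets the threshold.

    candidate_idx: indices of genes that passed earlier filtering
    (here a placeholder for the cluster/GO-based selection).
    """
    lfc = log2_fold_change(expr, is_tumor)
    candidate_idx = np.asarray(candidate_idx)
    keep = np.abs(lfc[candidate_idx]) >= threshold
    return candidate_idx[keep]

# Toy data (invented for illustration): 4 genes x 4 samples.
expr = np.array([
    [5.0, 5.2, 7.9, 8.1],  # gene 0: higher in tumor samples
    [6.0, 6.1, 6.0, 5.9],  # gene 1: essentially unchanged
    [9.0, 8.8, 6.9, 7.1],  # gene 2: lower in tumor samples
    [4.0, 4.0, 7.0, 7.0],  # gene 3: changed, but not among the candidates
])
is_tumor = np.array([False, False, True, True])
candidates = [0, 1, 2]

selected = select_informative(expr, is_tumor, candidates)
print(selected)
```

Restricting the test to `candidates` mirrors the order of steps in the abstract, where fold change is applied only to genes already judged biologically meaningful; gene 3 is therefore never considered even though its expression differs between groups.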
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pati, S.K., Mallick, S., Chakraborty, A., Das, A. (2019). Informative Gene Selection Using Clustering and Gene Ontology. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_37
DOI: https://doi.org/10.1007/978-981-13-1498-8_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1497-1
Online ISBN: 978-981-13-1498-8
eBook Packages: Intelligent Technologies and Robotics (R0)