Informative Gene Selection Using Clustering and Gene Ontology

Pati, Soumen K.; Mallick, Subhankar; Chakraborty, Aruna; Das, Ankur

doi:10.1007/978-981-13-1498-8_37


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 813)


Abstract

A central aim of bioinformatics is to relate microarray data to biological processes as closely as possible, which supports the development and application of data mining techniques. A microarray dataset is voluminous and contains a very large number of genes, most of which are irrelevant to cancer classification. These irrelevant genes should be filtered out of the dataset before it is used in a cancer classification system. In this paper, a clustering algorithm is used to group genes whose similar expression suggests that they may be co-regulated. Once the clusters are obtained, the biological knowledge associated with the genes in each cluster is examined. A quality-based partition is determined by the co-expressed genes that share similar biological knowledge. Gene Ontology (GO) annotations are linked to the clusters to identify the biologically meaningful genes within them. In the next phase, the fold-change method is used to pick the differentially expressed genes from the selected biologically meaningful genes. The genes selected in this way are termed informative genes. The efficiency of the method is evaluated on publicly accessible microarray data using several popular classifiers.
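The two selection phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the GO-matching rule, or the fold-change threshold, so the sketch assumes cluster labels are already available, treats "annotated with the cluster's most frequent GO term" as a stand-in for "biologically meaningful", and uses a two-fold cutoff on class means. The function names, the toy expression matrix, and the GO term assignments are all hypothetical.

```python
from collections import Counter

import numpy as np


def go_coherent_genes(cluster_labels, go_terms):
    """Keep genes annotated with the most frequent GO term of their cluster.

    cluster_labels: one cluster id per gene (from any clustering algorithm).
    go_terms: one set of GO term ids per gene (hypothetical annotations).
    Assumption: sharing the dominant term approximates "biologically meaningful".
    """
    kept = []
    for cluster in sorted(set(cluster_labels)):
        members = [i for i, c in enumerate(cluster_labels) if c == cluster]
        counts = Counter(t for i in members for t in go_terms[i])
        if not counts:
            continue  # no annotated gene in this cluster
        dominant, _ = counts.most_common(1)[0]
        kept.extend(i for i in members if dominant in go_terms[i])
    return kept


def fold_change_filter(expr, is_tumor, candidates, min_abs_log2fc=1.0, eps=1e-9):
    """Keep candidates whose class means differ by at least 2**min_abs_log2fc.

    expr: genes x samples matrix of non-negative expression values.
    is_tumor: one boolean per sample marking the tumor class.
    eps: small constant that avoids division by zero.
    """
    is_tumor = np.asarray(is_tumor, dtype=bool)
    mean_tumor = expr[:, is_tumor].mean(axis=1)
    mean_normal = expr[:, ~is_tumor].mean(axis=1)
    log2fc = np.log2((mean_tumor + eps) / (mean_normal + eps))
    return [g for g in candidates if abs(log2fc[g]) >= min_abs_log2fc]


# Toy data: 5 genes x 4 samples (2 normal, then 2 tumor). Values are made up.
expr = np.array([
    [10.0, 12.0, 40.0, 44.0],  # gene 0: up in tumor
    [11.0, 13.0, 12.0, 12.0],  # gene 1: unchanged
    [50.0, 54.0, 12.0, 14.0],  # gene 2: down in tumor
    [20.0, 22.0, 21.0, 23.0],  # gene 3: unchanged
    [30.0, 30.0, 90.0, 90.0],  # gene 4: up in tumor, but GO term differs
])
is_tumor = [False, False, True, True]
cluster_labels = [0, 0, 1, 1, 1]  # assumed output of a clustering step
go_terms = [  # illustrative annotations only
    {"GO:0006915"}, {"GO:0006915"},
    {"GO:0008283"}, {"GO:0008283"}, {"GO:0007049"},
]

coherent = go_coherent_genes(cluster_labels, go_terms)
informative = fold_change_filter(expr, is_tumor, coherent)
print(informative)
```

In this toy run, gene 4 is strongly differentially expressed but is dropped in the first phase because it does not share its cluster's dominant annotation, which shows how the GO step constrains the fold-change step. A real pipeline would likely compare GO terms by semantic similarity rather than exact match.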



Author information

Authors and Affiliations

St. Thomas’ College of Engineering & Technology, Kolkata, 700023, West Bengal, India
Soumen K. Pati, Subhankar Mallick & Aruna Chakraborty
Calcutta Institute of Engineering and Management, Tollygunge, Kolkata, 700040, West Bengal, India
Ankur Das


Corresponding author

Correspondence to Subhankar Mallick.

Editor information

Editors and Affiliations

Machine Intelligence Research Labs, Auburn, WA, USA
Ajith Abraham
Department of Computer and System Sciences, Visva-Bharati University, Santiniketan, West Bengal, India
Paramartha Dutta
Department of Computer Science and Engineering, University of Kalyani, Kalyani, India
Jyotsna Kumar Mandal
Institute of Engineering and Management, Kolkata, West Bengal, India
Abhishek Bhattacharya
Institute of Engineering and Management, Kolkata, West Bengal, India
Soumi Dutta


About this paper

Cite this paper

Pati, S.K., Mallick, S., Chakraborty, A., Das, A. (2019). Informative Gene Selection Using Clustering and Gene Ontology. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_37


DOI: https://doi.org/10.1007/978-981-13-1498-8_37
Published: 02 September 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1497-1
Online ISBN: 978-981-13-1498-8
eBook Packages: Intelligent Technologies and Robotics (R0)
