Informative Gene Selection Using Clustering and Gene Ontology

  • Conference paper
Emerging Technologies in Data Mining and Information Security

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 813)

Abstract

A central aim of bioinformatics is to relate microarray data to the underlying biological processes as closely as possible, which in turn guides the development and application of data mining techniques. A microarray dataset is voluminous and contains a very large number of genes, most of which are irrelevant to cancer classification. These irrelevant genes should be filtered out of the dataset before it is used in a cancer classification system. In this paper, a clustering algorithm is used to group genes whose similar expression profiles suggest that they may be co-regulated. Once the clusters are obtained, the biological knowledge associated with the genes in each cluster is examined. A quality-based partition is determined from the co-expressed genes that share similar biological knowledge. Gene Ontology (GO) annotations are linked to the clusters to identify the biologically meaningful genes within them. In the next phase, the fold-change method is used to pick out the differentially expressed genes from the selected biologically meaningful genes. These selected genes are termed informative genes. The efficiency of the method is evaluated on publicly accessible microarray data using several popular classifiers.
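
The fold-change step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the threshold, the log base, or any pseudocount, so the values below (a log2 ratio, a pseudocount of 1.0, a cutoff of 1.0) and the function names are assumptions chosen for illustration, and the expression values are made up. This is a generic fold-change filter, not necessarily the exact procedure used in the paper.

```python
import math
from statistics import mean


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """Log2 ratio of mean expression in condition A versus condition B.

    The pseudocount (an assumption, not from the paper) avoids division by zero.
    """
    return math.log2((mean(expr_a) + pseudocount) / (mean(expr_b) + pseudocount))


def select_by_fold_change(genes, threshold=1.0):
    """Keep genes whose absolute log2 fold change meets the threshold.

    `genes` maps a gene name to a pair (condition_a_values, condition_b_values).
    Returns a dict of gene name -> log2 fold change for the retained genes.
    """
    selected = {}
    for name, (cond_a, cond_b) in genes.items():
        lfc = log2_fold_change(cond_a, cond_b)
        if abs(lfc) >= threshold:
            selected[name] = lfc
    return selected


# Toy, made-up expression values (hypothetical gene names).
toy = {
    "geneA": ([7.0, 7.0, 7.0], [1.0, 1.0, 1.0]),  # ratio (7+1)/(1+1) = 4
    "geneB": ([3.0, 3.0], [3.0, 3.0]),            # ratio 1, no change
    "geneC": ([0.0, 0.0], [3.0, 3.0]),            # ratio (0+1)/(3+1) = 0.25
}
informative = select_by_fold_change(toy, threshold=1.0)
print(informative)
```

In this toy input, geneA and geneC pass the cutoff (up- and down-regulated respectively) while geneB is dropped. In the pipeline the abstract describes, such a filter would be applied only to genes already retained by the clustering and GO steps.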

Author information

Corresponding author

Correspondence to Subhankar Mallick.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Pati, S.K., Mallick, S., Chakraborty, A., Das, A. (2019). Informative Gene Selection Using Clustering and Gene Ontology. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_37
