Integrating Biological Context into the Analysis of Gene Expression Data

  • Conference paper
Distributed Computing and Artificial Intelligence, Special Sessions, 15th International Conference (DCAI 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 801)

Abstract

High-throughput RNA sequencing produces large gene expression datasets whose analysis leads to a better understanding of diseases such as cancer. RNA-Seq data is challenging to analyze because of its high dimensionality, its noise, and the complexity of the underlying biological processes. Researchers apply traditional machine learning approaches, e.g., hierarchical clustering, to this data. Until the results are validated, the analysis relies solely on the provided data and ignores the biological context.

However, gene expression data follows particular patterns determined by the underlying biological processes. In our research, we aim to integrate the available biological knowledge earlier in the analysis process. We want to adapt state-of-the-art data mining algorithms so that they consider the biological context in their computations and deliver meaningful results for researchers.

Author information

Corresponding author

Correspondence to Cindy Perscheid.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Perscheid, C., Uflacker, M. (2019). Integrating Biological Context into the Analysis of Gene Expression Data. In: Rodríguez, S., et al. Distributed Computing and Artificial Intelligence, Special Sessions, 15th International Conference. DCAI 2018. Advances in Intelligent Systems and Computing, vol 801. Springer, Cham. https://doi.org/10.1007/978-3-319-99608-0_41
