Abstract
High-throughput RNA sequencing produces large gene expression datasets whose analysis can improve our understanding of diseases such as cancer. RNA-Seq data is challenging to analyze because of its high dimensionality, its noise, and the complexity of the underlying biological processes. Researchers typically apply traditional machine learning approaches, e.g. hierarchical clustering, to this data. Up to the point where the results are validated, the analysis relies solely on the provided data and ignores the biological context.
However, gene expression data follows particular patterns that are shaped by the underlying biological processes. In our research, we aim to integrate the available biological knowledge earlier in the analysis process. To this end, we want to adapt state-of-the-art data mining algorithms so that they consider the biological context in their computations and deliver meaningful results for researchers.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Perscheid, C., Uflacker, M. (2019). Integrating Biological Context into the Analysis of Gene Expression Data. In: Rodríguez, S., et al. Distributed Computing and Artificial Intelligence, Special Sessions, 15th International Conference. DCAI 2018. Advances in Intelligent Systems and Computing, vol 801. Springer, Cham. https://doi.org/10.1007/978-3-319-99608-0_41
DOI: https://doi.org/10.1007/978-3-319-99608-0_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99607-3
Online ISBN: 978-3-319-99608-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)