Integrating Biological Context into the Analysis of Gene Expression Data

Perscheid, Cindy; Uflacker, Matthias

doi:10.1007/978-3-319-99608-0_41


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 801)

Included in the following conference series:

International Symposium on Distributed Computing and Artificial Intelligence


Abstract

High-throughput RNA sequencing (RNA-Seq) produces large gene expression datasets whose analysis leads to a better understanding of diseases like cancer. RNA-Seq data is challenging to analyze because of its high dimensionality, its noise, and the complexity of the underlying biological processes. Researchers apply traditional machine learning approaches, e.g. hierarchical clustering, to analyze this data. Until the results are validated, the analysis relies solely on the provided data and disregards the biological context.

However, gene expression data follows particular patterns – the underlying biological processes. In our research, we aim to integrate the available biological knowledge earlier in the analysis process. We want to adapt state-of-the-art data mining algorithms to consider the biological context in their computations and deliver meaningful results for researchers.
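To make the idea of using biological knowledge earlier in the analysis more concrete, the following is a minimal sketch of one possible approach, not the authors' method: a gene ranking that blends a purely data-driven score (expression variance) with a binary prior taken from an external knowledge base. The function `rank_genes`, the `weight` parameter, the gene labels, and all expression values are hypothetical and chosen only for illustration.

```python
from statistics import pvariance


def rank_genes(expression, known_genes, weight=0.5):
    """Rank genes by a blend of expression variance and prior knowledge.

    expression:  dict mapping gene name -> list of expression values (one per sample)
    known_genes: set of genes flagged as relevant by an external knowledge base
    weight:      share of the final score contributed by prior knowledge (0..1)
    """
    variances = {gene: pvariance(values) for gene, values in expression.items()}
    max_var = max(variances.values()) or 1.0  # avoid division by zero

    scores = {}
    for gene, var in variances.items():
        data_score = var / max_var                         # scaled to 0..1
        prior_score = 1.0 if gene in known_genes else 0.0  # assumed binary prior
        scores[gene] = (1 - weight) * data_score + weight * prior_score

    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))


# Toy data: all values and gene labels are made up for illustration.
toy_expression = {
    "GENE_A": [1.0, 9.0, 1.0, 9.0],  # high variance, no prior knowledge
    "GENE_B": [4.0, 6.0, 4.0, 6.0],  # moderate variance, flagged by the prior
    "GENE_C": [5.0, 5.0, 5.0, 5.0],  # constant, no prior knowledge
}
toy_known = {"GENE_B"}

data_only = rank_genes(toy_expression, toy_known, weight=0.0)
with_prior = rank_genes(toy_expression, toy_known, weight=0.5)
```

With `weight=0.0` the ranking depends on the data alone and `GENE_A` comes first. With `weight=0.5` the prior lifts `GENE_B` above it. A realistic setup would replace the binary prior with graded evidence from curated resources and would likely use a more robust data-driven score than raw variance.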



Author information

Authors and Affiliations

Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
Cindy Perscheid & Matthias Uflacker


Corresponding author

Correspondence to Cindy Perscheid.

Editor information

Editors and Affiliations

BISITE Digital Innovation Hub, University of Salamanca, Salamanca, Spain
Sara Rodríguez
BISITE Digital Innovation Hub, University of Salamanca, Salamanca, Spain
Javier Prieto
GECAD - Instituto Superior de Engenharia, Porto, Portugal
Pedro Faria
Department of Computer Science and Production Management, University of Zielona Góra, Zielona Góra, Poland
Sławomir Kłos
Computing Science and Artificial Intelligence, Rey Juan Carlos University, Móstoles, Madrid, Spain
Alberto Fernández
Basque Center for Applied Mathematics, Bilbao, Spain
Santiago Mazuelas
Basque Center for Applied Mathematics, Universidad de Alcalá, Alcalá de Henares, Spain
M. Dolores Jiménez-López
Departamento de Informática y Automática, University of Salamanca, Salamanca, Spain
María N. Moreno
Departamento de Sistemas Informáticos, University of Castilla-La Mancha, Albacete, Spain
Elena M. Navarro

About this paper

Cite this paper

Perscheid, C., Uflacker, M. (2019). Integrating Biological Context into the Analysis of Gene Expression Data. In: Rodríguez, S., et al. Distributed Computing and Artificial Intelligence, Special Sessions, 15th International Conference. DCAI 2018. Advances in Intelligent Systems and Computing, vol 801. Springer, Cham. https://doi.org/10.1007/978-3-319-99608-0_41


DOI: https://doi.org/10.1007/978-3-319-99608-0_41
Published: 09 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99607-3
Online ISBN: 978-3-319-99608-0
eBook Packages: Intelligent Technologies and Robotics (R0)
