Abstract
Modern high-throughput technologies based on genome, transcriptome, or proteome profiling provide an abundance of data that must be processed, analyzed, and finally interpreted. Effective and efficient analysis of molecular profiling data is crucial for detailed diagnosis, prognosis, and prediction of therapy outcome. Meaningful conclusions can be drawn only with sophisticated methods for the analysis and interpretation of biomedical and molecular data. In this study we present an approach to the functional interpretation of gene or protein sets using clusters of Gene Ontology (GO) terms. We analyze transcription profiles of the human cell line K562 and show that clustering groups functionally related GO terms and thereby yields a more concise and comprehensive description. By applying a cluster-specific data aggregation tool, we calculate statistics for individual clusters of GO terms and compare the number of differentially expressed genes between two sample pairs. The presented tool is implemented as part of the annotation module available on the BioTest remote platform for hypothesis testing and analysis of biomedical data.
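The cluster-level aggregation summarized above can be illustrated with a minimal sketch. It is not the BioTest implementation: the term labels, similarity values, annotations, and gene lists are hypothetical toy data, the function names are invented for illustration, and the grouping step is a simple threshold-based connected-components stand-in for whatever clustering method the chapter itself uses.

```python
from itertools import combinations

def cluster_terms(terms, similarity, threshold):
    """Group terms whose pairwise similarity reaches the threshold.

    Stand-in clustering (connected components via union-find), assumed
    here only for illustration.
    """
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the representative, compressing the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if similarity.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), set()).add(t)
    return sorted(sorted(c) for c in clusters.values())

def count_de_genes(cluster, annotations, de_genes):
    """Count distinct differentially expressed genes annotated to a cluster."""
    genes = set()
    for term in cluster:
        genes |= annotations[term]
    return len(genes & de_genes)

# Hypothetical toy data: term labels, similarities, and gene annotations.
terms = ["T1", "T2", "T3", "T4"]
similarity = {
    frozenset(("T1", "T2")): 0.8,
    frozenset(("T3", "T4")): 0.7,
    frozenset(("T1", "T3")): 0.1,
}
annotations = {
    "T1": {"g1", "g2"},
    "T2": {"g2", "g3"},
    "T3": {"g4"},
    "T4": {"g4", "g5"},
}
# Hypothetical differentially expressed genes for two sample pairs.
de_pair_1 = {"g1", "g3", "g4"}
de_pair_2 = {"g2", "g5"}

clusters = cluster_terms(terms, similarity, threshold=0.5)
summary = [
    (c, count_de_genes(c, annotations, de_pair_1),
        count_de_genes(c, annotations, de_pair_2))
    for c in clusters
]
for cluster, n1, n2 in summary:
    print(cluster, n1, n2)
```

Counting distinct genes per cluster, rather than summing per-term counts, avoids counting a gene twice when it is annotated to several related terms in the same cluster.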
Acknowledgements
This work was partially supported by The National Centre for Research and Development grant No. PBS3/B3/32/2015 and was carried out in part within the statutory research project of the Institute of Informatics (RAU2). The presented system was developed and installed on the infrastructure of the Ziemowit computer cluster (www.ziemowit.hpc.polsl.pl) in the Laboratory of Bioinformatics and Computational Biology, The Biotechnology, Bioengineering and Bioinformatics Centre Silesian BIO-FARMA, created in the POIG.02.01.00-00-166/08 and expanded in the POIG.02.03.01-00-040/13 projects.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Gruca, A., Jaksik, R., Psiuk-Maksymowicz, K. (2018). Functional Interpretation of Gene Sets: Semantic-Based Clustering of Gene Ontology Terms on the BioTest Platform. In: Gruca, A., Czachórski, T., Harezlak, K., Kozielski, S., Piotrowska, A. (eds) Man-Machine Interactions 5. ICMMI 2017. Advances in Intelligent Systems and Computing, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-319-67792-7_13
DOI: https://doi.org/10.1007/978-3-319-67792-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67791-0
Online ISBN: 978-3-319-67792-7
eBook Packages: Engineering, Engineering (R0)