Integrating and Warehousing Liver Gene Expression Data and Related Biomedical Resources in GEDAW

Guérin, E.; Marquet, G.; Burgun, A.; Loréal, O.; Berti-Equille, L.; Leser, U.; Moussouni, F.

doi:10.1007/11530084_14

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3615)

Abstract

Researchers at Inserm U522, a medical research institute specialized in the liver, use high-throughput technologies to diagnose liver disease states. They seek to identify the set of dysregulated genes in different physiopathological situations, along with the molecular regulation mechanisms involved in the occurrence of these diseases, with the mid-term goal of developing new diagnostic and therapeutic tools. Answering such a complex question requires considering both the data generated on the genes by in-house transcriptome experiments and the annotations extracted from the many publicly available heterogeneous resources in biomedicine. This paper presents GEDAW, a gene expression data warehouse developed to assist such discovery processes. The distinctive feature of GEDAW is that it systematically integrates gene information from a multitude of structured data sources. The data sources include: i) XML records of GENBANK, used to annotate gene sequence features and integrated using a schema mapping approach; ii) an in-house relational database that stores detailed experimental data on the liver genes and serves as a permanent source of expression levels for the warehouse, without unnecessary details on the experiments; and iii) a semi-structured data source called BioMeKE-XML that provides, for each gene, its nomenclature, its functional annotation according to Gene Ontology, and its medical annotation according to the UMLS. Because GEDAW is a liver gene expression data warehouse, we have paid particular attention to medical knowledge, so as to correlate biological mechanisms and medical knowledge with experimental data. The paper discusses the data sources and the transformation process applied to resolve syntactic and semantic conflicts between the source formats and the GEDAW schema.
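
To make the idea of schema mapping concrete, the following is a minimal Python sketch of how an XML source record might be translated into a warehouse record through declarative mapping rules. It is not the GEDAW implementation: the element names (`accession`, `gene_symbol`, `seq_length`), the target attribute names, the sample values, and the `map_record` helper are all hypothetical and chosen only for illustration. They do not reflect the real GENBANK XML format or the GEDAW schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical source record. Element names and values are made up for this
# sketch and are NOT the real GENBANK XML schema.
SOURCE_XML = """
<record>
  <accession>X00001</accession>
  <gene_symbol>abc1</gene_symbol>
  <seq_length>1200</seq_length>
</record>
"""

# Mapping rules: source element -> (target attribute, conversion function).
# The target attribute names are assumptions, not the GEDAW schema.
MAPPING = {
    "accession": ("accession_number", str),
    "gene_symbol": ("symbol", str.upper),    # normalise a naming convention
    "seq_length": ("sequence_length", int),  # resolve a type mismatch
}


def map_record(xml_text, mapping):
    """Apply the mapping rules to one XML record and return a target dict."""
    root = ET.fromstring(xml_text)
    target = {}
    for source_tag, (target_attr, convert) in mapping.items():
        node = root.find(source_tag)
        # Skip source elements that are absent or empty.
        if node is not None and node.text is not None:
            target[target_attr] = convert(node.text.strip())
    return target


gene = map_record(SOURCE_XML, MAPPING)
print(gene)
```

Keeping the rules in a table separate from the traversal code means that a change in the source format only requires editing the mapping, which is one common motivation for a schema mapping approach.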

Author information

Authors and Affiliations

INSERM U522 CHU Pontchaillou, 35033, Rennes, France
E. Guérin, O. Loréal & F. Moussouni
Faculté de Médecine, EA 3888 LIM, 35043, Rennes, France
G. Marquet & A. Burgun
IRISA, Campus Universitaire de Beaulieu, 35042, Rennes, France
L. Berti-Equille
Dep. for Computer Science, Humboldt-Universität, 10099, Berlin, Germany
U. Leser

Editor information

Editors and Affiliations

Department of Computer Science, University of California, Davis
Bertram Ludäscher
University of Maryland, College Park, 20742, MD, USA
Louiqa Raschid

About this paper

Cite this paper

Guérin, E. et al. (2005). Integrating and Warehousing Liver Gene Expression Data and Related Biomedical Resources in GEDAW. In: Ludäscher, B., Raschid, L. (eds) Data Integration in the Life Sciences. DILS 2005. Lecture Notes in Computer Science, vol 3615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11530084_14

DOI: https://doi.org/10.1007/11530084_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27967-9
Online ISBN: 978-3-540-31879-8
eBook Packages: Computer Science (R0)
