Hybrid Integration of Molecular-Biological Annotation Data

  • Conference paper
Data Integration in the Life Sciences (DILS 2005)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3615)

Abstract

We present a new approach to integrating annotation data from public sources for the expression analysis of genes and proteins. Expression data is materialized in a data warehouse, which provides high performance for data-intensive analysis tasks. Annotation data, in contrast, is integrated virtually, as analysis needs dictate. Our virtual integration uses the commercial product SRS (Sequence Retrieval System) from LION bioscience. To couple the data warehouse with SRS, we implemented a query mediator that exploits correspondences between molecular-biological objects explicitly captured from public data sources. This hybrid integration approach has been implemented for a large gene expression warehouse and supports functional analysis using annotation data from Gene Ontology, LocusLink, and Ensembl. The paper motivates the chosen approach, details the integration concept and implementation, and presents results of preliminary performance tests.
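
The hybrid idea in the abstract can be illustrated with a small sketch. It is not the authors' mediator and does not use any SRS interface: the dictionaries, identifiers, and the functions `fetch_annotations` and `genes_with_term` are all hypothetical stand-ins. The sketch assumes that expression values are held locally (the materialized side), that annotations are looked up only on demand (the virtual side), and that a table of object correspondences links the two.

```python
from typing import Callable, Dict, Iterable, List, Set

# Materialized side: expression values keyed by a warehouse-local gene ID.
# All IDs and values here are made up for illustration.
EXPRESSION: Dict[str, float] = {"g1": 2.4, "g2": 0.3, "g3": 1.7}

# Correspondences between warehouse gene IDs and accessions of an
# external annotation source (hypothetical accession format).
MAPPING: Dict[str, Set[str]] = {
    "g1": {"LL:100"},
    "g2": {"LL:200"},
    "g3": {"LL:100", "LL:300"},
}

# Stand-in for the remote annotation source; in a real system this data
# would live outside the warehouse and be queried at analysis time.
REMOTE_ANNOTATIONS: Dict[str, Set[str]] = {
    "LL:100": {"GO:0006915"},
    "LL:200": {"GO:0008152"},
    "LL:300": {"GO:0006915", "GO:0007049"},
}


def fetch_annotations(accessions: Iterable[str]) -> Dict[str, Set[str]]:
    """Hypothetical on-demand lookup: accession -> set of annotation terms."""
    return {a: REMOTE_ANNOTATIONS.get(a, set()) for a in accessions}


def genes_with_term(
    term: str,
    min_expression: float,
    fetch: Callable[[Iterable[str]], Dict[str, Set[str]]] = fetch_annotations,
) -> List[str]:
    """Return warehouse genes above a threshold that carry the given term."""
    # 1. Filter on the materialized expression data first.
    candidates = [g for g, v in EXPRESSION.items() if v >= min_expression]
    # 2. Translate candidate genes into external accessions via the mapping.
    accessions: Set[str] = set()
    for gene in candidates:
        accessions |= MAPPING.get(gene, set())
    # 3. Query the annotation source only for those accessions.
    annotated = fetch(accessions)
    # 4. Join the annotations back to the warehouse genes.
    return sorted(
        g
        for g in candidates
        if any(term in annotated.get(a, set()) for a in MAPPING.get(g, set()))
    )
```

Filtering on the local expression data before contacting the annotation source keeps the remote request small, which is one plausible reason to combine the two integration styles; the paper itself should be consulted for the actual design and its measured performance.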

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kirsten, T., Do, HH., Körner, C., Rahm, E. (2005). Hybrid Integration of Molecular-Biological Annotation Data. In: Ludäscher, B., Raschid, L. (eds) Data Integration in the Life Sciences. DILS 2005. Lecture Notes in Computer Science, vol 3615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11530084_17

  • DOI: https://doi.org/10.1007/11530084_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27967-9

  • Online ISBN: 978-3-540-31879-8

  • eBook Packages: Computer Science, Computer Science (R0)
