On-the-Fly Integration and Ad Hoc Querying of Life Sciences Databases Using LifeDB

  • Conference paper
Database and Expert Systems Applications (DEXA 2009)

Abstract

Data-intensive applications in the Life Sciences extensively use the hidden web as a platform for information sharing. Access to these heterogeneous hidden web resources is limited to predefined web forms and interactive interfaces that users must navigate manually. Users also assume responsibility for reconciling schema heterogeneity, extracting information, piping it between resources, transforming formats, and so on in order to implement desired query sequences or scientific workflows. In this paper, we present a new data management system, called LifeDB, which supports currency without view materialization and autonomous reconciliation of schema heterogeneity in a single platform through a declarative query language called BioFlow. In our approach, schema heterogeneity is resolved at run time by treating hidden web resources as virtual warehouses, and by supporting a set of primitives for integrating data on the fly, extracting information and piping it to other resources, and manipulating data in a way similar to traditional database systems to respond to application demands.

Research supported in part by National Science Foundation grants CNS 0521454 and IIS 0612203, and National Institutes of Health NIDA grant 1R03DA026021-01.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bhattacharjee, A. et al. (2009). On-the-Fly Integration and Ad Hoc Querying of Life Sciences Databases Using LifeDB. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2009. Lecture Notes in Computer Science, vol 5690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03573-9_47

  • DOI: https://doi.org/10.1007/978-3-642-03573-9_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03572-2

  • Online ISBN: 978-3-642-03573-9

  • eBook Packages: Computer Science, Computer Science (R0)