Data Provenance Support in Relational Databases for Stored Procedures

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5667)

Abstract

The increasing amounts of data produced by automated scientific instruments require scalable data management platforms for storing, transforming and analyzing scientific data. At the same time, it is paramount for scientific applications to keep track of the provenance information for quality control purposes and to be able to re-trace workflow steps. Relational database systems are designed to efficiently manage and analyze large data volumes, and modern extensible database systems can also host complex data transformations as stored procedures. However, the relational model does not naturally support data provenance or lineage tracking. In this paper, we focus on providing data provenance management in relational databases for stored procedures. Our approach, called PSP, leverages the XML capabilities of SQL:2003 to keep track of the lineage of the data that has been processed by any stored procedure in a relational database as part of a scientific workflow. We show how this approach can be implemented in a state-of-the-art DBMS and discuss how the captured provenance data can be efficiently queried and analyzed.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jurnawan, W., Röhm, U. (2009). Data Provenance Support in Relational Databases for Stored Procedures. In: Chen, L., Liu, C., Liu, Q., Deng, K. (eds) Database Systems for Advanced Applications. DASFAA 2009. Lecture Notes in Computer Science, vol 5667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04205-8_10

  • DOI: https://doi.org/10.1007/978-3-642-04205-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04204-1

  • Online ISBN: 978-3-642-04205-8

  • eBook Packages: Computer Science (R0)
