Abstract
The growing volume of data produced by automated scientific instruments requires scalable data management platforms for storing, transforming, and analyzing scientific data. At the same time, scientific applications must keep track of provenance information, both for quality control and to be able to retrace workflow steps. Relational database systems are designed to efficiently manage and analyze large data volumes, and modern extensible database systems can also host complex data transformations as stored procedures. However, the relational model does not naturally support data provenance or lineage tracking. In this paper, we focus on providing data provenance management for stored procedures in relational databases. Our approach, called PSP, leverages the XML capabilities of SQL:2003 to keep track of the lineage of data that has been processed by any stored procedure in a relational database as part of a scientific workflow. We show how this approach can be implemented in a state-of-the-art DBMS and discuss how the captured provenance data can be efficiently queried and analyzed.
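The abstract does not give PSP's schema or procedure code, so the following is only an illustrative sketch of the general idea: a transformation records, as an XML document stored in the database, which input rows it read and which output rows it wrote, and that document can later be queried to trace an output back to its inputs. Everything here is an assumption made for the example: the table names (`raw_reads`, `clean_reads`, `provenance`), the XML layout, and the functions `run_with_provenance` and `inputs_of`. SQLite has neither stored procedures nor SQL/XML, so a Python function stands in for the stored procedure and `xml.etree.ElementTree` stands in for the database's XML support.

```python
import sqlite3
import xml.etree.ElementTree as ET


def run_with_provenance(conn, proc_name, input_ids):
    """Hypothetical stand-in for a stored procedure.

    Upper-cases each selected sequence, writes it to clean_reads, and
    stores one XML lineage document describing the whole call.
    """
    cur = conn.cursor()
    lineage = ET.Element("lineage", {"procedure": proc_name})
    output_ids = []
    for rid in input_ids:
        ET.SubElement(lineage, "input", {"table": "raw_reads", "id": str(rid)})
        (seq,) = cur.execute(
            "SELECT seq FROM raw_reads WHERE id = ?", (rid,)
        ).fetchone()
        cur.execute("INSERT INTO clean_reads (seq) VALUES (?)", (seq.upper(),))
        output_ids.append(cur.lastrowid)
    for oid in output_ids:
        ET.SubElement(lineage, "output", {"table": "clean_reads", "id": str(oid)})
    # Persist the lineage document next to the data it describes.
    cur.execute(
        "INSERT INTO provenance (doc) VALUES (?)",
        (ET.tostring(lineage, encoding="unicode"),),
    )
    conn.commit()
    return output_ids


def inputs_of(conn, table, row_id):
    """Return the (table, id) input rows recorded for a given output row."""
    found = []
    for (doc,) in conn.execute("SELECT doc FROM provenance ORDER BY id"):
        root = ET.fromstring(doc)
        outputs = {(o.get("table"), o.get("id")) for o in root.findall("output")}
        if (table, str(row_id)) in outputs:
            found.extend(
                (i.get("table"), int(i.get("id"))) for i in root.findall("input")
            )
    return found


# Assumed toy schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_reads   (id INTEGER PRIMARY KEY, seq TEXT);
    CREATE TABLE clean_reads (id INTEGER PRIMARY KEY, seq TEXT);
    CREATE TABLE provenance  (id INTEGER PRIMARY KEY, doc TEXT);
    INSERT INTO raw_reads (id, seq) VALUES (1, 'acgt'), (2, 'ttga');
    """
)
outs = run_with_provenance(conn, "normalize_reads", [1, 2])
traced = inputs_of(conn, "clean_reads", outs[0])
print(traced)
```

Note that this sketch records lineage per procedure call, so every output of a call is linked to every input of that call. Whether PSP tracks lineage at this granularity or per tuple is not stated in the abstract.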
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jurnawan, W., Röhm, U. (2009). Data Provenance Support in Relational Databases for Stored Procedures. In: Chen, L., Liu, C., Liu, Q., Deng, K. (eds) Database Systems for Advanced Applications. DASFAA 2009. Lecture Notes in Computer Science, vol 5667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04205-8_10
DOI: https://doi.org/10.1007/978-3-642-04205-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04204-1
Online ISBN: 978-3-642-04205-8
eBook Packages: Computer Science, Computer Science (R0)