Why and Where: A Characterization of Data Provenance

Buneman, Peter; Khanna, Sanjeev; Wang-Chiew, Tan

doi:10.1007/3-540-44503-X_20

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1973)

Included in the following conference series:

International Conference on Database Theory

Abstract

With the proliferation of database views and curated databases, the issue of data provenance (where a piece of data came from and the process by which it arrived in the database) is becoming increasingly important, especially in scientific databases, where understanding provenance is crucial to the accuracy and currency of data. In this paper we describe an approach to computing provenance when the data of interest has been created by a database query. We adopt a syntactic approach and present results for a general data model that applies to relational databases as well as to hierarchical data such as XML. A novel aspect of our work is a distinction between “why” provenance, which refers to the source data that had some influence on the existence of the data, and “where” provenance, which refers to the location(s) in the source databases from which the data was extracted.

Supported in part by an Alfred P. Sloan Research Fellowship.
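
The distinction drawn in the abstract between “why” and “where” provenance can be illustrated with a small sketch. The code below is not the paper's algorithm or its formal definitions. It is a simplified reading for a single selection-projection query over a toy relation, and the relation `EMP`, its row identifiers, and the function `depts_with_high_earners` are all hypothetical names introduced here for illustration. Each output value is annotated with the source rows that justify its existence (why) and with the source cells its value was copied from (where).

```python
from collections import defaultdict

# Hypothetical toy relation: row identifier -> record.
EMP = {
    "e1": {"name": "Ann", "dept": "R&D", "salary": 1200},
    "e2": {"name": "Bob", "dept": "R&D", "salary": 900},
    "e3": {"name": "Cy", "dept": "Ops", "salary": 1500},
}


def depts_with_high_earners(emp, threshold):
    """Evaluate SELECT DISTINCT dept FROM emp WHERE salary > threshold.

    Returns two dictionaries keyed by output value:
      why[v]   -- set of witnesses; each witness is a frozenset of row ids
                  that is sufficient on its own to produce v
      where[v] -- set of (relation, row id, column) cells v was copied from
    """
    why = defaultdict(set)
    where = defaultdict(set)
    for row_id, rec in emp.items():
        if rec["salary"] > threshold:
            out = rec["dept"]
            # The whole row influenced the output, since the salary was tested.
            why[out].add(frozenset({row_id}))
            # Only the dept cell supplied the value that appears in the output.
            where[out].add(("EMP", row_id, "dept"))
    return dict(why), dict(where)


why, where = depts_with_high_earners(EMP, 800)
for value in sorted(why):
    print(value, "why:", sorted(map(sorted, why[value])),
          "where:", sorted(where[value]))
```

In this sketch the `salary` field contributes to why-provenance (it decides whether a row qualifies) but never to where-provenance, because no salary value is copied into the output.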

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Pennsylvania, 200 South 33rd Street, Philadelphia, PA 19104, USA
Peter Buneman, Sanjeev Khanna & Tan Wang-Chiew

Editor information

Editors and Affiliations

Limburg University (LUC), 3590 Diepenbeek, Belgium
Jan Van den Bussche
Department of Computer Science and Engineering, University of California, La Jolla, CA 92093-0114, USA
Victor Vianu

About this paper

Cite this paper

Buneman, P., Khanna, S., Wang-Chiew, T. (2001). Why and Where: A Characterization of Data Provenance. In: Van den Bussche, J., Vianu, V. (eds) Database Theory — ICDT 2001. ICDT 2001. Lecture Notes in Computer Science, vol 1973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44503-X_20

DOI: https://doi.org/10.1007/3-540-44503-X_20
Published: 12 October 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41456-8
Online ISBN: 978-3-540-44503-6
eBook Packages: Springer Book Archive
