Characterizing Data Provenance

Buneman, Peter

doi:10.1007/3-540-45033-5_12

Peter Buneman

Conference paper
First Online: 11 November 2000

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1832)

Abstract

When you see some data on the Web, do you ever wonder how it got there? The chances are that it is in no sense original, but was copied from some other source, which in turn was copied from some other source, and so on. If you are a scientist using a scientific database or some other kind of scholar using a digital library, you will probably be keenly interested in this information because it is crucial to your assessment of the accuracy and timeliness of the data. Data provenance is the understanding of the history of a piece of data: its origins and the process by which it travelled from database to database. Existing database tools give us little or no help in recording provenance; indeed database schemas make it difficult to record this kind of information. I shall report on some recent work that characterizes data provenance. It is based on a model for data, both structured and semistructured, which accounts for both the structure and location of data. Using this model, we can draw a distinction between “why provenance” and “where provenance”. The former expresses all the data in the source databases that contributed to the existence of the data of interest; the latter specifies the locations from which it was drawn. In particular, we can take a query in a generic semistructured query language and use it to provide a formal derivation of both forms of provenance and to derive a number of useful properties of these forms. The work generalizes existing work on relational databases that is limited to why provenance. This is a report of joint work with Sanjeev Khanna and Wang-Chiew Tan.
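The distinction between the two forms of provenance can be illustrated with a small sketch. It is only a toy relational example, not the formal model or query language the abstract refers to; the relations `EMP` and `DEPT`, the `Loc` type, and the function `join_with_provenance` are all hypothetical names introduced here for illustration.

```python
from typing import NamedTuple


class Loc(NamedTuple):
    """A source location: which relation, which row, which field (assumed scheme)."""
    relation: str
    row: int
    field: str


# Hypothetical source relations, invented for this illustration.
EMP = [
    {"name": "Ann", "dept": "D1"},
    {"name": "Bob", "dept": "D2"},
]
DEPT = [
    {"dept": "D1", "city": "Leeds"},
    {"dept": "D2", "city": "York"},
]


def join_with_provenance(emp, dept):
    """Join EMP and DEPT on 'dept', project (name, city), and annotate
    each output tuple with a simplified why- and where-provenance."""
    results = []
    for i, e in enumerate(emp):
        for j, d in enumerate(dept):
            if e["dept"] == d["dept"]:
                results.append({
                    "value": {"name": e["name"], "city": d["city"]},
                    # Why: the source tuples whose presence made this
                    # output tuple exist.
                    "why": {("EMP", i), ("DEPT", j)},
                    # Where: the exact source location each output
                    # value was copied from.
                    "where": {
                        "name": Loc("EMP", i, "name"),
                        "city": Loc("DEPT", j, "city"),
                    },
                })
    return results
```

In this sketch the `DEPT` tuple appears in the why-provenance of every output tuple, because the join would not succeed without it, yet it never appears in the where-provenance of `name`, since that value is copied from `EMP` alone.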

Author information

Authors and Affiliations

University of Pennsylvania, Pennsylvania
Peter Buneman

Editor information

Editors and Affiliations

Department of Computer Science, University of Exeter, Prince of Wales Road, Exeter, EX4 4PT, UK
Brian Lings
Department for Computation and Information, CLRC Rutherford Appleton Laboratory, Chilton-Didcot, Oxon, OX11 0QX, UK
Keith Jeffery

About this paper

Cite this paper

Buneman, P. (2000). Characterizing Data Provenance. In: Lings, B., Jeffery, K. (eds) Advances in Databases. BNCOD 2000. Lecture Notes in Computer Science, vol 1832. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45033-5_12

DOI: https://doi.org/10.1007/3-540-45033-5_12
Published: 11 November 2000
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67743-7
Online ISBN: 978-3-540-45033-7
eBook Packages: Springer Book Archive
