Why and Where: A Characterization of Data Provenance

  • Conference paper
Database Theory — ICDT 2001 (ICDT 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1973)

Abstract

With the proliferation of database views and curated databases, the issue of data provenance (where a piece of data came from and the process by which it arrived in the database) is becoming increasingly important, especially in scientific databases, where understanding provenance is crucial to the accuracy and currency of data. In this paper we describe an approach to computing provenance when the data of interest has been created by a database query. We adopt a syntactic approach and present results for a general data model that applies to relational databases as well as to hierarchical data such as XML. A novel aspect of our work is a distinction between “why” provenance, which refers to the source data that had some influence on the existence of the data, and “where” provenance, which refers to the location(s) in the source databases from which the data was extracted.
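
The distinction between the two kinds of provenance can be illustrated with a small sketch. The Python below is only an informal illustration under simplifying assumptions, not the paper's formal definitions or algorithm: the relation `EMPLOYEES`, the tuple identifiers `t1` to `t3`, the function `sales_salaries`, and the query itself (roughly `SELECT name, salary FROM employees WHERE dept = 'Sales'`) are all hypothetical. For each output row it records a "why" set (the source tuples whose presence made the row appear) and, per output field, a "where" set (the source locations the value was copied from).

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]
Location = Tuple[str, str]  # (source tuple id, attribute name)

# Hypothetical source relation; the keys act as tuple identifiers.
EMPLOYEES: Dict[str, Row] = {
    "t1": {"name": "Ann", "dept": "Sales", "salary": 50},
    "t2": {"name": "Bob", "dept": "Sales", "salary": 60},
    "t3": {"name": "Cid", "dept": "IT", "salary": 60},
}


def sales_salaries(rel: Dict[str, Row]) -> List[dict]:
    """Evaluate the toy query and annotate each output row.

    "why": source tuples that influenced the row's existence
           (here, the single tuple that passed the dept filter).
    "where": for each output field, the source locations the
             value was copied from.
    """
    results = []
    for tid, row in rel.items():
        if row["dept"] == "Sales":
            why: Set[str] = {tid}
            where: Dict[str, Set[Location]] = {
                "name": {(tid, "name")},
                "salary": {(tid, "salary")},
            }
            results.append({
                "value": {"name": row["name"], "salary": row["salary"]},
                "why": why,
                "where": where,
            })
    return results


if __name__ == "__main__":
    for out in sales_salaries(EMPLOYEES):
        print(out["value"], "why:", out["why"], "where:", out["where"])
```

In this sketch the `dept` field of `t1` contributes to the "why" of the output row for Ann, because the row exists only if the filter holds, yet it never appears in any "where" set, because no output value is copied from it.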

Supported in part by an Alfred P. Sloan Research Fellowship.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Buneman, P., Khanna, S., Tan, W.-C. (2001). Why and Where: A Characterization of Data Provenance. In: Van den Bussche, J., Vianu, V. (eds) Database Theory — ICDT 2001. ICDT 2001. Lecture Notes in Computer Science, vol 1973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44503-X_20

  • DOI: https://doi.org/10.1007/3-540-44503-X_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41456-8

  • Online ISBN: 978-3-540-44503-6

  • eBook Packages: Springer Book Archive
