Provenance Storage

Synonyms

History storage; Lineage storage; Pedigree organization; Provenance organization

Definition

Provenance information captures everything that describes the history, creation, and modification of a data product, whether that product results from ad hoc manipulations, workflows, or database operators. In the context of workflows, for example, relevant information includes, but is not limited to, the parameters used in each step of the workflow (traced recursively through all preceding steps) and the software versions used. Provenance storage defines where and how this information is stored and organized on disk.
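
As an illustration only, the following minimal Python sketch shows one way workflow provenance of this kind could be laid out on disk and traced recursively. The record fields, the one-JSON-object-per-line file layout, and the names `StepRecord`, `save`, `load`, and `history` are assumptions made for this example; they are not taken from the entry or from any particular provenance system.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict, field


@dataclass
class StepRecord:
    """Provenance of one data product (hypothetical schema for illustration)."""
    output_id: str                                   # data product that was created
    step: str                                        # workflow step that created it
    software_version: str                            # version of the software used
    parameters: dict = field(default_factory=dict)   # parameters passed to the step
    input_ids: list = field(default_factory=list)    # upstream data products


def save(records, path):
    """Write one JSON object per line (an assumed, simple on-disk layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(asdict(record)) + "\n")


def load(path):
    """Read the file back into a dict keyed by data product identifier."""
    store = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = StepRecord(**json.loads(line))
                store[record.output_id] = record
    return store


def history(store, output_id):
    """Recursively collect the records of all steps that led to output_id."""
    record = store.get(output_id)
    if record is None:       # a raw input: no provenance was recorded for it
        return []
    result = [record]
    for parent_id in record.input_ids:
        result.extend(history(store, parent_id))
    return result


records = [
    StepRecord("clean.csv", "clean", "1.2.0", {"drop_nulls": True}, ["raw.csv"]),
    StepRecord("plot.png", "plot", "0.9.1", {"bins": 20}, ["clean.csv"]),
]
path = os.path.join(tempfile.mkdtemp(), "provenance.jsonl")
save(records, path)
store = load(path)
lineage = [r.step for r in history(store, "plot.png")]
print(lineage)
```

The sketch stores each record in full and repeats shared ancestors when a product has several paths back to the same input; a real storage scheme would likely deduplicate or index such records.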

Historical Background

The original academic works on digital provenance focused on provenance within relational databases [1,2,3]. However, workflow systems also found a use for provenance and quickly began capturing and storing provenance information [4,5,6,7,8]....

Recommended Reading

  1. Buneman P, Khanna S, Tan W-C. Why and where: a characterization of data provenance. In: Proceedings of the 8th International Conference on Database Theory; p. 316–30.

  2. Cui Y, Widom J. Practical lineage tracing in data warehouses. In: Proceedings of the 16th International Conference on Data Engineering; p. 367–78.

  3. Woodruff A, Stonebraker M. Supporting fine-grained data lineage in a database visualization environment. In: Proceedings of the 13th International Conference on Data Engineering; p. 97–102.

  4. Altintas I, Barney O, Jaeger-Frank E. Provenance collection support in the Kepler scientific workflow system. In: Proceedings of the International Provenance and Annotation Workshop; 2006. p. 118–32.

  5. Foster I, Vockler J, Wilde M, Zhao Y. Chimera: a virtual data system for representing, querying, and automating data derivation. In: Proceedings of the 14th International Conference on Scientific and Statistical Database Management; 2002. p. 37–46.

  6. Freire J, Silva CT, et al. Managing rapidly-evolving scientific workflows. 2006.

  7. Simmhan Y, Plale B, Gannon D. A framework for collecting provenance in data-centric scientific workflows. In: Proceedings of the IEEE International Conference on Web Services; 2006.

  8. Wong SC, Miles S, Fang W, Groth P, Moreau L. Provenance-based validation of E-Science experiments. In: Proceedings of the 4th International Semantic Web Conference, Lecture Notes in Computer Science. 2005. p. 801–15.

  9. Anand MK, Bowers S, McPhillips T, Ludascher B. Efficient provenance storage over nested data collections. In: Advances in Database Technology, Proceedings of the 12th International Conference on Extending Database Technology; 2009. p. 958–69.

  10. Chebotko A, Lu S, Fei X, Fotouhi F. RDFPROV: a relational RDF store for querying and managing scientific workflow provenance. Data Knowl Eng. 2010;69(8):836–65.

  11. Buneman P, Chapman A, Cheney J. Provenance management in curated databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2006. p. 539–50.

  12. Heinis T, Alonso G. Efficient lineage tracking for scientific workflows. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2008. p. 1007–18.

  13. Xie Y, Muniswamy-Reddy K-K, Feng D, Li Y, Long DDE, Tan Z, Chen L. A hybrid approach for efficient provenance storage. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management; 2012.

  14. Park H, Ikeda R, Widom J. RAMP: a system for capturing and tracing provenance in mapreduce workflows. In: Proceedings of the 37th International Conference on Very Large Data Bases; 2011.

  15. Mason C. Cryptographic binding of metadata, The National Security Agency’s Review of Emerging Technologies, vol. 18. 2009.

  16. Allen MD, Chapman A, Blaustein B. Engineering choices for open world provenance. In: Proceedings of the 6th International Provenance and Annotation Workshop; 2014.

  17. Dey S, Agun M, Wang M, Ludäscher B, Bowers S, Missier P. A provenance repository for storing and retrieving data lineage information, Technical Report, DataONE Provenance & Workflow Working Group. 2011.

  18. Missier P, Chen Z. Extracting PROV provenance traces from Wikipedia history pages. In: Proceedings of the 16th International Conference on Extending Database Technology; 2013.

  19. Robinson I, Webber J, Eifrem E. Graph databases. O’Reilly Media, Inc.; 2013.

  20. Dublin Core Metadata Initiative Usage Board. DCMI Metadata Terms: A complete historical record. Dublin Core Metadata Initiative (DCMI), Online, 2014.

  21. Moreau L, Clifford B, Freire J, Futrelle J, Gil Y, Groth P, Kwasnikowska N, Miles S, Missier P, Myers J, Plale B, Simmhan Y, Stephan E, Van den Bussche J. The Open Provenance Model core specification (v1.1). Future Gener Comput Syst. 2011;27(6):743–56.

  22. Moreau L, Groth P. Provenance: an introduction to PROV. Morgan & Claypool Publishers; 2013.

  23. Groth P, Moreau L. PROV-Overview. World Wide Web Consortium (W3C), Online, 2013.

  24. Abawajy JH, Jami SI, Shaikh ZA, Hammad SA. A framework for scalable distributed provenance storage system. Comput Stand Interfaces. 2013;35(1):179–86.

  25. Allen MD, Chapman A, Blaustein B, Seligman L. Getting it together: enabling multi-organization provenance exchange. In: Proceedings of the 3rd USENIX Workshop on the Theory and Practice of Provenance; 2011.

  26. Groth P, Jiang S, Miles S, Munroe S, Tan V, Tsasakou S, Moreau L. An architecture for provenance systems, Technical Report. ECS, University of Southampton. 2006.

  27. Zhao D, Shou C, Malik T, Raicu I. Distributed data provenance for large-scale data-intensive computing. IEEE Cluster. 2013.

  28. Groth P, Miles S, Moreau L. PReServ: provenance recording for services, UK OST e-Science second AHM. 2005.

  29. PLUS. https://github.com/plus-provenance/plus

  30. Simmhan Y, Plale B, Gannon D. Karma2: provenance management for data driven workflows. J Web Serv Res. 2008;5(2):1–22.

Corresponding author

Correspondence to Thomas Heinis.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Heinis, T., Chapman, A. (2018). Provenance Storage. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80746
