Synonyms
History storage; Lineage storage; Pedigree organization; Provenance organization
Definition
Given the provenance of data processing or manipulation (e.g., through ad hoc manipulations, workflows, or database operators), provenance storage defines where and how the provenance information is stored and organized on disk. Provenance information captures the history, creation, and modification of a data product. In the context of workflows, for example, relevant information includes, but is not limited to, the parameters used in each step of the workflow (recursively, for every step that contributed to the data product) and the software versions used.
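As an illustration only, the following minimal sketch shows one possible on-disk layout: an append-only file with one JSON record per workflow step, holding the step's parameters, software version, and inputs. The record fields, the file name, and the helper functions (append_record, load_records, lineage) are assumptions made for this example; they are not taken from any of the systems listed under Recommended Reading.

```python
import json
import os
import tempfile


def append_record(path, output, step, inputs, parameters, software_version):
    """Append one provenance record (hypothetical schema) as a JSON line."""
    record = {
        "output": output,            # data product created by this step
        "step": step,                # name of the workflow step
        "inputs": inputs,            # data products the step consumed
        "parameters": parameters,    # parameters used in this step
        "software_version": software_version,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_records(path):
    """Read the file back into a dict keyed by output data product."""
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            records[rec["output"]] = rec
    return records


def lineage(records, product):
    """Recursively collect the records of every step that led to `product`."""
    rec = records.get(product)
    if rec is None:
        return []  # no record: treat as a raw input with no stored history
    history = []
    for parent in rec["inputs"]:
        for ancestor in lineage(records, parent):
            if ancestor not in history:
                history.append(ancestor)
    history.append(rec)
    return history


def demo():
    """Store two steps, then trace the history of the final product."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "provenance.jsonl")
        append_record(path, "clean.csv", "clean", ["raw.csv"],
                      {"drop_nulls": True}, "cleaner 1.2")
        append_record(path, "plot.png", "plot", ["clean.csv"],
                      {"bins": 20}, "plotter 0.9")
        return [r["step"] for r in lineage(load_records(path), "plot.png")]


if __name__ == "__main__":
    print(demo())
```

A flat append-only file keeps recording cheap, but tracing history requires loading and walking the records; other layouts (relational tables, graph stores) trade recording cost against query cost differently.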
Recommended Reading
Buneman P, Khanna S, Tan W-C. Why and where: a characterization of data provenance. In: Proceedings of the 8th International Conference on Database Theory; p. 316–30.
Cui Y, Widom J. Practical lineage tracing in data warehouses. In: Proceedings of the 16th International Conference on Data Engineering; p. 367–78.
Woodruff A, Stonebraker M. Supporting fine-grained data lineage in a database visualization environment. In: Proceedings of the 13th International Conference on Data Engineering; p. 97–102.
Altintas I, Barney O, Jaeger-Frank E. Provenance collection support in the Kepler scientific workflow system. In: Proceedings of the International Provenance and Annotation Workshop; 2006. p. 118–32.
Foster I, Vockler J, Wilde M, Zhao Y. Chimera: a virtual data system for representing, querying, and automating data derivation. In: Proceedings of the 14th International Conference on Scientific and Statistical Database Management; 2002. p. 37–46.
Freire J, Silva CT, et al. Managing rapidly-evolving scientific workflows. 2006.
Simmhan Y, Plale B, Gannon D. A framework for collecting provenance in data-centric scientific workflows. In: Proceedings of the IEEE International Conference on Web Services; 2006.
Wong SC, Miles S, Fang W, Groth P, Moreau L. Provenance-based validation of E-Science experiments. In: Proceedings of the 4th International Semantic Web Conference, Lecture Notes in Computer Science. 2005. p. 801–15.
Anand MK, Bowers S, McPhillips T, Ludäscher B. Efficient provenance storage over nested data collections. In: Advances in Database Technology, Proceedings of the 12th International Conference on Extending Database Technology; 2009. p. 958–69.
Chebotko A, Lu S, Fei X, Fotouhi F. RDFPROV: a relational RDF store for querying and managing scientific workflow provenance. Data Knowl Eng. 2010;69(8):836–65.
Buneman P, Chapman A, Cheney J. Provenance management in curated databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2006. p. 539–50.
Heinis T, Alonso G. Efficient lineage tracking for scientific workflows. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2008. p. 1007–18.
Xie Y, Muniswamy-Reddy K-K, Feng D, Li Y, Long DDE, Tan Z, Chen L. A hybrid approach for efficient provenance storage. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management; 2012.
Park H, Ikeda R, Widom J. RAMP: a system for capturing and tracing provenance in MapReduce workflows. In: Proceedings of the 37th International Conference on Very Large Data Bases; 2011.
Mason C. Cryptographic binding of metadata, The National Security Agency’s Review of Emerging Technologies, vol. 18. 2009.
Allen MD, Chapman A, Blaustein B. Engineering choices for open world provenance. In: Proceedings of the 6th International Provenance and Annotation Workshop; 2014.
Dey S, Agun M, Wang M, Ludäscher B, Bowers S, Missier P. A provenance repository for storing and retrieving data lineage information, Technical Report, DataONE Provenance & Workflow Working Group. 2011.
Missier P, Chen Z. Extracting PROV provenance traces from Wikipedia history pages. In: Proceedings of the 16th International Conference on Extending Database Technology; 2013.
Robinson I, Webber J, Eifrem E. Graph databases. O’Reilly Media, Inc.; 2013.
Dublin Core Metadata Initiative Usage Board. DCMI Metadata Terms: A complete historical record. Dublin Core Metadata Initiative (DCMI), Online, 2014.
Moreau L, Clifford B, Freire J, Futrelle J, Gil Y, Groth P, Kwasnikowska N, Miles S, Missier P, Myers J, Plale B, Simmhan Y, Stephan E, Van den Bussche J. The Open Provenance Model core specification (v1.1). Future Gener Comput Syst. 2011;27(6):743–56.
Moreau L, Groth P. Provenance: an introduction to PROV. Morgan & Claypool Publishers; 2013.
Groth P, Moreau L. PROV-Overview. World Wide Web Consortium (W3C), Online, 2013.
Abawajy JH, Jami SI, Shaikh ZA, Hammad SA. A framework for scalable distributed provenance storage system. Comput Stand Interfaces. 2013;35(1):179–86.
Allen MD, Chapman A, Blaustein B, Seligman L. Getting it together: enabling multi-organization provenance exchange. In: Proceedings of the 3rd USENIX Workshop on the Theory and Practice of Provenance; 2011.
Groth P, Jiang S, Miles S, Munroe S, Tan V, Tsasakou S, Moreau L. An architecture for provenance systems, Technical Report. ECS, University of Southampton. 2006.
Zhao D, Shou C, Malik T, Raicu I. Distributed data provenance for large-scale data-intensive computing. IEEE Cluster. 2013.
Groth P, Miles S, Moreau L. PReServ: provenance recording for services, UK OST e-Science second AHM. 2005.
Simmhan Y, Plale B, Gannon D. Karma2: provenance management for data driven workflows. J Web Serv Res. 2008;5(2):1–22.
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Heinis, T., Chapman, A. (2018). Provenance Storage. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80746
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80746
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering