Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Provenance in Workflows

  • David Koop
  • Marta Mattoso
  • Juliana Freire
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80745


Synonyms

Computational provenance; Lineage; Origin; Source; History

Definition

Data- and compute-intensive science requires the ability to orchestrate computational steps and integrate distinct tools. Scientific workflow systems have been developed to structure such computations. A scientific workflow is a directed graph whose nodes are computational steps and whose edges link those steps together. Each computational step (called a module, actor, or processor, depending on the system) has a set of input and output ports; a link (also called an edge, channel, or connection) from an output port of one module to an input port of another indicates a data dependency. Modules may also have settable parameters that influence their computations. Workflow provenance may then include information about the specification of the workflow, the evolution of that specification over time, and the executions of the workflow.
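To make the graph model concrete, the following is a minimal Python sketch (3.9+, standard library only) of modules with ports and parameters, connections between ports, and the three kinds of provenance listed above: the specification itself, its evolution (here a flat, append-only log of edits), and a record of one execution. All names (`Module`, `Workflow`, `connect`, `set_param`, `run`, `upstream`, `history`) are hypothetical and are not the API of any particular workflow system. The sketch assumes the graph is acyclic; `graphlib.TopologicalSorter` raises `CycleError` otherwise.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set, Tuple


@dataclass
class Module:
    """A computational step with named input/output ports and settable parameters."""
    name: str
    inputs: List[str]
    outputs: List[str]
    # Hypothetical convention: func takes inputs and params as keyword
    # arguments and returns a dict mapping output port -> value.
    func: Callable[..., Dict[str, Any]]
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Workflow:
    modules: Dict[str, Module] = field(default_factory=dict)
    # Each connection: (source module, output port, target module, input port).
    connections: List[Tuple[str, str, str, str]] = field(default_factory=list)
    # Evolution provenance: append-only log of edits to the specification.
    history: List[Tuple[str, Tuple[Any, ...]]] = field(default_factory=list)

    def add_module(self, m: Module) -> None:
        self.modules[m.name] = m
        self.history.append(("add_module", (m.name,)))

    def connect(self, src: str, out_port: str, dst: str, in_port: str) -> None:
        # A link from an output port to an input port records a data dependency.
        if out_port not in self.modules[src].outputs:
            raise ValueError(f"{src} has no output port {out_port!r}")
        if in_port not in self.modules[dst].inputs:
            raise ValueError(f"{dst} has no input port {in_port!r}")
        self.connections.append((src, out_port, dst, in_port))
        self.history.append(("connect", (src, out_port, dst, in_port)))

    def set_param(self, name: str, key: str, value: Any) -> None:
        self.modules[name].params[key] = value
        self.history.append(("set_param", (name, key, value)))

    def upstream(self, name: str) -> Set[str]:
        """Modules whose outputs feed `name`, directly or transitively."""
        seen: Set[str] = set()
        stack = [name]
        while stack:
            cur = stack.pop()
            for src, _, dst, _ in self.connections:
                if dst == cur and src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen

    def run(self) -> List[Dict[str, Any]]:
        """Execute modules in dependency order; return an execution provenance log."""
        deps: Dict[str, Set[str]] = {name: set() for name in self.modules}
        for src, _, dst, _ in self.connections:
            deps[dst].add(src)
        values: Dict[Tuple[str, str], Any] = {}
        log: List[Dict[str, Any]] = []
        for name in TopologicalSorter(deps).static_order():
            m = self.modules[name]
            args = {ip: values[(s, op)] for s, op, d, ip in self.connections if d == name}
            result = m.func(**args, **m.params)
            for port, value in result.items():
                values[(name, port)] = value
            log.append({"module": name, "params": dict(m.params),
                        "inputs": args, "outputs": result})
        return log


# Usage: load -> scale -> total, then change a parameter before running.
wf = Workflow()
wf.add_module(Module("load", [], ["data"], lambda n: {"data": list(range(n))}, {"n": 5}))
wf.add_module(Module("scale", ["data"], ["data"],
                     lambda data, k: {"data": [k * x for x in data]}, {"k": 2}))
wf.add_module(Module("total", ["data"], ["sum"], lambda data: {"sum": sum(data)}))
wf.connect("load", "data", "scale", "data")
wf.connect("scale", "data", "total", "data")
wf.set_param("scale", "k", 3)

log = wf.run()
print(log[-1]["outputs"])            # {'sum': 30}
print(sorted(wf.upstream("total")))  # ['load', 'scale']
```

Because `run` records the parameter and port values actually used, a question such as "which value of `k` produced this sum?" can be answered from the execution log alone, while `history` shows when that value was changed. The flat edit log is a simplification; systems that track workflow evolution, such as VisTrails [9], organize such changes into a version tree.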

Historical Background

Workflows have been used to model business processes [14]. Business workflows, scripts, coordination languages, and dataflow systems are precursors of today’s...


Recommended Reading

  1. Alper P, Belhajjame K, Goble C, Karagoz P. Small is beautiful: summarizing scientific workflows using semantic annotations. In: Proceedings of the 2013 IEEE International Congress on Big Data; 2013. p. 18–25.
  2. Biton O, Cohen-Boulakia S, Davidson SB. Zoom*UserViews: querying relevant provenance in workflow systems. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment; 2007. p. 1366–69.
  3. Bose R, Frew J. Lineage retrieval for scientific data processing: a survey. ACM Comput Surv. 2005;37(1):1–28.
  4. Chapman AP, Jagadish HV, Ramanan P. Efficient provenance storage. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data; 2008. p. 993–1006.
  5. Chirigati FS, Shasha D, Freire J. Packing experiments for sharing and publication. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2013. p. 977–80.
  6. Davidson SB, Boulakia SC, Eyal A, Ludäscher B, McPhillips TM, Bowers S, Anand MK, Freire J. Provenance in scientific workflow systems. IEEE Data Eng Bull. 2007;30(4):44–50.
  7. Dias J, Guerra G, Rochinha F, Coutinho ALGA, Valduriez P, Mattoso M. Data-centric iteration in dynamic workflows. Futur Gener Comput Syst. 2015;46:114–26. http://dx.doi.org/10.1016/j.future.2014.10.021.
  8. Freire J, Koop D, Santos E, Silva C. Provenance for computational tasks: a survey. Comput Sci Eng. 2008;10(3):11–21.
  9. Freire J, Silva C, Callahan S, Santos E, Scheidegger C, Vo H. Managing rapidly-evolving scientific workflows. In: International Provenance and Annotation Workshop (IPAW), LNCS, vol. 4145. Springer; 2006. p. 10–8.
  10. Koop D, Freire J, Silva CT. Visual summaries for graph collections. In: Visualization Symposium (PacificVis), 2013 IEEE Pacific; 2013. p. 57–64.
  11. Mattoso M, Dias J, Ocaña KACS, Ogasawara E, Costa F, Horta F, Silva V, de Oliveira D. Dynamic steering of HPC scientific workflows: a survey. Futur Gener Comput Syst. 2015;46:100–13.
  12. Scheidegger CE, Vo HT, Koop D, Freire J, Silva CT. Querying and creating visualizations by analogy. IEEE Trans Vis Comput Graph. 2007;13(6):1560–67.
  13. Silva CT, Anderson E, Santos E, Freire J. Using VisTrails and provenance for teaching scientific visualization. Comput Graphics Forum. 2011;30(1):75–84.
  14. Van Der Aalst WMP, Ter Hofstede AHM, Weske M. Business process management: a survey. In: Business Process Management. Springer; 2003. p. 1–2.
  15. Walker E, Guiang C. Challenges in executing large parameter sweep studies across widely distributed computing environments. In: Proceedings of the 5th IEEE Workshop on Challenges of Large Applications in Distributed Environments; 2007. p. 11–8.
  16. Zhao Y, Foster I. Scientific workflow systems for 21st century, new bottle or new wine? In: IEEE Workshop on Scientific Workflows; 2008.
  17. Zhou W, Mapara S, Ren Y, Li Y, Haeberlen A, Ives Z, Loo BT, Sherr M. Distributed time-aware provenance. Proc VLDB Endow. 2012;6(2):49–60.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Massachusetts Dartmouth, Dartmouth, USA
  2. Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
  3. NYU Tandon School of Engineering, Brooklyn, USA
  4. NYU Center for Data Science, New York, USA
  5. New York University, New York, USA