Advertisement

The Lifecycle of Provenance Metadata and Its Associated Challenges and Opportunities

  • Paolo MissierEmail author
Conference paper
Part of the Springer Proceedings in Business and Economics book series (SPBE)

Abstract

This chapter outlines some of the challenges and opportunities associated with adopting provenance principles (Cheney et al., Dagstuhl Reports 2(2):84–113, 2012) and standards (Moreau et al., Web Semant. Sci. Serv. Agents World Wide Web, 2015) in a variety of disciplines, including data publication and reuse, and information sciences.

Keywords

Provenance analytics Provenance data modelling Provenance lifecycle 

References

  1. 1.
    Amsterdamer, Y., Davidson, S.B., Deutch, D., Milo, T., Stoyanovich, J., Tannen, V.: Putting lipstick on pig: enabling database-style workflow provenance. Proc. VLDB Endow. 5 (4), 346–357 (2011)CrossRefGoogle Scholar
  2. 2.
    Biton, O., Cohen-Boulakia, S., Davidson, S.B.: Zoom*UserViews: querying relevant provenance in workflow systems. In: VLDB, pp. 1366–1369 (2007)Google Scholar
  3. 3.
    Cadenhead, T., Khadilkar, V., Kantarcioglu, M., Thuraisingham, B.: Transforming provenance using redaction. In: Proceedings of the 16th ACM Symposium on Access Control Models and Technologies, SACMAT ’11, pp. 93–102. ACM, New York (2011)Google Scholar
  4. 4.
    Cheney, J., Chiticariu, L., Tan, W.-C.: Provenance in databases: why, how, and where. Found. Trends Databases 1, 379–474 (2009)CrossRefGoogle Scholar
  5. 5.
    Cheney, J., Missier, P., Moreau, L.: Constraints of the provenance data model. Technical Report (2012)Google Scholar
  6. 6.
    Cheney, J., Finkelstein, A., Ludaescher, B., Vansummeren, S.: Principles of provenance (Dagstuhl Seminar 12091). Dagstuhl Reports 2 (2), 84–113 (2012)Google Scholar
  7. 7.
    Cohen-Boulakia, S., Leser, U.: Search, adapt, and reuse: the future of scientific workflows. SIGMOD Rec. 40 (2), 6–16 (2011)CrossRefGoogle Scholar
  8. 8.
    Davidson, S., Freire, J.: Provenance and scientific workflows: challenges and opportunities. In: Proceedings of SIGMOD Conference, Tutorial, pp. 1345–1350 (2008)Google Scholar
  9. 9.
    Davidson, S., Cohen-Boulakia, S., Eyal, A., Ludäscher, B., McPhillips, T., Bowers, S., Anand, M.K., Freire, J.: Provenance in scientific workflow systems. In: Data Engineering Bulletin, vol. 30. IEEE, New York (2007)Google Scholar
  10. 10.
    Dey, S., Zinn, D., Ludäscher, B.: ProPub: towards a declarative approach for publishing customized, policy-aware provenance. In: Cushing, J.B., French, J., Bowers, S. (Eds.), Scientific and Statistical Database Management. Lecture Notes in Computer Science, vol. 6809, pp. 225–243. Springer, Berlin, Heidelberg (2011)CrossRefGoogle Scholar
  11. 11.
    Firth, H., Missier, P.: ProvGen: generating synthetic PROV graphs with predictable structure. In: Proceedings of IPAW 2014 (Provenance and Annotations), Koln (2014)Google Scholar
  12. 12.
    Ghoshal, D., Plale, B.: Provenance from log files: a bigdata problem. In: Proceedings of BigProv Workshop on Managing and Querying Provenance at Scale (2013)CrossRefGoogle Scholar
  13. 13.
    Green, T.J., Karvounarakis, G., Tannen, V.: Provenance semirings. In: PODS, pp. 31–40 (2007)Google Scholar
  14. 14.
    Hiden, H., Watson, P., Woodman, S., Leahy, D.: e-Science central: cloud-based e-Science and its application to chemical property modelling. Technical Report cs-tr-1227. School of Computing Science, Newcastle University (2011)Google Scholar
  15. 15.
    Hull, D., Wolstencroft, K., Stevens, R., Goble, C.A., Pocock, M.R., Li, P., Oinn, T.: Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 34, 729–732 (2006)CrossRefGoogle Scholar
  16. 16.
    Katz, D.S.: Transitive credit as a means to address social and technological concerns stemming from citation and attribution of digital products. J. Open Res. Soft. 2 (1), e20 (2014)CrossRefGoogle Scholar
  17. 17.
    Kratz, J.E., Strasser, C.: Making data count. Nature Scientific Data 2, 150039 (2015)CrossRefGoogle Scholar
  18. 18.
    Lebo, T., Sahoo, S., McGuinness, D., Belhajjame, K., Cheney, J., Corsar, D., Garijo, D., Soiland-Reyes, S., Zednik, S., Zhao, J.: PROV-O: The PROV ontology. Technical Report (2012)Google Scholar
  19. 19.
    Lerner, B.S., Boose, E.R.: Collecting provenance in an interactive scripting environment. In: Proceedings of TAPP’14 (2014)Google Scholar
  20. 20.
    Lerner, B., Boose, E.: RDataTracker: collecting provenance in an interactive scripting environment. In: 6th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2014) (2014)Google Scholar
  21. 21.
    Lim, C., Lu, S., Chebotko, A., Fotouhi, F.: Prospective and retrospective provenance collection in scientific workflow environments. In: 2010 IEEE International Conference on Services Computing (SCC), pp. 449–456 (2010)Google Scholar
  22. 22.
    Lyle, J., Martin, A.: Trusted computing and provenance: better together. In: Proceedings of the 2nd Conference on Theory and Practice of Provenance, TAPP’10, Berkeley, CA, p. 1. USENIX Association, Berkeley, CA (2010)Google Scholar
  23. 23.
    Macko, P., Chiarini, M., Seltzer, M.: Collecting provenance via the Xen hypervisor. In: Freire, J., Buneman, P. (eds.) TAPP Workshop, Heraklion (2011)Google Scholar
  24. 24.
    Missier, P., Paton, N., Belhajjame, K.: Fine-grained and efficient lineage querying of collection-based workflow provenance. In: Proceedings of EDBT, Lausanne, Switzerland (2010)CrossRefGoogle Scholar
  25. 25.
    Missier, P., Sahoo, S.S., Zhao, J., Sheth, A., Goble, C.: Janus: from workflows to semantic provenance and linked open data. In: Proceedings of IPAW 2010, Troy, NY (2010)Google Scholar
  26. 26.
    Missier, P., Soiland-Reyes, S., Owen, S., Tan, W., Nenadic, A., Dunlop, I., Williams, A., Oinn, T., Goble, C.: Taverna, reloaded. In: Gertz, M., Hey, T., Ludaescher, B. (eds.) Proceedings of SSDBM 2010, Heidelberg (2010)Google Scholar
  27. 27.
    Missier, P., Dey, S., Belhajjame, K., Cuevas, V., Ludaescher, B.: D-PROV: extending the PROV provenance model with workflow structure. In: Proceedings of TAPP’13, Lombard, IL (2013)Google Scholar
  28. 28.
    Missier, P., Woodman, S., Hiden, H., Watson, P.: Provenance and data differencing for workflow reproducibility analysis. Concurr. Comput. 28 (4), 995–1015 (2016)CrossRefGoogle Scholar
  29. 29.
    Missier, P., Bryans, J., Gamble, C., Curcin, V., Danger, R.: ProvAbs: model, policy, and tooling for abstracting PROV graphs. In: Proceedings of IPAW 2014 (Provenance and Annotations), Koln. Springer, Berlin (2014)Google Scholar
  30. 30.
    Mitchell, C., Mitchell, C., Mitchell, C.: Trusted computing. In: Chen, L., Mitchell, C.J., Martin, A. (eds.) Proceedings of Trust 2009, Oxford. Springer, Berlin (2005)Google Scholar
  31. 31.
    Moreau, L., Ludäscher, B., Altintas, I., Barga, R.S.: The first provenance challenge. Concurr. Comput. 20, 409–418 (2008)CrossRefGoogle Scholar
  32. 32.
    Moreau, L., Clifford, B., Freire, J., Futrelle, J., Gil, Y., Groth, P., Kwasnikowska, N., Miles, S., Missier, P., Myers, J., Plale, B., Simmhan, Y., Stephan, E., Van Den Bussche, J.: The open provenance model—core specification (v1.1). Futur. Gener. Comput. Syst. 7 (21), 743–756 (2011)Google Scholar
  33. 33.
    Moreau, L., Hartig, O., Simmhan, Y., Myers, J., Lebo, T., Belhajjame, K., Miles, S.: PROV-AQ: provenance access and query. Technical Report (2012)Google Scholar
  34. 34.
    Moreau, L., Missier, P., Belhajjame, K., B’Far, R., Cheney, J., Coppens, S., Cresswell, S., Gil, Y., Groth, P., Klyne, G., Lebo, T., McCusker, J., Miles, S., Myers, J., Sahoo, S., Tilmes, C.: PROV-DM: the PROV data model. Technical Report. World Wide Web Consortium (2012)Google Scholar
  35. 35.
    Moreau, L., Missier, P., Cheney, J., Soiland-Reyes, S.: PROV-N: the provenance notation. Technical Report (2012)Google Scholar
  36. 36.
    Moreau, L., Groth, P., Cheney, J., Lebo, T., Miles, S.: The rationale of PROV. Web Semant. Sci. Serv. Agents World Wide Web 35, Part 4, 235–257 (2015)Google Scholar
  37. 37.
    Murta, L., Braganholo, V., Chirigati, F., Koop, D., Freire, J.: noWorkflow: capturing and analyzing provenance of scripts. In: Proceedings of IPAW’14 (2014)Google Scholar
  38. 38.
    PROV DC (2013). Available at http://www.w3.org/TR/prov-dc/
  39. 39.
    PROV Dictionary (2013). Available at http://www.w3.org/TR/prov-dictionary/ Google Scholar
  40. 40.
    PROV-Overview: An Overview of the PROV Family of Documents. Technical Report (2012)Google Scholar
  41. 41.
    PROV-XML (2013). Available at http://www.w3.org/TR/prov-xml/
  42. 42.
    Special Issue on Provenance, Data and Information Quality. J. Data Inf. Qual. 5 (3) (2015)Google Scholar
  43. 43.
    The Provenance Incubator Group Charter (2009). Available at http://www.w3.org/2005/Incubator/prov/charter
  44. 44.
    The Provenance Incubator Group Final Report (2010). Available at http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/
  45. 45.
    The ProvONE provenance model (2014). Available at http://tinyurl.com/ProvONE
  46. 46.
    Woodman, S., Hiden, H., Watson, P.: Workflow provenance: an analysis of long term storage costs. In: Proceedings of 10th WORKS workshop, Austin, TX (2015)Google Scholar
  47. 47.
    Zhang, J., Chapman, A., LeFevre, K.: Do you know where your datas been? tamper-evident database provenance. In: Jonker, W., Petkovic, M. (eds.) Secure Data Management. Lecture Notes in Computer Science, vol. 5776, pp. 17–32. Springer, Berlin/Heidelberg (2009)CrossRefGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. 1.School of Computing ScienceNewcastle UniversityNewcastle upon TyneUK

Personalised recommendations