Recording and Reasoning over Data Provenance in Web and Grid Services

  • Martin Szomszor
  • Luc Moreau
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2888)


Large-scale, dynamic and open environments such as the Grid and Web Services build upon existing computing infrastructures to supply dependable and consistent large-scale computational systems. This kind of architecture has been adopted by the business and scientific communities, allowing them to exploit extensive and diverse computing resources to perform complex data processing tasks. In such systems, results are often derived by composing multiple, geographically distributed, heterogeneous services as specified by intricate workflow management. This leads to the undesirable situation where the results are known, but the means by which they were achieved are not. In both scientific experiments and business transactions, the notion of lineage and dataset derivation is of paramount importance, since without it information is potentially worthless. We address the issue of data provenance, the description of the origin of a piece of data, in these environments, showing the requirements, uses and implementation difficulties. We propose infrastructure-level support for a provenance recording capability in service-oriented architectures such as the Grid and Web Services. We also offer services to view and retrieve provenance, and we provide a mechanism by which provenance is used to determine whether previously computed results are still up to date.


Grid Service; Business Process Execution Language; Data Provenance; Provenance Information; Open Grid Service Architecture
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Martin Szomszor (1)
  • Luc Moreau (1)
  1. School of Electronics and Computer Science, University of Southampton, Southampton, UK
