Provenance Support for Data-Intensive Scientific Workflows

Grid and Cloud Database Management

Abstract

Data-intensive workflows process and produce large volumes of data. The volume of data and the number of workflow participants and activities can range from small to very large, so the traditional way of logging the experimental process is no longer adequate. This has created a need for techniques that automatically collect information about workflows, known as provenance. Several solutions for e-Science provenance have been proposed, but these are predominantly domain- and application-specific. In this chapter, the requirements of e-Science provenance systems are first clearly defined, and then a novel solution that satisfies these requirements, the Vienna e-Science Provenance System (VePS), is proposed. VePS not only promises to be lightweight and independent of the workflow enactment engine, domain, and application, but also measures the significance of workflow parameters using the Ant Colony Optimization metaheuristic. The major contributions are: (1) an interoperable provenance system, (2) quantification of parameter significance, and (3) generation of executable workflow documents.

Author information

Corresponding author

Correspondence to Fakhri Alam Khan.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Khan, F.A., Brezany, P. (2011). Provenance Support for Data-Intensive Scientific Workflows. In: Fiore, S., Aloisio, G. (eds) Grid and Cloud Database Management. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20045-8_11

  • DOI: https://doi.org/10.1007/978-3-642-20045-8_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20044-1

  • Online ISBN: 978-3-642-20045-8

  • eBook Packages: Computer Science, Computer Science (R0)
