
Automation of Network-Based Scientific Workflows

  • M. A. Vouk
  • I. Altintas
  • R. Barreto
  • J. Blondin
  • Z. Cheng
  • T. Critchlow
  • A. Khan
  • S. Klasky
  • J. Ligon
  • B. Ludaescher
  • P. A. Mouallem
  • S. Parker
  • N. Podhorszki
  • A. Shoshani
  • C. Silva
Part of the IFIP International Federation for Information Processing book series (IFIPAICT, volume 239)

Abstract

Comprehensive, end-to-end data and workflow management solutions are needed to handle the increasing complexity of processes and data volumes associated with modern distributed scientific problem solving, such as ultrascale simulations and high-throughput experiments. The key to such a solution is an integrated network-based framework that is functional, dependable, and fault-tolerant, and that supports data and process provenance. Such a framework needs to make the development and use of application workflows dramatically easier, so that scientists' efforts can shift away from data management and utility software development toward scientific research and discovery. An integrated view of these activities is provided by the notion of scientific workflows: a series of structured activities and computations that arise in scientific problem solving. One information technology framework that supports scientific workflows is Kepler, an environment built on Ptolemy II. This paper discusses the issues associated with the practical automation of scientific processes and workflows and illustrates them with workflows developed using the Kepler framework and tools.
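
To make the actor-oriented workflow model concrete, the sketch below (an illustration assembled for this text, not code from the paper) builds and runs a two-actor dataflow directly against the Ptolemy II Java API on which Kepler is built. The Const, Display, and SDFDirector classes come from the Ptolemy II actor library; the HelloWorkflow class and the string it emits are hypothetical.

    // Minimal sketch of a Ptolemy II workflow of the kind Kepler builds on.
    // Library class names are real Ptolemy II actors; the workflow itself
    // is a hypothetical example, not one from the paper.
    import ptolemy.actor.Manager;
    import ptolemy.actor.TypedCompositeActor;
    import ptolemy.actor.lib.Const;
    import ptolemy.actor.lib.gui.Display;
    import ptolemy.domains.sdf.kernel.SDFDirector;

    public class HelloWorkflow {
        public static void main(String[] args) throws Exception {
            // Top-level composite actor that holds the workflow graph.
            TypedCompositeActor top = new TypedCompositeActor();
            top.setName("HelloWorkflow");

            // The director supplies the model of computation; here,
            // synchronous dataflow (SDF) run for a single iteration.
            SDFDirector director = new SDFDirector(top, "director");
            director.iterations.setExpression("1");

            // Source actor emitting one constant string token.
            Const source = new Const(top, "source");
            source.value.setExpression("\"Hello, scientific workflow\"");

            // Sink actor; Display opens a small window showing its input.
            Display sink = new Display(top, "display");

            // Connect the actors' ports and execute the model.
            top.connect(source.output, sink.input);
            Manager manager = new Manager(top.workspace(), "manager");
            top.setManager(manager);
            manager.execute();
        }
    }

Kepler layers its own actor library, provenance recording, and GUI on top of this execution model; a workflow authored in the Kepler GUI is saved as a MoML (Modeling Markup Language) XML document equivalent to the Java construction above.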

Keywords

Failure Probability, Service Oriented Architecture, Backup Service, Storage Resource Broker, Redundant Service

Copyright information

© International Federation for Information Processing 2007

Authors and Affiliations

  • M. A. Vouk (1)
  • I. Altintas (2)
  • R. Barreto (3)
  • J. Blondin (4)
  • Z. Cheng (1)
  • T. Critchlow (5)
  • A. Khan (6)
  • S. Klasky (3)
  • J. Ligon (1)
  • B. Ludaescher (7)
  • P. A. Mouallem (1)
  • S. Parker (6)
  • N. Podhorszki (7)
  • A. Shoshani (8)
  • C. Silva (6)

  1. Department of Computer Science, North Carolina State University, Raleigh, USA
  2. San Diego Supercomputer Center, University of California, La Jolla, USA
  3. Oak Ridge National Laboratory, Oak Ridge, USA
  4. Department of Physics, North Carolina State University, Raleigh, USA
  5. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, USA
  6. Department of Computer Science, University of Utah, Salt Lake City, USA
  7. Department of Computer Science, University of California Davis, Davis, USA
  8. Computing Research Division, Lawrence Berkeley National Laboratory, Berkeley, USA