Scientific Workflows in the Cloud

Part of the book series: Computer Communications and Networks (CCN)

Abstract

The development of cloud computing has generated significant interest in the scientific computing community. In this chapter we consider the impact of cloud computing on scientific workflow applications. We examine the benefits and drawbacks of cloud computing for workflows, and argue that the primary benefit of cloud computing is not the economic model it promotes, but rather the technologies it employs and how they enable new features for workflow applications. We describe how clouds can be configured to execute workflow tasks and present a case study that examines the performance and cost of three typical workflow applications on Amazon EC2. Finally, we identify several areas in which existing clouds can be improved and discuss the future of workflows in the cloud.

Acknowledgements

We acknowledge the contributions of Karan Vahi, Gaurang Mehta, Phil Maechling, Benjamin P. Berman, and Bruce Berriman. This work was supported by the National Science Foundation under the SciFlow (CCF-0725332) grant. This research made use of Montage, funded by the National Aeronautics and Space Administration’s Earth Science Technology Office, Computation Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology.

Corresponding author

Correspondence to Gideon Juve.

Copyright information

© 2011 Springer-Verlag London Limited

About this chapter

Cite this chapter

Juve, G., Deelman, E. (2011). Scientific Workflows in the Cloud. In: Cafaro, M., Aloisio, G. (eds) Grids, Clouds and Virtualization. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-0-85729-049-6_4

  • DOI: https://doi.org/10.1007/978-0-85729-049-6_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-0-85729-048-9

  • Online ISBN: 978-0-85729-049-6

  • eBook Packages: Computer Science, Computer Science (R0)
