Abstract
The development of cloud computing has generated significant interest in the scientific computing community. In this chapter we consider the impact of cloud computing on scientific workflow applications. We examine the benefits and drawbacks of cloud computing for workflows, and argue that the primary benefit of cloud computing is not the economic model it promotes, but rather the technologies it employs and how they enable new features for workflow applications. We describe how clouds can be configured to execute workflow tasks and present a case study that examines the performance and cost of three typical workflow applications on Amazon EC2. Finally, we identify several areas in which existing clouds can be improved and discuss the future of workflows in the cloud.
Acknowledgements
We acknowledge the contributions of Karan Vahi, Gaurang Mehta, Phil Maechling, Benjamin P. Berman, and Bruce Berriman. This work was supported by the National Science Foundation under the SciFlow (CCF-0725332) grant. This research made use of Montage, funded by the National Aeronautics and Space Administration’s Earth Science Technology Office, Computation Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology.
Copyright information
© 2011 Springer-Verlag London Limited
About this chapter
Cite this chapter
Juve, G., Deelman, E. (2011). Scientific Workflows in the Cloud. In: Cafaro, M., Aloisio, G. (eds) Grids, Clouds and Virtualization. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-0-85729-049-6_4
DOI: https://doi.org/10.1007/978-0-85729-049-6_4
Publisher Name: Springer, London
Print ISBN: 978-0-85729-048-9
Online ISBN: 978-0-85729-049-6
eBook Packages: Computer Science, Computer Science (R0)