Abstract
In this work, we investigate tools that enable dollar-cost optimization of scientific simulations run in commercial clouds. We present a framework, called CloudTracker, that transparently records information from a simulation executed in a commercial cloud so that the simulation can be “replayed” exactly to reproduce its results. Using the automated CloudTracker provenance and replay facilities, scientists can choose either to store the results of a simulation or to reproduce them on demand, whichever is more cost-efficient given what the commercial cloud provider charges for storage and computing. We present a prototype implementation of CloudTracker for the Amazon AWS commercial cloud and the StochSS stochastic simulation system. Using this prototype, we analyze the storage-versus-compute cost tradeoffs for different classes of StochSS simulations when deployed and executed in AWS.
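The store-or-replay decision described above can be illustrated with a minimal sketch. This is not CloudTracker's implementation: the class, the function names, and the linear cost model are all assumptions made for illustration, and the prices are placeholder values, not AWS rates or figures from the paper. The model also ignores data-transfer charges and the (presumably small) cost of storing the provenance record itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationCosts:
    """Hypothetical cost inputs for one simulation (all names are assumptions)."""
    output_gb: float                 # size of the stored results
    storage_usd_per_gb_month: float  # provider's storage price
    compute_hours: float             # time needed to replay the simulation
    compute_usd_per_hour: float      # provider's compute price


def storage_cost(c: SimulationCosts, months: float) -> float:
    """Dollar cost of keeping the results for the given retention period."""
    return c.output_gb * c.storage_usd_per_gb_month * months


def replay_cost(c: SimulationCosts, expected_replays: int = 1) -> float:
    """Dollar cost of regenerating the results on demand."""
    return c.compute_hours * c.compute_usd_per_hour * expected_replays


def cheaper_option(c: SimulationCosts, months: float, expected_replays: int = 1) -> str:
    """Return "store" or "replay", whichever costs less (ties go to "store")."""
    if storage_cost(c, months) <= replay_cost(c, expected_replays):
        return "store"
    return "replay"


def break_even_months(c: SimulationCosts, expected_replays: int = 1) -> float:
    """Retention period beyond which replaying becomes cheaper than storing."""
    monthly = c.output_gb * c.storage_usd_per_gb_month
    if monthly == 0:
        return float("inf")  # storage is free, so storing never loses
    return replay_cost(c, expected_replays) / monthly


# Placeholder numbers chosen only to exercise the functions.
example = SimulationCosts(
    output_gb=4.0,
    storage_usd_per_gb_month=0.25,
    compute_hours=2.0,
    compute_usd_per_hour=0.5,
)
```

Under this simplified model, results that are large relative to the compute time needed to regenerate them favor replay as the retention period grows, while short retention periods or frequent re-access favor storage.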
Acknowledgements
We thank the reviewers for their valuable feedback on this paper. This work was funded in part by NSF (CNS-0905237 and CNS-1218808) and NIH (1R01EB014877-01).
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Douglas, G., Drawert, B., Krintz, C., Wolski, R. (2014). CloudTracker: Using Execution Provenance to Optimize the Cost of Cloud Use. In: Altmann, J., Vanmechelen, K., Rana, O. (eds) Economics of Grids, Clouds, Systems, and Services. GECON 2014. Lecture Notes in Computer Science, vol 8914. Springer, Cham. https://doi.org/10.1007/978-3-319-14609-6_7
DOI: https://doi.org/10.1007/978-3-319-14609-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14608-9
Online ISBN: 978-3-319-14609-6
eBook Packages: Computer Science, Computer Science (R0)