CloudTracker: Using Execution Provenance to Optimize the Cost of Cloud Use

  • Conference paper
  • In: Economics of Grids, Clouds, Systems, and Services (GECON 2014)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 8914)

Abstract

In this work, we investigate tools that enable dollar-cost optimization of scientific simulations using commercial clouds. We present a framework, called CloudTracker, that transparently records information from a simulation executed in a commercial cloud so that it may be “replayed” exactly to reproduce its results. Using the automated CloudTracker provenance and replay facilities, scientists can choose either to store the results of a simulation or to reproduce them on demand, whichever is more cost-efficient in terms of the dollar cost charged for storage and computing by the commercial cloud provider. We present a prototype implementation of CloudTracker for the Amazon AWS commercial cloud and the StochSS stochastic simulation system. Using this prototype, we analyze the storage-versus-compute cost tradeoffs for different classes of StochSS simulations when deployed and executed in AWS.
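
The store-or-replay decision described in the abstract can be illustrated with a small cost comparison. The sketch below is not CloudTracker's implementation: the class and function names, the flat per-GB-month and per-hour pricing model, and the prices used in the usage note are all assumptions made for illustration, and the model ignores data-transfer charges and billing granularity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudPrices:
    """Hypothetical flat prices; not taken from the paper or any AWS price list."""
    storage_usd_per_gb_month: float
    compute_usd_per_hour: float


def storage_cost(prices, output_gb, months):
    """Dollar cost of keeping the simulation output for the given number of months."""
    return prices.storage_usd_per_gb_month * output_gb * months


def replay_cost(prices, runtime_hours, expected_replays=1):
    """Dollar cost of re-running the simulation each time its results are needed."""
    return prices.compute_usd_per_hour * runtime_hours * expected_replays


def cheaper_option(prices, output_gb, months, runtime_hours, expected_replays=1):
    """Return "store" or "replay", whichever has the lower estimated dollar cost."""
    store = storage_cost(prices, output_gb, months)
    replay = replay_cost(prices, runtime_hours, expected_replays)
    return "store" if store <= replay else "replay"


def break_even_months(prices, output_gb, runtime_hours, expected_replays=1):
    """Retention period beyond which storing costs more than replaying."""
    monthly = storage_cost(prices, output_gb, 1)
    if monthly == 0:
        return float("inf")  # storing costs nothing, so it never overtakes replay
    return replay_cost(prices, runtime_hours, expected_replays) / monthly
```

Under these assumed prices, a simulation with a large output and a short runtime favors replay, while a long-running simulation with a small output favors storage. For example, `cheaper_option(CloudPrices(0.03, 0.10), output_gb=100, months=12, runtime_hours=2)` compares roughly 36 dollars of storage against roughly 0.20 dollars of compute.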

Acknowledgements

We thank the reviewers for their valuable feedback on this paper. This work was funded in part by NSF (CNS-0905237 and CNS-1218808) and NIH (1R01EB014877-01).

Author information

Corresponding author

Correspondence to Chandra Krintz.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Douglas, G., Drawert, B., Krintz, C., Wolski, R. (2014). CloudTracker: Using Execution Provenance to Optimize the Cost of Cloud Use. In: Altmann, J., Vanmechelen, K., Rana, O. (eds) Economics of Grids, Clouds, Systems, and Services. GECON 2014. Lecture Notes in Computer Science, vol 8914. Springer, Cham. https://doi.org/10.1007/978-3-319-14609-6_7

  • DOI: https://doi.org/10.1007/978-3-319-14609-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14608-9

  • Online ISBN: 978-3-319-14609-6

  • eBook Packages: Computer Science, Computer Science (R0)