Storage and Data Life Cycle Management in Cloud Environments with FRIEDA

Cloud Computing for Data-Intensive Applications

Abstract

Infrastructure as a Service (IaaS) clouds provide a composable environment that is attractive for mid-range, high-throughput and data-intensive scientific workloads. However, the flexibility of IaaS clouds presents unique challenges for storage and data management. Users currently rely on manual or ad hoc methods for storage selection, storage configuration and data management. We address these challenges with a novel approach to storage and data life cycle management built on FRIEDA (Flexible Robust Intelligent Elastic Data Management), an application-specific storage and data management framework for composable infrastructure environments.

Notes

  1. Monkey is a play on words. It plays on FRIEDA by referencing Frieda Kahlo’s use of monkeys in her paintings. The name also highlights Monkey’s flexibility, which lets the user “monkey around” with different infrastructure deployments.

Acknowledgements

This material is based upon work supported by the Director, Office of Science, Office of Advanced Scientific Computing Research (ASCR) of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Corresponding author

Correspondence to Lavanya Ramakrishnan.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Ramakrishnan, L., Ghoshal, D., Hendrix, V., Feller, E., Mantha, P., Morin, C. (2014). Storage and Data Life Cycle Management in Cloud Environments with FRIEDA. In: Li, X., Qiu, J. (eds) Cloud Computing for Data-Intensive Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1905-5_15

  • DOI: https://doi.org/10.1007/978-1-4939-1905-5_15

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-1904-8

  • Online ISBN: 978-1-4939-1905-5

  • eBook Packages: Computer Science, Computer Science (R0)
