Abstract
Infrastructure as a Service (IaaS) clouds provide a composable environment that is attractive for mid-range, high-throughput, and data-intensive scientific workloads. However, the flexibility of IaaS clouds presents unique challenges for storage and data management. Users typically rely on manual or ad hoc methods for storage selection, storage configuration, and data management in these environments. We address these challenges with a novel approach to storage and data life cycle management, implemented in FRIEDA (Flexible Robust Intelligent Elastic Data Management), an application-specific storage and data management framework for composable infrastructure environments.
Notes
1. Monkey is a play on words. It plays on FRIEDA by referencing Frieda Kahlo’s use of monkeys in her paintings. It also highlights Monkey's flexibility, which allows the user to “monkey around” with different infrastructure deployments.
Acknowledgements
This material is based upon work supported by the Director, Office of Science, Office of Advanced Scientific Computing Research (ASCR) of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Ramakrishnan, L., Ghoshal, D., Hendrix, V., Feller, E., Mantha, P., Morin, C. (2014). Storage and Data Life Cycle Management in Cloud Environments with FRIEDA. In: Li, X., Qiu, J. (eds) Cloud Computing for Data-Intensive Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1905-5_15
DOI: https://doi.org/10.1007/978-1-4939-1905-5_15
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-1904-8
Online ISBN: 978-1-4939-1905-5
eBook Packages: Computer Science, Computer Science (R0)