Abstract
The relatively recent introduction of infrastructure-as-a-service (IaaS) clouds, such as Amazon Elastic Compute Cloud (EC2), provides users with the ability to deploy custom software stacks in virtual machines (VMs) across different cloud providers. Users can leverage IaaS clouds to create elastic environments that outsource compute and storage as needed. Additionally, these environments can adapt dynamically to demand, scaling up as demand increases and scaling down as demand decreases. In this paper, we present a large-scale elastic environment that extends cluster resource managers (e.g., Torque) with IaaS resources. Our solution integrates with an open-source elastic manager, the Elastic Processing Unit (EPU), and includes the ability to periodically recontextualize the environment with a lightweight REST-based recontextualization broker. We deploy the Gluster file system to provide a shared file system for all nodes in the environment. Though our implementation currently supports only Torque, we also discuss in detail how our architecture can interface with different workflows, including Hadoop’s MapReduce workflows and Condor’s match-making and high-throughput capabilities. For evaluation, we demonstrate the ability to recontextualize 256-node environments within one second of the recontextualization period, scale to over 475 nodes in less than 15 minutes, and support parallel I/O from distributed nodes.
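The scale-up and scale-down behavior described in the abstract can be illustrated with a minimal sketch. It is not the EPU's actual policy or API: the `ClusterState` fields, the `scaling_decision` function, and the `max_nodes` cap of 512 are all hypothetical names and values chosen for illustration, assuming one queued job needs one worker node.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of the elastic environment (not an EPU type)."""
    queued_jobs: int   # jobs waiting in the resource manager's queue
    idle_nodes: int    # cloud worker nodes with no job assigned
    total_nodes: int   # all cloud worker nodes currently running


def scaling_decision(state: ClusterState, max_nodes: int = 512) -> int:
    """Return the change in node count: positive launches, negative terminates.

    Assumes one queued job occupies one node; max_nodes is an illustrative cap.
    """
    if state.queued_jobs > state.idle_nodes:
        # Demand exceeds spare capacity: launch a node per unserved job,
        # but never exceed the configured cap.
        wanted = state.queued_jobs - state.idle_nodes
        headroom = max(0, max_nodes - state.total_nodes)
        return min(wanted, headroom)
    if state.queued_jobs == 0:
        # No demand at all: release every idle node.
        return -state.idle_nodes
    # Idle capacity already covers the queue: leave the environment alone.
    return 0
```

A manager loop would call `scaling_decision` once per polling interval and pass the result to whatever launches or terminates VMs; for example, `scaling_decision(ClusterState(queued_jobs=10, idle_nodes=2, total_nodes=4))` asks for eight more nodes under these assumptions.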
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marshall, P., Tufo, H., Keahey, K., LaBissoniere, D., Woitaszek, M. (2013). A Large-Scale Elastic Environment for Scientific Computing. In: Cordeiro, J., Hammoudi, S., van Sinderen, M. (eds) Software and Data Technologies. ICSOFT 2012. Communications in Computer and Information Science, vol 411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45404-2_8
DOI: https://doi.org/10.1007/978-3-642-45404-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45403-5
Online ISBN: 978-3-642-45404-2
eBook Packages: Computer Science, Computer Science (R0)