Management += Grid
State-of-the-art management products have until now focused on monitoring resources such as networks, systems, servers, and applications. Management usually does not handle the complete resource life-cycle, i.e., providing resources on demand to application providers, matching and allocating resources to users so that application requirements are best met, providing guarantees through service level agreements (SLAs), and monitoring and assuring these SLAs. Grid aims to provide seamless access to computational resources. In Grid-based systems, including the latest initiative, the Open Grid Service Infrastructure (OGSI), concepts related to discovery of network topology, allocation based on farms, monitoring towards certain goals (as specified in SLAs), and control to assure these goals are missing. However, there are parallels between grid and traditional management systems. A grid, just like a traditional management system, comprises grid nodes, each of which manages a group of resources. The management+ system that we propose will be able to manage the complete life-cycle of resources assigned to applications, provided that Grid technologies are used for reserving, allocating, and handing over resources to applications, and that traditional management functionalities for monitoring and control are incorporated.
A management+ system comprises grid nodes that communicate with each other to coordinate the tasks of resource allocation and management. The management+ system may be used for managing a resource pool inside the enterprise or spanning multiple enterprises. To do so, the management+ system needs capabilities to discover resources so as to create a resource pool, to model and maintain the resource information, to provide resources to application providers on request, and to provide the guarantees agreed upon in Service-Level Agreements by monitoring and assuring them. This poses the following requirements for the management+ system:
Resource Discovery: Under the realm of each grid node are collections of resources that are allocatable and manageable by that grid node and that need to be registered and discovered. The registration information may be maintained in a central repository, which may be LDAP-based (as in traditional Grid) or a UDDI-based registry. The other approach is to let each grid node manage its own repository and to use protocols similar to those of peer-to-peer (P2P) systems for searching resources amongst grid nodes.
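The second, decentralized approach can be illustrated with a minimal sketch. It assumes each grid node keeps a local repository and forwards a search to its neighbors with a hop limit, in the manner of a flooding P2P search. The class `GridNode`, its methods, and the `ttl` parameter are hypothetical names chosen for illustration and are not part of the proposed system.

```python
from dataclasses import dataclass, field


@dataclass
class GridNode:
    """Hypothetical grid node with its own resource repository."""
    name: str
    resources: dict = field(default_factory=dict)  # resource id -> attributes
    neighbors: list = field(default_factory=list, repr=False)

    def register(self, resource_id, **attrs):
        # Registration: record a resource in this node's local repository.
        self.resources[resource_id] = attrs

    def search(self, predicate, ttl=2, seen=None):
        """Return (node name, resource id) pairs matching the predicate.

        The query is forwarded to neighbors until ttl hops are used up;
        `seen` prevents visiting the same node twice.
        """
        seen = set() if seen is None else seen
        if self.name in seen:
            return []
        seen.add(self.name)
        hits = [(self.name, rid)
                for rid, attrs in sorted(self.resources.items())
                if predicate(attrs)]
        if ttl > 0:
            for neighbor in self.neighbors:
                hits.extend(neighbor.search(predicate, ttl - 1, seen))
        return hits
```

A central LDAP- or UDDI-based registry would replace the forwarding step with a single lookup; the hop limit here trades completeness of the search for bounded message traffic.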
Resource Modeling: Resources that are managed by the grid, or even those within a single grid node, are heterogeneous in nature. They are of different types and are potentially distributed across different geographic locations and administrative domains (e.g., consolidated in racks vs. located on desktops). Modeling is therefore an important task for the management+ system. A relevant resource model is the Common Information Model (CIM).
Requesting Resources and Guarantees: Application providers need to specify their requests to the management+ system in some form. A request may be specified in terms of application-level metrics (e.g., throughput, transactions/sec) or, at a low level, in an enhanced Resource Specification Language (RSL). The guarantees may be provided in terms of reliability, availability, security, and timeliness through SLAs.
Resource Allocation and Deployment: Resource requests are sent to the management+ system, which then undertakes resource reservation, allocation, and deployment. Resource reservation is done when the resources are not immediately required. The management+ system has to keep track of all competing reservations and ensure that the resource pool is properly utilized. As the reservations become current, requests for resources are satisfied through match-making.
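The interplay of reservation and match-making can be sketched as follows. This is a simplified illustration, not the proposed mechanism: it assumes that matching is performed at reservation time rather than when the reservation becomes current, that a request is a pair of minimum CPU and memory requirements, and that the smallest adequate free resource is preferred. All names (`Resource`, `Reservation`, `ResourcePool`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    rid: str
    cpus: int
    memory_gb: int


@dataclass(frozen=True)
class Reservation:
    owner: str
    start: int          # inclusive, arbitrary time units
    end: int            # exclusive
    min_cpus: int
    min_memory_gb: int


class ResourcePool:
    def __init__(self, resources):
        self.resources = list(resources)
        self.bookings = []  # list of (Reservation, Resource)

    def _is_free(self, resource, start, end):
        # A resource is free if no booking on it overlaps [start, end).
        for booked, held in self.bookings:
            if held.rid == resource.rid and booked.start < end and start < booked.end:
                return False
        return True

    def reserve(self, request):
        """Match the request to the smallest adequate free resource."""
        candidates = [r for r in self.resources
                      if r.cpus >= request.min_cpus
                      and r.memory_gb >= request.min_memory_gb
                      and self._is_free(r, request.start, request.end)]
        if not candidates:
            return None  # competing reservations exhaust the pool
        best = min(candidates, key=lambda r: (r.cpus, r.memory_gb, r.rid))
        self.bookings.append((request, best))
        return best

    def current(self, now):
        """Reservations that have become current at time `now`."""
        return [(b.owner, r.rid) for b, r in self.bookings if b.start <= now < b.end]
```

Choosing the smallest adequate resource is one way to keep larger resources available for later, more demanding requests; other utilization policies would change only the `key` used in `reserve`.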
Monitoring Guarantees: Traditionally, applications are installed and operated on a fixed set of associated physical hardware. In a management+ system, resources virtualize the physical hardware. This virtualization makes it possible to switch the physical hardware transparently (in case of degradations or failures) while the application appears to run continuously. The metrics specified on the resources in SLAs have to be monitored and aggregated into higher-level metrics that business managers can relate to.
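As a small illustration of aggregating resource-level metrics into a higher-level one, the sketch below derives an application-level availability from per-resource up/down samples and compares it with an SLA target. It assumes that the application is up in an interval only if every resource it depends on is up in that interval; the function names and the sample layout are hypothetical.

```python
def availability(samples):
    """Fraction of probe samples in which the probed entity was up."""
    return sum(1 for up in samples if up) / len(samples)


def aggregate(resource_samples):
    """Application-level availability from per-resource samples.

    Assumption: all resources are required, so the application counts as
    up in an interval only when every resource was up in that interval.
    """
    per_interval = zip(*resource_samples.values())
    return availability([all(interval) for interval in per_interval])


def check_sla(resource_samples, target):
    """Compare the aggregated metric with the SLA target."""
    measured = aggregate(resource_samples)
    return {"measured": measured, "target": target, "violated": measured < target}
```

With redundant resources, the aggregation rule would use `any` instead of `all` for the redundant group; which rule applies depends on the resource model.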
Assuring Guarantees: Once SLA violations are detected, the management+ system must take corrective action. Analysis tools are employed to analyze the violation data and decide on the corrective action(s). A corrective action may range from a control action, such as fail-over or reboot/rejuvenation of the resources, to transparently replacing the failed resources with a new set of resources from the resource pool. These new resources are obtained through the same resource allocation and deployment mechanisms as discussed earlier. This may enable closed-loop management.
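The closed loop of detection and correction can be sketched as below. The escalation order (reboot, then fail-over, then replacement from the pool) and the thresholds on consecutive violations are assumptions made for illustration only; the text does not prescribe a policy, and `choose_action` and `control_loop` are hypothetical names.

```python
def choose_action(consecutive_violations, spare_pool):
    """Pick a corrective action under an assumed escalation policy."""
    if consecutive_violations <= 1:
        return "reboot"
    if consecutive_violations == 2:
        return "fail-over"
    # Persistent violation: replace the resource if the pool has a spare.
    return "replace" if spare_pool else "escalate-to-operator"


def control_loop(observations, target, spare_pool):
    """Map a sequence of measured metric values to corrective actions."""
    actions = []
    consecutive = 0
    spares = list(spare_pool)  # do not mutate the caller's pool
    for measured in observations:
        if measured >= target:
            consecutive = 0
            actions.append("none")
            continue
        consecutive += 1
        action = choose_action(consecutive, spares)
        if action == "replace":
            spares.pop(0)      # hand a spare resource to the application
            consecutive = 0    # the replacement starts with a clean record
        actions.append(action)
    return actions
```

In a fuller system the `replace` branch would go through the same reservation and match-making path used for new requests, which is what closes the loop between monitoring and allocation.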
Keywords: Corrective Action, Grid Node, Service Level Agreement, Resource Pool, Resource Reservation
- 1. Sahai, A., Machiraju, V., van Moorsel, A.: A System that combines Grid and Management technologies for Closed Loop Enterprise IT Management. HPL Invention disclosure #200310038 (patent pending)
- 2. Platform Computing, http://www.platform.org
- 3. Raman, R., Livny, M., Sloman, M.: Match-Making: Distributed Resource Management for High Throughput Computing. In: Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, Chicago, IL, July 28-31 (1998)
- 4. Foster, I., Kesselman, C., Nick, J.M., Tuecke, S.: The Physiology of the Grid, http://www.globus.org/research/papers/ogsa.pdf
- 5. Sahai, A., Graupner, S., Machiraju, V., van Moorsel, A.: Specifying and Monitoring Commercial Grids through SLA. CCGrid (May 2003)