Resource management for high-performance PC clusters
With the recent availability of cost-effective network cards for the PCI bus, researchers have been tempted to build large compute clusters from standard PCs. Many of these clusters are operated with workstation cluster management software in high-throughput or single-user mode.
For very large clusters with more than 100 processing elements (PEs), however, it becomes necessary to implement full-fledged resource management software that allows the system to be partitioned for multi-user access. Such software must provide:
- partitioning of exclusive and non-exclusive resources,
- hardware-independent scheduling of interactive and batch jobs,
- open, extensible interfaces to other resource management systems,
- a high degree of reliability.
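The abstract does not say how exclusive and non-exclusive resources are partitioned. The following is only a minimal sketch under a hypothetical model. Each PE is an exclusive resource that belongs to at most one user at a time. Shared services are non-exclusive resources that any partition may use concurrently. The names `Cluster`, `allocate`, `release`, and the service string `"fileserver"` are illustrative assumptions, not part of any system described here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model: PEs are exclusive, shared services are non-exclusive."""
    num_pes: int
    shared: set[str]  # non-exclusive resources, e.g. a shared file server
    owner: dict[int, str] = field(default_factory=dict)  # PE id -> owning user

    def free_pes(self) -> list[int]:
        """Return the ids of all PEs not currently held by any user."""
        return [pe for pe in range(self.num_pes) if pe not in self.owner]

    def allocate(self, user: str, pes: int, services: set[str]) -> list[int] | None:
        """Grant `pes` exclusive PEs plus shared access to `services`.

        Returns the granted PE ids, or None if the request cannot be met now.
        """
        free = self.free_pes()
        if pes > len(free) or not services <= self.shared:
            return None  # too few free PEs, or an unknown shared service
        granted = free[:pes]
        for pe in granted:
            self.owner[pe] = user  # exclusive: the PE now belongs to one user only
        # Shared services need no ownership bookkeeping: many partitions may use them.
        return granted

    def release(self, user: str) -> None:
        """Return every PE held by `user` to the free pool."""
        self.owner = {pe: u for pe, u in self.owner.items() if u != user}


# Usage: two users competing for a 4-PE cluster with one shared file server.
cluster = Cluster(num_pes=4, shared={"fileserver"})
print(cluster.allocate("alice", 3, {"fileserver"}))  # alice gets PEs 0-2
print(cluster.allocate("bob", 2, {"fileserver"}))    # only one PE left, so None
cluster.release("alice")
print(cluster.allocate("bob", 2, {"fileserver"}))    # bob now gets PEs 0-1
```

In this sketch, a request that cannot be satisfied is simply refused. A real multi-user resource manager would instead queue such a request and schedule it later.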
Keywords: Service Description, Resource Management System, Configuration Manager, Queue Manager, Machine Manager