Abstract
With the recent availability of cost-effective network cards for the PCI bus, researchers have been tempted to build large compute clusters from standard PCs. Many of these clusters are operated with workstation cluster management software in high-throughput or single-user mode.
For very large clusters with more than 100 processing elements (PEs), however, it becomes necessary to implement full-fledged resource management software that allows the system to be partitioned for multi-user access.
In this paper, we present our Computing Center Software (CCS), which was originally designed for managing massively parallel high-performance computers and has now been adapted to modern workstation clusters. It provides
- partitioning of exclusive and non-exclusive resources,
- hardware-independent scheduling of interactive and batch jobs,
- open, extensible interfaces to other resource management systems,
- a high degree of reliability.
The work presented in this paper was done while all three authors were at Paderborn Center for Parallel Computing, http://www.uni-paderborn.de/pc2
Copyright information
© 1999 Springer-Verlag
About this paper
Cite this paper
Keller, A., Brune, M., Reinefeld, A. (1999). Resource management for high-performance PC clusters. In: Sloot, P., Bubak, M., Hoekstra, A., Hertzberger, B. (eds) High-Performance Computing and Networking. HPCN-Europe 1999. Lecture Notes in Computer Science, vol 1593. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0100588
DOI: https://doi.org/10.1007/BFb0100588
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65821-4
Online ISBN: 978-3-540-48933-7
eBook Packages: Springer Book Archive