Resource management for high-performance PC clusters

Keller, Axel; Brune, Matthias; Reinefeld, Alexander

doi:10.1007/BFb0100588

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1593)

Included in the following conference series:

International Conference on High-Performance Computing and Networking

Abstract

With the recent availability of cost-effective network cards for the PCI bus, researchers have been tempted to build large compute clusters from standard PCs. Many of these clusters are operated with workstation cluster management software in high-throughput or single-user mode.

For very large clusters with more than 100 processing elements (PEs), however, it becomes necessary to deploy full-fledged resource management software that can partition the system for multi-user access.

In this paper, we present our Computing Center Software (CCS), which was originally designed for managing massively parallel high-performance computers and has now been adapted to modern workstation clusters. It provides:

partitioning of exclusive and non-exclusive resources,
hardware-independent scheduling of interactive and batch jobs,
open, extensible interfaces to other resource management systems,
a high degree of reliability.

The work presented in this paper was done while all three authors were at the Paderborn Center for Parallel Computing, http://www.uni-paderborn.de/pc2


Author information

Authors and Affiliations

Paderborn Center for Parallel Computing, Paderborn, Germany
Axel Keller
Konrad-Zuse-Zentrum für Informationstechnik Berlin, Berlin, Germany
Matthias Brune & Alexander Reinefeld

Editor information

Peter Sloot, Marian Bubak, Alfons Hoekstra, Bob Hertzberger

About this paper

Cite this paper

Keller, A., Brune, M., Reinefeld, A. (1999). Resource management for high-performance PC clusters. In: Sloot, P., Bubak, M., Hoekstra, A., Hertzberger, B. (eds) High-Performance Computing and Networking. HPCN-Europe 1999. Lecture Notes in Computer Science, vol 1593. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0100588

DOI: https://doi.org/10.1007/BFb0100588
Published: 17 November 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65821-4
Online ISBN: 978-3-540-48933-7
eBook Packages: Springer Book Archive