Resource management for high-performance PC clusters

  • Track C2: Computational Science
  • Conference paper
High-Performance Computing and Networking (HPCN-Europe 1999)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1593))

Abstract

With the recent availability of cost-effective network cards for the PCI bus, researchers have been tempted to build large compute clusters from standard PCs. Many of these clusters are operated with workstation cluster management software in high-throughput or single-user mode.

For very large clusters with more than 100 processing elements (PEs), however, it becomes necessary to implement full-fledged resource management software that allows the system to be partitioned for multi-user access.

In this paper, we present our Computing Center Software (CCS), which was originally designed for managing massively parallel high-performance computers and has now been adapted to modern workstation clusters. It provides

  • partitioning of exclusive and non-exclusive resources,

  • hardware-independent scheduling of interactive and batch jobs,

  • open, extensible interfaces to other resource management systems,

  • a high degree of reliability.

The work presented in this paper was done while all three authors were at the Paderborn Center for Parallel Computing (http://www.uni-paderborn.de/pc2).

Editor information

Peter Sloot, Marian Bubak, Alfons Hoekstra, Bob Hertzberger

Copyright information

© 1999 Springer-Verlag

About this paper

Cite this paper

Keller, A., Brune, M., Reinefeld, A. (1999). Resource management for high-performance PC clusters. In: Sloot, P., Bubak, M., Hoekstra, A., Hertzberger, B. (eds) High-Performance Computing and Networking. HPCN-Europe 1999. Lecture Notes in Computer Science, vol 1593. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0100588

  • DOI: https://doi.org/10.1007/BFb0100588

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65821-4

  • Online ISBN: 978-3-540-48933-7

  • eBook Packages: Springer Book Archive
