Abstract
When planning the purchase of a compute cluster, much thought usually goes into the choice of compute nodes, interconnects, switches, and, to a lesser extent, the operating system and system software. Important software tools for system configuration, user administration, fault tolerance, debugging, and monitoring are often overlooked. In small systems this matters little, but the lack of suitable software tools can become a nightmare when operating compute clusters for a large, diverse user community. The following three chapters deal with such tools.
In Chapter 24, researchers from Technische Universität München (TUM) present a network monitoring tool that has been implemented in the context of their SMiLE project. With the data obtained from a hardware monitor on their own adapter card (see Chapter 4), the TUM researchers have implemented an infrastructure for the evaluation and controlled deterministic execution of hardware-supported distributed shared memory architectures.
Researchers from the University of Paderborn have developed a simple but powerful software tool, based on the Dolphin PCI adapter cards, that allows the user to observe the utilization of processors and the network. The software monitor presented in Chapter 25 is intended for administrators to trace the system status and for users to debug and tune their applications. In contrast to the TUM project described above, this monitor does not actively influence the application.
Finally, Chapter 26 addresses the important issue of operating large SCI clusters as general-purpose compute servers in a multi-user environment. The authors from Paderborn present the architecture of their Computer Center Software (CCS), which provides mechanisms for system partitioning, job scheduling, and user access management. With CCS, an SCI cluster is no longer seen as a collection of machines, but rather as a dedicated high-performance computer. Hence, the focus of CCS is on supporting parallel high-performance applications rather than throughput computing (which is the prevalent operation mode for LAN clusters).
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hellwagner, H., Reinefeld, A. (1999). Tools for SCI Clusters. In: Hellwagner, H., Reinefeld, A. (eds) SCI: Scalable Coherent Interface. Lecture Notes in Computer Science, vol 1734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10704208_31
DOI: https://doi.org/10.1007/10704208_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66696-7
Online ISBN: 978-3-540-47048-9
eBook Packages: Springer Book Archive