Components of Systems Software for Parallel Systems
Systems software for clusters and other parallel systems affects multiple types of users. End users use it to submit and interact with application jobs and to take advantage of scalable system tools. Systems administrators use it to configure and build software installations on individual nodes; to schedule, manage, and account for application jobs; and to continuously monitor the status of the system, repairing it as needed. Libraries interact with systems software as they deal with the host environment. In this talk we discuss an ongoing research project devoted to an architecture for systems software that promotes robustness, flexibility, and efficiency. We present a component architecture that allows great simplicity and flexibility in the implementation of systems software. We describe a mechanism by which systems administrators can easily customize or replace individual components independently of others. We then describe the introduction of parallelism into a variety of both familiar and new system tools for both users and administrators. Finally, we present COBALT (COmponent-BAsed Lightweight Toolkit), an open-source, freely available preliminary implementation of the systems software components and scalable user tools, currently in production use in a number of environments.