Multi-Tasking with a Memory Hierarchy
Multi-tasking on multiple vector-processors connected to a GSU (Global Storage Unit) is discussed as one solution to the market demand for large-scale scientific computation.
The key points for obtaining high efficiency and performance from such multi-tasking systems are (1) vectorization, (2) granularity, and (3) memory localization.
Vectorization means vectorizing the code within each task so that each vector processor runs as efficiently as possible.
Granularity is the length of time a task executes independently of the other tasks that can run concurrently. It depends on how the tasks that use global (common) variables are divided. Large granularity reduces synchronization overhead.
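As a rough illustration of why large granularity reduces synchronization overhead, the following Python sketch uses a toy cost model. The function name `parallel_time`, the fixed per-synchronization cost, and all the numbers are assumptions made for illustration; none of them come from the paper.

```python
def parallel_time(total_work, granularity, sync_cost, processors):
    """Toy model (an assumption, not the paper's): estimated wall time.

    total_work  -- total compute time on one processor, in seconds
    granularity -- compute time of one task between synchronizations
    sync_cost   -- assumed fixed cost of one synchronization, in seconds
    processors  -- number of processors sharing the work evenly
    """
    tasks = total_work / granularity          # number of tasks overall
    tasks_per_processor = tasks / processors  # each processor's share
    # Every task pays its compute time plus one synchronization.
    return tasks_per_processor * (granularity + sync_cost)


if __name__ == "__main__":
    work = 100.0
    for g in (0.001, 0.01, 0.1, 1.0):
        t = parallel_time(work, granularity=g, sync_cost=0.001, processors=10)
        print(f"granularity={g}s  time={t:.3f}s  speedup={work / t:.2f}")
```

In this model the synchronization term shrinks as the granularity grows, so the estimated speedup approaches the processor count only for coarse tasks.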
Memory localization depends on the division of tasks, the assignment of tasks to processors, and the assignment of data to each local memory and to global memory. The overhead of moving data from another processor's local memory or from global memory when a task starts reduces the efficiency of multi-tasking.
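The effect of task assignment on data movement can be sketched with a small count of non-local accesses. The example assumes a grid whose rows are distributed across processors and a nearest-neighbour update in which each row reads the rows directly above and below it; this access pattern, and the helper names `owner_block`, `owner_cyclic`, and `remote_row_reads`, are hypothetical and chosen only for illustration.

```python
def owner_block(row, rows, processors):
    """Owner of a row when rows are split into contiguous blocks."""
    block = -(-rows // processors)  # ceiling division
    return row // block


def owner_cyclic(row, rows, processors):
    """Owner of a row when rows are dealt out round-robin."""
    return row % processors


def remote_row_reads(rows, processors, owner):
    """Count neighbour rows (row-1, row+1) held by a different processor."""
    remote = 0
    for row in range(rows):
        mine = owner(row, rows, processors)
        for neighbour in (row - 1, row + 1):
            if 0 <= neighbour < rows:
                if owner(neighbour, rows, processors) != mine:
                    remote += 1
    return remote


if __name__ == "__main__":
    for name, owner in (("block", owner_block), ("cyclic", owner_cyclic)):
        print(name, remote_row_reads(100, 10, owner))
```

Under these assumptions a contiguous block assignment needs far fewer non-local row reads than a round-robin one, which is the kind of difference memory localization aims to exploit.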
Many vectorization techniques already exist for making efficient use of vector processors, so techniques addressing the other two key points are the important ones.
This paper assumes a typical hardware model. First, the behaviour of the benchmark code “SHALLOW” on this model is analyzed from the three key points of view. It is shown that the code multi-tasked on ten vector-processors is two to three times as fast as on a single vector-processor. Then various techniques for optimizing the code for a multi-tasking system are discussed. Finally, it is shown that for this benchmark code the optimization is very effective, and that the modified code multi-tasked on ten vector-processors is about seven times as fast as the original code on one vector-processor.
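To put the reported speedups in perspective, the short Python sketch below converts them into parallel efficiency and into an implied serial fraction using the Karp-Flatt metric. The function names are illustrative. Note that the figure of seven compares modified code on ten processors against the original code on one, so treating it as a pure parallel speedup is an assumption.

```python
def efficiency(speedup, processors):
    """Parallel efficiency: achieved speedup per processor."""
    return speedup / processors


def serial_fraction(speedup, processors):
    """Karp-Flatt metric: serial fraction implied by a measured speedup."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    processors = 10
    for speedup in (2.0, 3.0, 7.0):  # speedups quoted in the abstract
        e = efficiency(speedup, processors)
        f = serial_fraction(speedup, processors)
        print(f"speedup={speedup}: efficiency={e:.0%}, serial fraction={f:.3f}")
```

On this reading, the unoptimized code uses the ten processors at 20 to 30 percent efficiency, while the optimized code reaches about 70 percent.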
Keywords: Data Movement, Outer Loop, Memory Localization, Global Memory, Peak Performance