A Novel High-Speed Memory Organization for Fine-Grain Multi-Thread Computing
In this paper, we propose a novel organization of high-speed memories, known as the register-cache, for a multi-threaded architecture. As the term suggests, it is organized both as a register file and as a cache. Viewed from the execution unit, its contents are addressable much as ordinary CPU registers are, using relatively short addresses. From the main memory perspective, it is content-addressable, i.e., its contents are tagged just as in conventional caches. Register allocation for the register-cache is performed adaptively at runtime, resulting in a dynamically allocated register file.
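The dual view described above can be sketched in a few lines. This is a minimal illustrative model, not the paper's implementation; the class and method names (`RegisterCache`, `read_register`, `lookup`, `fill`) are assumptions introduced here.

```python
class RegisterCache:
    """Sketch of a memory that is both register- and content-addressable."""

    def __init__(self, num_lines):
        # Each line carries a memory-address tag (as in a cache)
        # and a data word (as in a register file).
        self.tags = [None] * num_lines
        self.data = [None] * num_lines

    def read_register(self, idx):
        """Execution-unit view: direct access by a short register index."""
        return self.data[idx]

    def lookup(self, addr):
        """Main-memory view: associative search on the tag, as in a
        conventional cache. Returns the line index, or None on a miss."""
        for idx, tag in enumerate(self.tags):
            if tag == addr:
                return idx
        return None

    def fill(self, idx, addr, value):
        """Runtime register allocation: bind line idx to memory address addr."""
        self.tags[idx] = addr
        self.data[idx] = value
```

Under this model, the compiler need not fix a line's binding statically; `fill` rebinds a line to a new memory address at runtime, which is the sense in which the register file is dynamically allocated.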
A program is compiled into a number of instruction threads called super-actors. A super-actor becomes ready for execution only when its input data physically reside in the register-cache and space is reserved in the register-cache to store its results. Therefore, the execution unit never stalls or 'freezes' when accessing instructions or data. Another advantage is that, since registers are assigned dynamically at runtime, register allocation difficulties at compile time, e.g., allocating registers for subscripted variables of large arrays, can be avoided. Architectural support for overlapping the execution of super-actors with main memory operations is provided, so that the available concurrency in the underlying machine can be better utilized. The preliminary simulation results are encouraging: with software-pipelined loops, a register-cache of moderate size can keep the execution unit usefully busy.
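The firing rule above can be expressed as a small predicate. This is a hedged sketch of the stated readiness condition only; the data representation (a super-actor as a dict with `inputs` and `num_results`) and the function name `is_ready` are assumptions, not the paper's notation.

```python
def is_ready(super_actor, resident_tags, num_reserved_lines):
    """A super-actor is enabled only when every input operand is already
    resident in the register-cache AND enough lines are reserved for its
    results -- so once fired, it can never stall on a memory access."""
    inputs_resident = all(tag in resident_tags for tag in super_actor["inputs"])
    results_reserved = num_reserved_lines >= super_actor["num_results"]
    return inputs_resident and results_reserved


# Example: an actor needing operands at 0x10 and 0x14, producing one result.
actor = {"inputs": [0x10, 0x14], "num_results": 1}
```

Because readiness is checked before dispatch, main-memory latency is overlapped with the execution of other already-enabled super-actors rather than exposed as pipeline stalls.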