A Novel High-Speed Memory Organization for Fine-Grain Multi-Thread Computing

  • Herbert H. J. Hum
  • Guang R. Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 505)


In this paper, we propose a novel organization of high-speed memories, known as the register-cache, for a multi-threaded architecture. As the term suggests, it is organized both as a register file and as a cache. Viewed from the execution unit, its contents are addressable, much like ordinary CPU registers, using relatively short addresses. From the main memory perspective, it is content-addressable, i.e., its contents are tagged just as in a conventional cache. Register allocation for the register-cache is performed adaptively at runtime, yielding a dynamically allocated register file.
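The dual view described above can be sketched in a few lines. This is an illustrative model only, not the paper's hardware design: the class name, method names, and tuple layout are all assumptions chosen to make the two access paths (short register index vs. tagged associative lookup) explicit.

```python
# Minimal sketch of a register-cache: a small tagged memory that is
# indexable by a short register number (the execution unit's view) and
# searchable by main-memory address (the cache's view). Names and
# interface are illustrative assumptions, not the paper's design.

class RegisterCache:
    def __init__(self, size):
        # Each slot holds (tag, value); the tag is a main-memory
        # address, or None if the slot is free.
        self.slots = [(None, None)] * size

    # Register-file view: direct access by short register index.
    def read_reg(self, reg):
        return self.slots[reg][1]

    def write_reg(self, reg, value):
        tag = self.slots[reg][0]
        self.slots[reg] = (tag, value)

    # Cache view: associative lookup by main-memory address.
    def lookup(self, addr):
        for reg, (tag, _) in enumerate(self.slots):
            if tag == addr:
                return reg      # hit: the address is bound to a register
        return None             # miss

    # Dynamic register allocation: bind an address to a free slot
    # at runtime (the "dynamically allocated register file").
    def allocate(self, addr, value=None):
        for reg, (tag, _) in enumerate(self.slots):
            if tag is None:
                self.slots[reg] = (addr, value)
                return reg
        raise RuntimeError("register-cache full")
```

For example, `allocate(0x100, 42)` binds address `0x100` to some free register `r`; afterwards the execution unit can `read_reg(r)` directly, while the memory side can find the same value via `lookup(0x100)`.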

A program is compiled into a number of instruction threads called super-actors. A super-actor becomes ready for execution only when its input data physically reside in the register-cache and space is reserved in the register-cache to store its results. Therefore, the execution unit never stalls or ‘freezes’ when accessing instructions or data. Another advantage is that since registers are assigned dynamically at runtime, register allocation difficulties at compile time, e.g., allocating registers for subscripted variables of large arrays, are avoided. Architectural support for overlapping the execution of super-actors with main memory operations is provided so that the available concurrency in the underlying machine can be better utilized. Preliminary simulation results are encouraging: with software-pipelined loops, a register-cache of moderate size can keep the execution unit usefully busy.
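The firing rule above can be sketched as follows. This is a self-contained toy model, not the paper's scheduler: the `Scheduler` class, its residency map, and the actor representation are assumptions made to show why a super-actor that passes the readiness test can then execute without stalling.

```python
# Illustrative sketch of the super-actor firing rule (all names are
# assumptions): a super-actor is enabled only when every operand is
# already resident in the register-cache and a slot remains free for
# its result, so once fired it never waits on main memory.

class Scheduler:
    def __init__(self, capacity):
        self.resident = {}        # main-memory address -> cached value
        self.capacity = capacity  # number of register-cache slots

    def ready(self, actor):
        inputs_present = all(a in self.resident for a in actor["inputs"])
        space_free = len(self.resident) < self.capacity
        return inputs_present and space_free

    def fire(self, actor, op):
        # Precondition: ready(actor) holds, so every access below is a
        # hit and the result write lands in a pre-reserved slot.
        args = [self.resident[a] for a in actor["inputs"]]
        self.resident[actor["result"]] = op(*args)
```

A scheduler built this way can overlap work: while one super-actor's operands are still being fetched into the register-cache, other enabled super-actors keep the execution unit busy, which is the overlap of execution and memory operations the paper targets.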





Copyright information

© Springer-Verlag Berlin Heidelberg 1991

Authors and Affiliations

  • Herbert H. J. Hum: Centre de recherche informatique de Montréal, Montreal, Canada; McGill University, Canada
  • Guang R. Gao: School of Computer Science, McGill University, Montreal, Canada
