The Conversion Via Software of a SIMD Processor into a MIMD Processor
This paper describes a method that takes a pure Lisp program and automatically decomposes it (automatic parallelization) into several parts, one for each processor of a SIMD architecture. Each part is a different execution flow, that is, a different program. The execution of these different programs by a SIMD architecture is the main theme of the paper.
The method has been developed in some detail for the PS-2000, a SIMD Soviet multiprocessor, making it behave like AHR, a Mexican MIMD multi-microprocessor. Both the PS-2000 and AHR execute a pure Lisp program in parallel; the user or programmer is not responsible for its decomposition into n pieces, their synchronization, scheduling, etc. All these chores are performed by the system (hardware and software) instead.
In order to achieve simultaneous execution of different programs in a SIMD processor, the method uses a scheme of node scheduling (a node is a primitive Lisp operation) and node exportation.
Keywords: Host Machine, Execution Flow, Automatic Parallelization, Node Schedule, Master Copy
- 1. Bouknight, W. J., et al. The Illiac IV System. Proc. IEEE 60 4 April 72 369–388.
- 2. Glushkov, V. M., et al. Recursive machines and computing technology. Proc. IFIP 1974, North Holland, 65–70.
- 3. Guzmán, A. A parallel heterarchical machine for high level language processing. In Languages and Architectures for Image Processing, M. J. B. Duff and S. Levialdi (eds). 1981 Academic Press, 230–244. Also in: Proc. 1981 Int’l Conf. on Parallel Processing, 64–71.
- 4. Guzmán, A. A heterarchical multi-microprocessor Lisp machine. Proc. 1981 IEEE Workshop on Computer Architecture for Pattern Analysis and Image Database Management. IEEE Publication 81CH-1697–2, pages 309–317.
- 5. Guzmán, A., and Norkin, K. The design and construction of a parallel heterarchical machine; final report of phase 1 of the AHR Project. Technical Report AHR-82–21, AHR Lab, IIMAS, Nat’l Univ. of Mexico, 1982.
- 6. Guzmán, A., Gerzso, M., Norkin, K., and Kuprianov, B. The PS-2000 SIMD computer; technical description and instruction set. Tech. Report AHR-82–23, AHR Laboratory, IIMAS, Nat’l Univ. of Mexico, 1982.
- 7. Guzmán, A., Gerzso, M., Norkin, K., and Vilenkin, S. Y. Functional design of Lisp interpreter for the PS-2000 SIMD computer. Technical Report AHR-83–24, IIMAS, Nat’l University of Mexico, 1983.
- 8. Russell, R. M. The Cray-1 computer system. C ACM 21 1 Jan 78, 63–72.
- 9. Strong, H. R. Vector execution of flow graphs. J ACM 31 1 Jan 83 186–196.
- 10. Tandem Nonstop II system description manual, Vols 1 and 2. P/N 82077, Tandem Computers Inc., Cupertino, CA, USA. April 1981.
- 11. Glushkov [2] postulated this search. To avoid it, AHR uses a FIFO holding nodes ready for evaluation; they are handed out by the distributor.
- 12. The Lisp processor does not actually look for more work to do; instead, it just “signals” to the distributor that it wants more work; the distributor accesses the FIFO and provides a new node to the processor.
- 13. Actually, the Lisp processor just requests this of the distributor, which performs the placement of the result into the father, decrements the nane of the father and optionally inscribes the father in the FIFO.
- 14. To give an example, let us suppose that the scheduler has just run its first part and has counted 12, 14, 7, 9, 10, … CAR’s and 5, 8, 2, 4, 6, … CONS’es in processors 1, 2, 3, 4, 5, … Using this information, it decides to go through 10 (parallel) executions of CAR’s and 6 (parallel) evaluations of CONS’es, in that order. Let us suppose that the evaluation of the CAR’s has generated 2, 1, 3, 2, 0, … additional CONS’es ready for evaluation. That is, there are now 7, 9, 5, 6, 6, … CONS’es ready. When it comes to the evaluation of the CONS’es, whose count was already fixed at 6, each processor evaluates 6 CONS’es (or fewer, if it has fewer ready). Efficiency is lost only in processor 3 (which evaluates 5 CONS’es and wastes one CONS evaluation cycle), as opposed to the case in which no additional CONS’es are made ready. In this last hypothetical case, since the CONS count would remain at 5, 8, 2, 4, 6, …, processors 1, 3, 4, … would be below 100% efficiency. The example shows that, without spending additional computing time, the additional CONS’es made ready by the CAR evaluations improve the efficiency of every processor that had fewer CONS’es than the number (six, in our example) of executions chosen by the scheduler.
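
The arithmetic of the example in note 14 can be checked with a short sketch. It is only an illustration in Python, not the PS-2000 scheduler itself: the function and variable names are hypothetical, only the first five processors listed in the note are modelled, and a processor is assumed to waste one cycle for every lockstep CONS round in which it has no CONS ready.

```python
def wasted_cycles(ready, rounds):
    """Idle cycles per processor when all processors step through
    `rounds` lockstep evaluations but each has only `ready[i]` nodes."""
    return [rounds - min(r, rounds) for r in ready]

# Figures taken from note 14 (processors 1 to 5 only).
cons_ready = [5, 8, 2, 4, 6]       # CONS'es counted by the scheduler
car_generated = [2, 1, 3, 2, 0]    # extra CONS'es made ready by the CAR rounds
cons_rounds = 6                    # number of CONS rounds chosen by the scheduler

# CONS'es actually ready when the CONS rounds begin: [7, 9, 5, 6, 6]
cons_after_cars = [c + g for c, g in zip(cons_ready, car_generated)]

# With the extra CONS'es only processor 3 idles, for one cycle.
wasted_with_extra = wasted_cycles(cons_after_cars, cons_rounds)    # [0, 0, 1, 0, 0]

# Without them, processors 1, 3 and 4 would idle.
wasted_without_extra = wasted_cycles(cons_ready, cons_rounds)      # [1, 0, 4, 2, 0]

print(cons_after_cars, wasted_with_extra, wasted_without_extra)
```

Under these assumptions the sketch reproduces the note's figures: one wasted cycle in total (processor 3) instead of seven spread over processors 1, 3 and 4.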