A systolizing compilation scheme for nested loops with linear bounds

  • Michael Barnett
  • Christian Lengauer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 693)


Abstract

With the recent advances in massively parallel programmable processor networks, methods for the infusion of massive MIMD parallelism into programs have become increasingly relevant. We present a mechanical scheme for the synthesis of systolic programs from programs that do not specify concurrency or communication. The scheme can handle source programs that are perfectly nested loops with regular data dependences and that correspond to uniform recurrence equations. The target programs are in a machine-independent distributed language with asynchronous parallelism and synchronous communication. The scheme has been implemented as a prototype systolizing compiler.


Keywords: Nest Loop · Systolic Array · Source Program · Processor Array · Loop Index
These keywords were added by machine and not by the authors. The process is experimental and the keywords may be updated as the learning algorithm improves.
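The class of source programs the abstract describes can be illustrated with a standard textbook example (a sketch for orientation, not code from the paper): matrix multiplication written as a perfectly nested loop with linear bounds, whose data dependences have constant distance vectors and therefore correspond to uniform recurrence equations.

```python
# Matrix multiplication as a perfectly nested loop with linear bounds.
# The only data dependence is C[i][j] on its value from the previous
# k-iteration, a constant distance vector (0, 0, 1); the nest therefore
# corresponds to a uniform recurrence equation, the kind of source
# program a systolizing compiler of this sort can accept.

def matmul(A, B, n):
    C = [[0] * n for _ in range(n)]
    for i in range(n):          # linear bounds: 0 <= i < n
        for j in range(n):      # 0 <= j < n
            for k in range(n):  # 0 <= k < n
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C
```

Note that the loop body specifies neither concurrency nor communication; both would be introduced mechanically by the compilation scheme.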





Copyright information

© Springer-Verlag Berlin Heidelberg 1993

Authors and Affiliations

  • Michael Barnett, Department of Computer Sciences, The University of Texas at Austin, Austin, USA
  • Christian Lengauer, Fakultät für Mathematik und Informatik, Universität Passau, Passau, Germany
