A systolizing compilation scheme for nested loops with linear bounds

Chapter in: Functional Programming, Concurrency, Simulation and Automated Reasoning

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 693)

Abstract

With the recent advances in massively parallel programmable processor networks, methods for the infusion of massive MIMD parallelism into programs have become increasingly relevant. We present a mechanical scheme for the synthesis of systolic programs from programs that do not specify concurrency or communication. The scheme can handle source programs that are perfectly nested loops with regular data dependences and that correspond to uniform recurrence equations. The target programs are in a machine-independent distributed language with asynchronous parallelism and synchronous communication. The scheme has been implemented as a prototype systolizing compiler.
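The abstract describes the scheme only at a high level. As a rough illustration of the kind of mapping a systolizing compiler computes for a uniform loop nest, the Python sketch below uses a textbook space-time mapping. It is not the chapter's own algorithm and it ignores the non-rectangular (linear) loop bounds the title refers to. The sketch takes a hypothetical matrix-product loop nest with unit dependence vectors, checks that a linear schedule respects every dependence, projects the index space onto a two-dimensional processor grid, and derives the fixed neighbour links over which values would travel. The loop nest, the schedule (1, 1, 1), the projection along k, and every function name are assumptions made for this illustration.

```python
"""Sketch of a space-time mapping for a uniform loop nest (illustrative only).

Assumed source program (hypothetical, not taken from the chapter):
    for i in 0..n-1: for j in 0..n-1: for k in 0..n-1:
        C[i][j] += A[i][k] * B[k][j]
A is reused along j, B along i, and C is accumulated along k, which
gives the three unit dependence vectors below.
"""
from itertools import product

# Hypothetical uniform dependence vectors (i, j, k) for the loop nest above.
DEPS = [(0, 1, 0), (1, 0, 0), (0, 0, 1)]


def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def step_ok(schedule, deps):
    """A linear schedule t(x) = schedule . x respects the dependences
    if every dependence vector is given a positive delay."""
    return all(dot(schedule, d) >= 1 for d in deps)


def space_time_map(n, schedule, place):
    """Map each index point of the n x n x n cube to (time, processor).

    `place` is the allocation matrix, given as a list of row vectors.
    """
    mapping = {}
    for x in product(range(n), repeat=3):
        t = dot(schedule, x)
        p = tuple(dot(row, x) for row in place)
        mapping[x] = (t, p)
    return mapping


def is_injective(mapping):
    """No processor may execute two index points in the same step."""
    return len(set(mapping.values())) == len(mapping)


def neighbour_links(place, deps):
    """Each dependence becomes a fixed processor displacement place . d."""
    return [tuple(dot(row, d) for row in place) for d in deps]


if __name__ == "__main__":
    n = 3
    schedule = (1, 1, 1)
    place = [(1, 0, 0), (0, 1, 0)]  # project the index space along k
    assert step_ok(schedule, DEPS)
    m = space_time_map(n, schedule, place)
    print(is_injective(m))                     # True
    print(neighbour_links(place, DEPS))        # [(0, 1), (1, 0), (0, 0)]
    print(len({p for _, p in m.values()}))     # 9 processors
    print(max(t for t, _ in m.values()) + 1)   # 7 time steps
```

Projecting along k keeps each C[i][j] resident in a single processor, which is what the (0, 0) link for the C dependence expresses, while A and B values move one processor per step. A schedule such as (1, 1, 0) would fail both checks: it gives the C dependence zero delay, and it places all k-iterations of a given (i, j) on the same processor in the same step.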

Financial support was received from the Science and Engineering Research Council (SERC), grant no. GR/G55457.

Editor information

Peter E. Lauer

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Barnett, M., Lengauer, C. (1993). A systolizing compilation scheme for nested loops with linear bounds. In: Lauer, P.E. (eds) Functional Programming, Concurrency, Simulation and Automated Reasoning. Lecture Notes in Computer Science, vol 693. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56883-2_17

  • DOI: https://doi.org/10.1007/3-540-56883-2_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56883-4

  • Online ISBN: 978-3-540-47776-1

  • eBook Packages: Springer Book Archive
