A systolizing compilation scheme for nested loops with linear bounds

  • Michael Barnett
  • Christian Lengauer
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 693)

Abstract

With the recent advances in massively parallel programmable processor networks, methods for the infusion of massive MIMD parallelism into programs have become increasingly relevant. We present a mechanical scheme for the synthesis of systolic programs from programs that do not specify concurrency or communication. The scheme can handle source programs that are perfectly nested loops with regular data dependences and that correspond to uniform recurrence equations. The target programs are in a machine-independent distributed language with asynchronous parallelism and synchronous communication. The scheme has been implemented as a prototype systolizing compiler.
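
To make the idea concrete, here is a minimal, hypothetical sketch (in Python, not the compiler's actual source or target language): a perfectly nested loop with uniform data dependences, matrix product written as a uniform recurrence, together with one standard linear space-time mapping (time t = i + j + k, processor (i, j)) of the kind such a scheme derives. The function names and the particular mapping are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only: a perfectly nested source loop with uniform
# dependences (matrix product as a uniform recurrence) and one standard
# linear space-time mapping of the kind a systolizing scheme derives.
# The mapping t = i + j + k, processor = (i, j) is a textbook choice for
# this example, not the paper's; all names here are hypothetical.
from collections import defaultdict

N = 3  # problem size

def sequential_source(A, B):
    """Source program: perfectly nested loops, no concurrency or communication."""
    C = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                # Uniform dependence: C[i][j] at step k depends on step k - 1.
                C[i][j] += A[i][k] * B[k][j]
    return C

def space_time_schedule():
    """Group loop iterations by the linear schedule t = i + j + k.

    In this example, iterations with the same t have no mutual dependences
    (every dependence crosses to a strictly smaller t), so each group can
    run in parallel, one iteration per processor (i, j).
    """
    steps = defaultdict(list)
    for i in range(N):
        for j in range(N):
            for k in range(N):
                steps[i + j + k].append(((i, j), k))
    return steps

if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    print("C =", sequential_source(A, B))
    for t, work in sorted(space_time_schedule().items()):
        print(f"time {t}: processor/iteration pairs {work}")
```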

Keywords

Nested Loop, Systolic Array, Source Program, Processor Array, Loop Index

Copyright information

© Springer-Verlag Berlin Heidelberg 1993

Authors and Affiliations

  • Michael Barnett (1)
  • Christian Lengauer (2)
  1. Department of Computer Sciences, The University of Texas at Austin, Austin, USA
  2. Fakultät für Mathematik und Informatik, Universität Passau, Passau, Germany
