
Loop quantization or unwinding done right

  • Session 4A: Compilers and Restructuring Techniques I
  • Conference paper

Supercomputing (ICS 1987)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 297)

Abstract

Loop unwinding is a well-known technique for reducing loop overhead, exposing parallelism, and increasing the efficiency of pipelining. Traditional loop unwinding is limited to the innermost loop in a group of nested loops, and the amount of unwinding is either fixed or must be specified by the user on a case-by-case basis. In this paper we present a general technique for automatically unwinding multiply nested loops, explain its advantages over other transformation techniques, and illustrate its practical effectiveness. Loop Quantization can be beneficial by itself or coupled with other loop transformations (e.g., Do-across).

This work is supported in part by NSF grant DCR 8502884, ONR grant N00014-86-K-0215, and the Cornell NSF Supercomputing Center.


References

  1. A. Aiken and A. Nicolau. Loop Quantization: An Analysis and Algorithm. Technical Report No. 87-821, Department of Computer Science, Cornell University, March 1987.

  2. J. R. Allen and K. Kennedy. Automatic Loop Interchange. In Proceedings of the Symposium on Compiler Construction, SIGPLAN Notices, Vol. 19, No. 6, 1984.

  3. Alliant. Product Summary. Alliant Computer Systems Corporation, Acton, Mass., January 1985.

  4. U. Banerjee. Speedup of Ordinary Programs. University of Illinois Computer Science Technical Report UIUCDCS-R-79-989, Oct. 1979.

  5. R. Bogen. MACSYMA Reference Manual. Symbolics Inc., Cambridge, Mass., December 1983.

  6. R. Brent. The Parallel Evaluation of General Arithmetic Expressions. Journal of the ACM, Vol. 21, pp. 201–206, 1974.

  7. A. E. Charlesworth. An Approach to Scientific Array Processing: The Architectural Design of the AP-120B/FPS-164 Family. IEEE Computer, Vol. 14, No. 3, pp. 18–27, 1981.

  8. R. Cytron. Doacross: Beyond Vectorization for Multiprocessors. Proceedings of the 1986 International Conference on Parallel Processing, pp. 836–844, Aug. 1986.

  9. J. A. Fisher, J. R. Ellis, J. C. Ruttenberg, and A. Nicolau. Parallel Processing: A Smart Compiler and a Dumb Machine. Proceedings of the ACM Symposium on Compiler Construction, 1984.

  10. J. A. Fisher. The Optimization of Horizontal Microcode within and beyond Basic Blocks: An Application of Processor Scheduling with Resources. Ph.D. thesis, New York University, New York, 1979.

  11. J. A. Fisher. Very Long Instruction Word Architectures and the ELI-512. Department of Computer Science Technical Report No. 253, Yale University, 1982.

  12. J. R. Goodman, J. Hsieh, K. Liou, A. R. Pleszkun, P. B. Schechter, and H. C. Young. PIPE: A VLSI Decoupled Architecture. The 12th Annual International Symposium on Computer Architecture, Boston, MA, June 17–19, 1985, pp. 20–27.

  13. R. W. Heuft and W. D. Little. Improved Time and Parallel Processor Bounds for Fortran-like Loops. IEEE Transactions on Computers, Vol. 31, No. 1, 1982.

  14. D. J. Kuck. Parallel Processing of Ordinary Programs. In Advances in Computers, Vol. 15, pp. 119–179, 1976.

  15. R. H. Kuhn. Optimization and Interconnection Complexity for: Parallel Processors, Single-Stage Networks and Decision Trees. Ph.D. thesis, University of Illinois at Urbana-Champaign, 1980.

  16. F. H. McMahon. Lawrence Livermore National Laboratory FORTRAN Kernels: MFLOPS. Livermore, CA, 1983.

  17. Y. Muraoka. Parallelism Exposure and Exploitation in Programs. Department of Computer Science Technical Report 71-424, University of Illinois, Urbana, 1971.

  18. A. Nicolau. Parallelism, Memory Anti-Aliasing and Correctness for Trace Scheduling Compilers. Ph.D. thesis, Yale University, June 1984.

  19. A. Nicolau. Percolation Scheduling: A Parallel Compilation Technique. Department of Computer Science Technical Report TR-85-678, Cornell University, May 1985.

  20. A. Nicolau and K. Karplus. ROPE: A Statically Scheduled Supercomputer Architecture. First International Conference on Supercomputing Systems, St. Petersburg, FL, December 1985.

  21. C. L. Seitz. The Cosmic Cube. Communications of the ACM, Vol. 28, No. 1, January 1985.

  22. J. Solworth and A. Nicolau. Microflow: A Fine-Grain Parallel Processing Approach. Department of Computer Science Technical Report TR-85-710, Cornell University.



Editor information

E. N. Houstis, T. S. Papatheodorou, C. D. Polychronopoulos


Copyright information

© 1988 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nicolau, A. (1988). Loop quantization or unwinding done right. In: Houstis, E.N., Papatheodorou, T.S., Polychronopoulos, C.D. (eds) Supercomputing. ICS 1987. Lecture Notes in Computer Science, vol 297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-18991-2_17


  • DOI: https://doi.org/10.1007/3-540-18991-2_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-18991-6

  • Online ISBN: 978-3-540-38888-3

  • eBook Packages: Springer Book Archive
