
Loop quantization or unwinding done right

  • Session 4A: Compilers and Restructuring Techniques I
  • Conference paper

Supercomputing (ICS 1987)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 297)

Abstract

Loop unwinding is a well-known technique for reducing loop overhead, exposing parallelism, and increasing the efficiency of pipelining. Traditional loop unwinding is limited to the innermost loop in a group of nested loops, and the amount of unwinding is either fixed or must be specified by the user on a case-by-case basis. In this paper we present a general technique for automatically unwinding multiply nested loops, explain its advantages over other transformation techniques, and illustrate its practical effectiveness. Loop Quantization can be beneficial by itself or coupled with other loop transformations (e.g., Do-across).

This work is supported in part by NSF grant DCR 8502884, ONR grant N00014-86-K-0215, and the Cornell NSF Supercomputing Center.


References

  1. A. Aiken and A. Nicolau. Loop Quantization: An Analysis and Algorithm. Technical Report No. 87-821, Department of Computer Science, Cornell University, March 1987.

  2. J. R. Allen and K. Kennedy. Automatic Loop Interchange. In Proceedings of the Symposium on Compiler Construction, SIGPLAN Notices, Vol. 19, No. 6, 1984.

  3. Alliant. Product Summary. Alliant Computer Systems Corporation, Acton, Mass., January 1985.

  4. U. Banerjee. Speedup of Ordinary Programs. University of Illinois Computer Science Technical Report UIUCDCS-R-79-989, Oct. 1979.

  5. R. Bogen. MACSYMA Reference Manual. Symbolics Inc., Cambridge, Mass., December 1983.

  6. R. Brent. The Parallel Evaluation of General Arithmetic Expressions. Journal of the ACM, Vol. 21, pp. 201–206, 1974.

  7. A. E. Charlesworth. An Approach to Scientific Array Processing: The Architectural Design of the AP-120B/FPS-164 Family. IEEE Computer, Vol. 14, No. 3, pp. 18–27, 1981.

  8. R. Cytron. Doacross: Beyond Vectorization for Multiprocessors. Proceedings of the 1986 International Conference on Parallel Processing, pp. 836–844, Aug. 1986.

  9. J. A. Fisher, J. R. Ellis, J. C. Ruttenberg, and A. Nicolau. Parallel Processing: A Smart Compiler and a Dumb Machine. Proceedings of the ACM Symposium on Compiler Construction, 1984.

  10. J. A. Fisher. The Optimization of Horizontal Microcode within and beyond Basic Blocks: An Application of Processor Scheduling with Resources. Ph.D. thesis, New York University, New York, 1979.

  11. J. A. Fisher. Very Long Instruction Word Architectures and the ELI-512. Department of Computer Science Technical Report No. 253, Yale University, 1982.

  12. J. R. Goodman, J. Hsieh, K. Liou, A. R. Pleszkun, P. B. Schechter, and H. C. Young. PIPE: A VLSI Decoupled Architecture. The 12th Annual International Symposium on Computer Architecture, Boston, MA, June 17–19, 1985, pp. 20–27.

  13. R. W. Heuft and W. D. Little. Improved Time and Parallel Processor Bounds for Fortran-like Loops. IEEE Transactions on Computers, Vol. 31, No. 1, 1982.

  14. D. J. Kuck. Parallel Processing of Ordinary Programs. In Advances in Computers, Vol. 15, pp. 119–179, 1976.

  15. R. H. Kuhn. Optimization and Interconnection Complexity for: Parallel Processors, Single-Stage Networks and Decision Trees. Ph.D. thesis, University of Illinois at Urbana-Champaign, 1980.

  16. F. H. McMahon. Lawrence Livermore National Laboratory FORTRAN Kernels: MFLOPS. Livermore, CA, 1983.

  17. Y. Muraoka. Parallelism Exposure and Exploitation in Programs. Department of Computer Science Technical Report 71-424, University of Illinois, Urbana, 1971.

  18. A. Nicolau. Parallelism, Memory Anti-Aliasing and Correctness for Trace Scheduling Compilers. Ph.D. thesis, Yale University, June 1984.

  19. A. Nicolau. Percolation Scheduling: A Parallel Compilation Technique. Department of Computer Science Technical Report TR-85-678, Cornell University, May 1985.

  20. A. Nicolau and K. Karplus. ROPE: A Statically Scheduled Supercomputer Architecture. First International Conference on Supercomputing Systems, St. Petersburg, FL, December 1985.

  21. C. L. Seitz. The Cosmic Cube. Communications of the ACM, Vol. 28, No. 1, January 1985.

  22. J. Solworth and A. Nicolau. Microflow: A Fine-Grain Parallel Processing Approach. Department of Computer Science Technical Report TR-85-710, Cornell University.



Editor information

E. N. Houstis, T. S. Papatheodorou, C. D. Polychronopoulos


Copyright information

© 1988 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nicolau, A. (1988). Loop quantization or unwinding done right. In: Houstis, E.N., Papatheodorou, T.S., Polychronopoulos, C.D. (eds) Supercomputing. ICS 1987. Lecture Notes in Computer Science, vol 297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-18991-2_17


  • DOI: https://doi.org/10.1007/3-540-18991-2_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-18991-6

  • Online ISBN: 978-3-540-38888-3

  • eBook Packages: Springer Book Archive
