
Acceleration of first and higher order recurrences on processors with instruction level parallelism

  • Conference paper
  • Languages and Compilers for Parallel Computing (LCPC 1993)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 768)

Abstract

This paper describes parallelization techniques for accelerating a broad class of recurrences on processors with instruction level parallelism. We introduce a new technique, called blocked back-substitution, which has lower operation count and higher performance than previous methods. The blocked back-substitution technique requires unrolling and non-symmetric optimization of innermost loop iterations. We present metrics to characterize the performance of software-pipelined loops and compare these metrics for a range of height reduction techniques and processor architectures.
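The abstract describes blocked back-substitution only in prose. As a rough, hypothetical illustration only, the C sketch below applies the idea to a first-order linear recurrence x[i] = a[i]*x[i-1] + b[i]; the block factor of 4, the function names, and the choice of C are assumptions for this sketch, not code from the paper. The point it illustrates is the one the abstract makes: after unrolling, only one composed multiply-add per block stays on the loop-carried critical path, while the remaining iterations of the block are treated non-symmetrically and computed off that path, trading a higher operation count for a shorter dependence chain.

```c
#include <stdio.h>

#define N   16
#define BLK 4   /* illustrative block/unroll factor; indices below are written out for BLK == 4 */

/* Baseline first-order linear recurrence x[i] = a[i]*x[i-1] + b[i]:
   every iteration waits on the previous one, so the critical path is
   one multiply-add per element. */
static void recurrence_serial(const double *a, const double *b,
                              double *x, int n, double x0)
{
    double prev = x0;
    for (int i = 0; i < n; i++) {
        prev = a[i] * prev + b[i];
        x[i] = prev;
    }
}

/* Blocked back-substitution (sketch): unroll BLK iterations and substitute
   the recurrence into itself, so the value carried between blocks is produced
   by one composed multiply-add, x[i+BLK-1] = A*carry + B.  The interior
   elements hang off the incoming carry but are never read by the next block,
   so they sit off the loop-carried critical path.  Operation count rises,
   but the carried dependence shrinks from n steps to roughly n/BLK steps. */
static void recurrence_blocked(const double *a, const double *b,
                               double *x, int n, double x0)
{
    double carry = x0;
    int i;
    for (i = 0; i + BLK <= n; i += BLK) {
        /* Coefficients of the composed map, obtained by symbolically
           back-substituting the four iterations of the block. */
        double A = a[i+3] * a[i+2] * a[i+1] * a[i];
        double B = a[i+3] * (a[i+2] * (a[i+1] * b[i] + b[i+1]) + b[i+2])
                 + b[i+3];

        /* Interior elements: a short side chain the next block never reads. */
        x[i]   = a[i]   * carry  + b[i];
        x[i+1] = a[i+1] * x[i]   + b[i+1];
        x[i+2] = a[i+2] * x[i+1] + b[i+2];

        /* Block-carried value: one multiply-add on the critical path per block. */
        carry  = A * carry + B;
        x[i+3] = carry;
    }
    for (; i < n; i++) {          /* leftover iterations, handled serially */
        carry = a[i] * carry + b[i];
        x[i]  = carry;
    }
}

int main(void)
{
    double a[N], b[N], xs[N], xb[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0 + 0.01 * i; b[i] = 0.5 * i; }

    recurrence_serial(a, b, xs, N, 1.0);
    recurrence_blocked(a, b, xb, N, 1.0);

    for (int i = 0; i < N; i++)
        printf("%2d  serial=%10.4f  blocked=%10.4f\n", i, xs[i], xb[i]);
    return 0;
}
```

Both functions compute the same values; on a machine with enough instruction-level parallelism the serial loop is limited to one multiply-add per element on its critical path, whereas the blocked version carries roughly one per block, which is the trade-off the abstract characterizes with its performance metrics.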



Author information

Authors: M. Schlansker and V. Kathail

Editor information

Editors: Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua


Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schlansker, M., Kathail, V. (1994). Acceleration of first and higher order recurrences on processors with instruction level parallelism. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1993. Lecture Notes in Computer Science, vol 768. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57659-2_24


  • DOI: https://doi.org/10.1007/3-540-57659-2_24

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57659-4

  • Online ISBN: 978-3-540-48308-3

  • eBook Packages: Springer Book Archive
