
Acceleration of first and higher order recurrences on processors with instruction level parallelism

  • Conference paper
  • Languages and Compilers for Parallel Computing (LCPC 1993)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 768)

Abstract

This paper describes parallelization techniques for accelerating a broad class of recurrences on processors with instruction level parallelism. We introduce a new technique, called blocked back-substitution, which has lower operation count and higher performance than previous methods. The blocked back-substitution technique requires unrolling and non-symmetric optimization of innermost loop iterations. We present metrics to characterize the performance of software-pipelined loops and compare these metrics for a range of height reduction techniques and processor architectures.
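The abstract describes blocked back-substitution only in prose. As a rough, hypothetical illustration only, the C sketch below applies the idea to a first-order linear recurrence x[i] = a[i]*x[i-1] + b[i]; the block factor of 4, the function names, and the choice of C are assumptions for this sketch, not code from the paper. The point it illustrates is the one the abstract makes: after unrolling, only one composed multiply-add per block stays on the loop-carried critical path, while the remaining iterations of the block are treated non-symmetrically and computed off that path, trading a higher operation count for a shorter dependence chain.

```c
#include <stdio.h>

#define N   16
#define BLK 4   /* illustrative block/unroll factor; indices below are written out for BLK == 4 */

/* Baseline first-order linear recurrence x[i] = a[i]*x[i-1] + b[i]:
   every iteration waits on the previous one, so the critical path is
   one multiply-add per element. */
static void recurrence_serial(const double *a, const double *b,
                              double *x, int n, double x0)
{
    double prev = x0;
    for (int i = 0; i < n; i++) {
        prev = a[i] * prev + b[i];
        x[i] = prev;
    }
}

/* Blocked back-substitution (sketch): unroll BLK iterations and substitute
   the recurrence into itself, so the value carried between blocks is produced
   by one composed multiply-add, x[i+BLK-1] = A*carry + B.  The interior
   elements hang off the incoming carry but are never read by the next block,
   so they sit off the loop-carried critical path.  Operation count rises,
   but the carried dependence shrinks from n steps to roughly n/BLK steps. */
static void recurrence_blocked(const double *a, const double *b,
                               double *x, int n, double x0)
{
    double carry = x0;
    int i;
    for (i = 0; i + BLK <= n; i += BLK) {
        /* Coefficients of the composed map, obtained by symbolically
           back-substituting the four iterations of the block. */
        double A = a[i+3] * a[i+2] * a[i+1] * a[i];
        double B = a[i+3] * (a[i+2] * (a[i+1] * b[i] + b[i+1]) + b[i+2])
                 + b[i+3];

        /* Interior elements: a short side chain the next block never reads. */
        x[i]   = a[i]   * carry  + b[i];
        x[i+1] = a[i+1] * x[i]   + b[i+1];
        x[i+2] = a[i+2] * x[i+1] + b[i+2];

        /* Block-carried value: one multiply-add on the critical path per block. */
        carry  = A * carry + B;
        x[i+3] = carry;
    }
    for (; i < n; i++) {          /* leftover iterations, handled serially */
        carry = a[i] * carry + b[i];
        x[i]  = carry;
    }
}

int main(void)
{
    double a[N], b[N], xs[N], xb[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0 + 0.01 * i; b[i] = 0.5 * i; }

    recurrence_serial(a, b, xs, N, 1.0);
    recurrence_blocked(a, b, xb, N, 1.0);

    for (int i = 0; i < N; i++)
        printf("%2d  serial=%10.4f  blocked=%10.4f\n", i, xs[i], xb[i]);
    return 0;
}
```

Both functions compute the same values; on a machine with enough instruction-level parallelism the serial loop is limited to one multiply-add per element on its critical path, whereas the blocked version carries roughly one per block, which is the trade-off the abstract characterizes with its performance metrics.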



Author information

Authors: M. Schlansker and V. Kathail

Editor information

Editors: Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua


Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schlansker, M., Kathail, V. (1994). Acceleration of first and higher order recurrences on processors with instruction level parallelism. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1993. Lecture Notes in Computer Science, vol 768. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57659-2_24


  • DOI: https://doi.org/10.1007/3-540-57659-2_24

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57659-4

  • Online ISBN: 978-3-540-48308-3

  • eBook Packages: Springer Book Archive
