Abstract
This paper addresses the compile-time optimization of a class of nested-loop computations that arise in some computational physics applications. The computations involve summations over products of array terms in order to compute multi-dimensional surface and volume integrals. Reordering additions and multiplications and applying the distributive law can significantly reduce the number of operations required in evaluating these summations. In a multiprocessor environment, proper distribution of the arrays among processors will reduce the inter-processor communication time. We present a formal description of the operation minimization problem, a proof of its NP-completeness, and a pruning strategy for finding the optimal solution in small cases. We also give an algorithm for determining the optimal distribution of the arrays among processors in a multiprocessor environment.
Supported in part by NSF grant DMR-9520319.
Preview
Unable to display preview. Download preview PDF.
References
C. N. Fischer and R. J. Leblanc Jr. Crafting a Compiler. Menlo Park, CA: Benjamin/ Cummings, 1991.
Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. New York: W. H. Freeman, 1979.
Ken Kennedy and Kathryn S. McKinley. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution. In Languages and Compilers for Parallel Computing, August 1993, 301–320.
Ken Kennedy and Kathryn S. McKinley. Optimizing for Parallelism and Data Locality. In Proceedings of the 1992 ACM International Conference on Supercomputing, July 1992, 323–334.
V. Kumar, A. Grama, A. Gupta, and G. Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms. RedWood City, CA: Benjamin/Cummings, 1994.
C. C. Lu and W. C. Chew. Fast Algorithm for Solving Hybrid Integral Equations. In IEE Proceedings-H, 140(6): 455–460, December 1993.
Edmund K. Miller. Solving Bigger Problems-By Decreasing the Operation Count and Increasing the Computation Bandwidth. In Proceedings of the IEEE, 79(10): 1493–1504, October 1991.
M. Potkonjak, M. B. Srivastava, and A. P. Chandrakasan. Multiple Constant Multiplications: Efficient and Versatile Framework and Algorithms for Exploring Common Subexpression Elimination. IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 15(2): 151–164, February 1996.
S. Winograd. Arithmetic complexity of computations. Philadelphia: Society for Industrial and Applied Mathematics, 1980.
M. Wolfe. High Performance Compilers for Parallel Computing. Addison Wesley, 1996.
Michael E. Wolf and Monica S. Lam. A Data Locality Algorithm. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, June 1991, 30–44.
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lam, CC., Sadayappan, P., Wenger, R. (1997). Optimal reordering and mapping of a class of nested-loops for parallel execution. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017261
Download citation
DOI: https://doi.org/10.1007/BFb0017261
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63091-3
Online ISBN: 978-3-540-69128-0
eBook Packages: Springer Book Archive