Abstract
This paper presents a compiler algorithm that automatically detects the appropriate loop indices of a given nested loop and applies loop interchange and tiling in order to overlap communication with computation. It also describes a method of generating communication for the tiled loop on distributed-memory machines. The algorithm has been implemented in our High Performance Fortran (HPF) compiler, and experimental results have shown its effectiveness on the RISC System/6000 Scalable POWERparallel System.
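The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind tiling for communication overlap, not the authors' method. It assumes a hypothetical two-dimensional loop nest `a[i][j] += a[i][j-1]` whose columns are block-distributed over `num_procs` processors. The function names, the column-block distribution, and the step-counting model are all assumptions made for illustration. Tiling the `i` loop lets each processor hand its boundary column to its neighbour after every tile instead of after the whole loop, so the neighbour can start computing while the sender works on its next tile.

```python
def sequential(a):
    """Reference result: a[i][j] += a[i][j-1] for j >= 1, row by row."""
    out = [row[:] for row in a]
    for i in range(len(out)):
        for j in range(1, len(out[i])):
            out[i][j] += out[i][j - 1]
    return out


def tiled_pipeline(a, num_procs, tile_rows):
    """Simulate the tiled loop on a column-block distribution (hypothetical model).

    Tile t on processor p runs at step p + t, because it needs the boundary
    column that processor p - 1 produced for the same tile one step earlier.
    Returns (result, steps), where steps counts the pipeline steps used.
    """
    n_rows, n_cols = len(a), len(a[0])
    assert n_cols % num_procs == 0, "sketch assumes an even column split"
    width = n_cols // num_procs
    out = [row[:] for row in a]
    tiles = [(lo, min(lo + tile_rows, n_rows))
             for lo in range(0, n_rows, tile_rows)]
    steps = num_procs + len(tiles) - 1
    for step in range(steps):
        for p in range(num_procs):
            t = step - p
            if not 0 <= t < len(tiles):
                continue  # processor p has no tile ready at this step
            lo, hi = tiles[t]
            first = p * width
            for i in range(lo, hi):
                # Column first - 1 stands in for the boundary data "received"
                # from processor p - 1; it was finalized at the previous step.
                for j in range(max(first, 1), first + width):
                    out[i][j] += out[i][j - 1]
    return out, steps


if __name__ == "__main__":
    grid = [[1] * 8 for _ in range(6)]
    tiled, steps_tiled = tiled_pipeline(grid, num_procs=4, tile_rows=2)
    untiled, steps_untiled = tiled_pipeline(grid, num_procs=4, tile_rows=6)
    print(tiled == sequential(grid), steps_tiled, steps_untiled)
```

In this toy model the untiled run (one tile of 6 rows) needs 4 steps of 6 rows each, while the tiled run needs 6 steps of only 2 rows each, so the processors spend less time waiting on one another. A real compiler would also have to weigh the per-message overhead that smaller tiles introduce, which this sketch ignores.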
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ishizaki, K., Komatsu, H., Nakatani, T. (1997). An algorithm for automatic detection of loop indices for communication overlapping. In: Polychronopoulos, C., Joe, K., Araki, K., Amamiya, M. (eds) High Performance Computing. ISHPC 1997. Lecture Notes in Computer Science, vol 1336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024218
DOI: https://doi.org/10.1007/BFb0024218
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63766-0
Online ISBN: 978-3-540-69644-5
eBook Packages: Springer Book Archive