Accurate Data and Context Management in Message-Passing Programs
This paper presents a novel scheme for maintaining accurate information about distributed data in message-passing programs. The ability to dynamically maintain the data-to-processor mapping, as well as the program contexts at which state changes occur, enables a variety of sophisticated optimizations. The algorithms described in this paper are based on the static single assignment (SSA) form of message-passing programs, which can be used to perform many of the classical compiler optimizations during automatic parallelization as well as to analyze user-written message-passing programs. Reaching definition analysis is performed on the SSA structures to determine a suitable communication point. The paper discusses possible optimizations and shows how an appropriate representation of the data structures can substantially reduce the associated overheads. Our scheme uniformly handles arbitrary subscripts in array references and general reducible control flow. Experimental results for a number of benchmarks on an IBM SP-2 show a conspicuous reduction in inter-processor communication as well as a marked improvement in total run-times. On 16 processors, we observed reductions of up to roughly 10–25% in total run-time for our SSA-based schemes compared with non-SSA-based schemes.
Keywords: Context Management, Array Variable, Processor Element, Iteration Vector, Communication Point
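To make the idea of using reaching definitions on SSA names to pick a communication point more concrete, here is a minimal sketch. It is not the algorithm from the paper: it assumes straight-line code only (no φ-functions, no array sections, no control flow), and the names `Stmt`, `to_ssa`, and `communication_points` are hypothetical. Each definition gets a fresh version number, and a use on a different processor may receive its data anywhere between the statement after the reaching definition and the use itself. A version that has already been sent to a processor is not sent again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    kind: str  # "def" or "use"
    var: str   # variable name
    proc: int  # hypothetical id of the processor executing the statement


def to_ssa(stmts):
    """Number each definition; tag each use with the version reaching it."""
    version = {}
    out = []
    for index, s in enumerate(stmts):
        if s.kind == "def":
            version[s.var] = version.get(s.var, 0) + 1
        out.append((index, s, version.get(s.var, 0)))
    return out


def communication_points(stmts):
    """Return (var, version, src, dst, earliest, latest) for each needed message.

    earliest is the index just after the reaching definition and latest is
    the index of the first remote use; a message may go anywhere in between.
    """
    def_site = {}  # (var, version) -> (statement index, defining processor)
    seen = set()   # (var, version, dst) triples already communicated
    result = []
    for index, s, ver in to_ssa(stmts):
        if s.kind == "def":
            def_site[(s.var, ver)] = (index, s.proc)
        elif (s.var, ver) in def_site:
            d_index, d_proc = def_site[(s.var, ver)]
            key = (s.var, ver, s.proc)
            if d_proc != s.proc and key not in seen:
                seen.add(key)
                result.append((s.var, ver, d_proc, s.proc, d_index + 1, index))
    return result


program = [
    Stmt("def", "a", 0),  # a1 defined on processor 0
    Stmt("use", "a", 1),  # processor 1 needs a1: one message
    Stmt("use", "a", 1),  # same version, same processor: no new message
    Stmt("def", "a", 0),  # a2 defined on processor 0
    Stmt("use", "a", 1),  # processor 1 needs a2: a second message
]
print(communication_points(program))
```

In this toy program, the SSA version acts as the context: the second use of `a` on processor 1 sees the same version as the first, so no message is repeated, while the redefinition creates a new version and forces a fresh transfer.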