An Automatic Iteration/Data Distribution Method Based on Access Descriptors for DSMM
Nowadays NUMA architectures are widely accepted. For such multiprocessors, exploiting data locality is clearly a key issue. In this work, we present a method for automatically selecting the iteration/data distributions for a sequential F77 code while minimizing the parallel execution overhead (communications and load unbalance). We formulate an integer programming problem to achieve that minimum parallel overhead. The constraints of the integer programming problem are derived directly from a graph known as the Locality-Communication Graph (LCG), which captures the memory locality, as well as the communication patterns, of a parallel program. In addition, our approach uses the LCG to automatically schedule the communication operations required during program execution once the iteration/data distributions have been selected. Our approach also deals with the aggregation of messages into blocks. The TFFT2 code from the NASA benchmarks, which includes non-affine access functions, non-affine index bounds, and repeated subroutine calls inside loops, is handled correctly by our approach. With the iteration/data distributions derived from our method, this code achieves parallel efficiencies of over 69% for 16 processors on a Cray T3E, an excellent performance for a complex real code.
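As a rough illustration of the kind of selection problem described above, the following toy sketch picks one distribution per program phase so that the sum of load unbalance and communication cost is minimal. It is not the formulation used in the paper: the phase names (`F1` to `F3`), the two candidate distributions, every cost value, and the simplified graph are hypothetical, and the tiny problem is solved by exhaustive enumeration rather than by an integer programming solver.

```python
from itertools import product

# Hypothetical candidate distributions for every phase (loop nest).
CANDIDATES = ("BLOCK", "CYCLIC")

# Hypothetical load-unbalance cost of each phase under each distribution.
unbalance = {
    "F1": {"BLOCK": 0, "CYCLIC": 6},
    "F2": {"BLOCK": 9, "CYCLIC": 1},
    "F3": {"BLOCK": 0, "CYCLIC": 2},
}

# Hypothetical edges of a simplified graph between phases: the cost is paid
# only when the two connected phases use different distributions, because
# the shared data must then be communicated.
comm = {("F1", "F2"): 3, ("F2", "F3"): 3}


def overhead(assignment):
    """Total parallel overhead (load unbalance + communication) of one choice."""
    load = sum(unbalance[phase][dist] for phase, dist in assignment.items())
    msgs = sum(cost for (a, b), cost in comm.items()
               if assignment[a] != assignment[b])
    return load + msgs


def best_assignment():
    """Enumerate every combination of distributions and keep the cheapest."""
    phases = list(unbalance)
    options = (dict(zip(phases, choice))
               for choice in product(CANDIDATES, repeat=len(phases)))
    return min(options, key=overhead)


if __name__ == "__main__":
    choice = best_assignment()
    print(choice, overhead(choice))
```

With these made-up costs, the cheapest choice keeps `F1` block-distributed and switches `F2` and `F3` to a cyclic distribution, paying one communication step between `F1` and `F2`. Exhaustive search grows exponentially with the number of phases, which is one reason a real tool would hand the problem to an integer programming solver instead.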
Keywords: Local Memory, Integer Programming Problem, Parallel Iteration, Communication Operation, Load Unbalance