A new program transformation to minimise communication in distributed memory architectures

  • Michael O'Boyle
  • G. A. Hedayat
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 605)


One of the major overheads in implementing programs on distributed memory architectures is communication, or non-local access. This paper describes a new transformation technique to eliminate redundant non-local accesses. First, a criterion for determining data re-use is outlined; this provides the basis for a new transformation technique based on the Hermite normal form. Once a non-local data item has been accessed, it is stored locally and the computation is re-ordered so that no further communication is required. The transformation is then extended to the case of multiple array accesses, where in general scalar expansion is necessary.
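The paper gives no code, but the Hermite-normal-form machinery behind such a re-use criterion can be sketched. The idea: for an array reference whose subscripts are an integer matrix A applied to the iteration vector, two iterations i and j touch the same element exactly when A(i − j) = 0. Reducing A by unimodular column operations to H = AU exposes this null space: each zero column of H corresponds to a column u of U with Au = 0, i.e. a re-use direction. The `col_hnf` function and the example access matrix below are our own illustrative assumptions, not the authors' implementation:

```python
def col_hnf(A):
    """Column-style Hermite normal form: returns (H, U) with H = A @ U
    and U unimodular. A zero column of H marks a direction u (the
    matching column of U) with A @ u = 0, i.e. an iteration-space
    direction along which the reference re-uses the same array element."""
    m, n = len(A), len(A[0])
    H = [row[:] for row in A]
    U = [[int(i == j) for j in range(n)] for i in range(n)]

    def add_col(dst, src, k):          # column dst += k * column src
        for r in range(m):
            H[r][dst] += k * H[r][src]
        for r in range(n):
            U[r][dst] += k * U[r][src]

    def swap_cols(c1, c2):
        for r in range(m):
            H[r][c1], H[r][c2] = H[r][c2], H[r][c1]
        for r in range(n):
            U[r][c1], U[r][c2] = U[r][c2], U[r][c1]

    pivot = 0
    for row in range(m):
        if pivot >= n:
            break
        while True:                    # Euclidean reduction along this row
            nz = [c for c in range(pivot, n) if H[row][c] != 0]
            if len(nz) <= 1:
                break
            c0 = min(nz, key=lambda c: abs(H[row][c]))
            for c in nz:
                if c != c0:
                    add_col(c, c0, -(H[row][c] // H[row][c0]))
        if not nz:
            continue
        if nz[0] != pivot:
            swap_cols(nz[0], pivot)
        if H[row][pivot] < 0:          # normalise pivot sign (still unimodular)
            for r in range(m):
                H[r][pivot] = -H[r][pivot]
            for r in range(n):
                U[r][pivot] = -U[r][pivot]
        pivot += 1
    return H, U


# Reference a(i+j) in a doubly nested loop: access matrix A = [1 1].
# The zero column of H exposes the re-use direction (-1, 1): iterations
# (i, j) and (i-1, j+1) touch the same element of a.
H, U = col_hnf([[1, 1]])
```

Once such a re-use direction is known, the loop nest can be re-ordered so all iterations along it execute on the node that first fetched the element, which is the effect the transformation in the paper aims for.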





Copyright information

© Springer-Verlag Berlin Heidelberg 1992

Authors and Affiliations

  • Michael O'Boyle (1)
  • G. A. Hedayat (1)

  1. Department of Computer Science, University of Manchester, Manchester, UK