
Generalized overlap regions for communication optimization in data-parallel programs

  • Communication Optimization
  • Conference paper
  • First Online:
Languages and Compilers for Parallel Computing (LCPC 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1239)

Abstract

Data-parallel languages such as High Performance Fortran, Vienna Fortran and Fortran D include directives for alignment and distribution that describe how data and computation are mapped onto the processors of a distributed-memory multiprocessor. A compiler for these languages that generates code for each processor must compute the sequence of local memory addresses accessed by each processor and the sequence of sends and receives that a given processor needs in order to access non-local data. While the address generation problem has received much attention, communication issues have not been dealt with extensively. This paper presents a novel approach for managing communication sets, together with strategies for the local storage of remote references. Algorithms for deriving communication patterns are discussed first. Then, two schemes are presented that extend the notion of a local array by providing storage for non-local elements (called overlap regions) interspersed throughout the storage for the local portion. These two schemes, namely course padding and column padding, significantly enhance locality of reference at the cost of a small overhead for unpacking messages. The performance of these schemes is compared to that of the traditional buffer-based approach, and improvements of up to 30% in total time are demonstrated. Several message optimizations, such as offset communication, message aggregation and coalescing, are also discussed.
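To make the notion of a communication set concrete, the sketch below enumerates, by brute force, which elements each processor must receive for a simple shifted assignment. It assumes a one-dimensional statement `A(i) = B(i + s)` in which `A` and `B` are both distributed `cyclic(k)` over `p` processors and the owner-computes rule applies. The function names (`owner`, `local_index`, `receive_sets`) are hypothetical and chosen only for illustration. This is not the paper's algorithm, which derives these sets at compile time rather than by enumerating every iteration. Grouping the needed elements by (receiver, sender) pair mirrors the idea of message aggregation: one message per processor pair instead of one per element.

```python
from collections import defaultdict


def owner(g, k, p):
    """Processor that owns global index g under a cyclic(k) distribution on p processors."""
    return (g // k) % p


def local_index(g, k, p):
    """Offset of global index g in its owner's contiguous local array (no overlap storage)."""
    course = g // (k * p)  # which round of k*p elements ("course") g falls in
    return course * k + g % k


def receive_sets(n, k, p, s):
    """Brute-force communication sets for A(i) = B(i + s) over arrays of size n.

    Assumes A and B share the same cyclic(k) distribution and that iteration i
    executes on owner(i) (owner-computes rule). Returns recv[q][r], the ordered
    list of global indices of B that processor q must receive from processor r.
    """
    recv = defaultdict(lambda: defaultdict(list))
    # Restrict i so that both A(i) and B(i + s) lie inside 0..n-1.
    for i in range(max(0, -s), min(n, n - s)):
        q = owner(i, k, p)      # processor executing iteration i
        r = owner(i + s, k, p)  # processor holding B(i + s)
        if r != q:
            recv[q][r].append(i + s)  # aggregated into one message from r to q
    return {q: dict(senders) for q, senders in recv.items()}


if __name__ == "__main__":
    sets = receive_sets(n=16, k=2, p=3, s=1)
    print(sets)
    # Offsets the sender (processor 1) would pack for its message to processor 0.
    print([local_index(g, 2, 3) for g in sets[0][1]])
```

With `n=16`, `k=2`, `p=3` and `s=1`, only the last element of each block needs a neighbour's data, so each processor receives a single aggregated message from one other processor. The second line of output shows the sender-side local offsets that would be packed. Schemes such as the paper's overlap regions change where the receiver stores these elements, not which elements are communicated.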

Supported in part by an NSF Young Investigator Award CCR-9457768, and NSF grant CCR-9210422, and by the Louisiana Board of Regents through contract LEQSF (1991–94)-RD-A-09.




Editor information

David Sehr, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Venkatachar, A., Ramanujam, J., Thirumalai, A. (1997). Generalized overlap regions for communication optimization in data-parallel programs. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017266

  • DOI: https://doi.org/10.1007/BFb0017266

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63091-3

  • Online ISBN: 978-3-540-69128-0

  • eBook Packages: Springer Book Archive
