A Linear Algebra Formulation for Optimising Replication in Data Parallel Programs
In this paper, we present an efficient technique for optimising data replication under the data parallel programming model. We propose a precise mathematical representation for data replication which allows handling replication as an explicit, separate stage in the parallel data placement problem. This representation takes the form of an invertible mapping. We argue that this property is key to making data replication amenable to good mathematical optimisation algorithms. We further outline an algorithm for optimising data replication, based on this representation, which performs interprocedural data placement optimisation over a sequence of loop nests. We have implemented the algorithm and show performance figures.
Keywords: Loop Nest, Data Replication, Data Placement, Simple Reduction, Copy Function