
Optimizing array distributions in data-parallel programs

Conference paper
Languages and Compilers for Parallel Computing (LCPC 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 892)

Abstract

Data-parallel programs are sensitive to the distribution of data across processor nodes. We formulate the reduction of inter-node communication as an optimization problem on a colored graph. We present a technique that records the run-time inter-node communication caused by the movement of array data during execution and builds the colored graph from this trace, and we provide a simple algorithm that optimizes the coloring of the graph to describe new data distributions that would result in less inter-node communication. From the distribution information, we generate compiler pragmas to be inserted into the application program.
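The pipeline described above (trace communication, build a weighted graph, recolor to reduce cross-node traffic) can be sketched as a greedy local search. This is a minimal illustration, not the paper's algorithm: it assumes graph nodes are array blocks, colors are processors, and edge weights are observed communication volumes; all names and data are hypothetical, and a real redistribution tool would also have to respect load-balance constraints.

```python
from collections import defaultdict

def communication_cost(edges, color):
    """Total weight of edges whose endpoints sit on different processors."""
    return sum(w for (u, v), w in edges.items() if color[u] != color[v])

def optimize_coloring(nodes, edges, color, max_passes=10):
    """Greedy local search: move each node (array block) to the color
    (processor) that minimizes its cross-color edge weight, until stable."""
    colors = sorted(set(color.values()))
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    for _ in range(max_passes):
        changed = False
        for n in nodes:
            # Cost of placing n on processor c = traffic to neighbors elsewhere.
            best = min(colors,
                       key=lambda c: sum(w for m, w in adj[n] if color[m] != c))
            if best != color[n]:
                color[n] = best
                changed = True
        if not changed:
            break
    return color

# Hypothetical trace: blocks A..D, edge weights = observed message volume.
nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 10, ("C", "D"): 8, ("A", "C"): 1}
color = {"A": 0, "B": 1, "C": 0, "D": 1}    # initial distribution
before = communication_cost(edges, color)   # 18 for this trace
optimize_coloring(nodes, edges, color)
after = communication_cost(edges, color)
```

Note that without balance constraints a greedy recoloring can legally collapse all blocks onto one processor, as it does here; adding a per-color capacity check to the inner `min` is the obvious refinement.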

Using these techniques, we traced the execution of a real data-parallel application (written in CM Fortran) and collected the array access information. We computed new distributions that should provide an overall reduction in program execution time. However, compiler optimizations and poor interfaces between the compiler and runtime systems counteracted any potential benefit from the new data layouts. In this context, we provide a set of recommendations for compiler writers that we think are needed both to write efficient programs and to build the next generation of tools for parallel systems.

The techniques that we have developed form the basis for future work in monitoring array access patterns and generating on-the-fly redistributions of arrays.

This work is supported in part by Wright Laboratory Avionics Directorate, Air Force Material Command, USAF, under grant F33615-94-1-1525 (ARPA order no. B550), NSF Grants CCR-9100968 and CDA-9024618, Department of Energy Grant DE-FG02-93ER25176, and Office of Naval Research Grant N00014-89-J-1222. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Wright Laboratory Avionics Directorate or the U.S. Government.




Editor information

Keshav Pingali Utpal Banerjee David Gelernter Alex Nicolau David Padua


Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kunchithapadam, K., Miller, B.P. (1995). Optimizing array distributions in data-parallel programs. In: Pingali, K., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1994. Lecture Notes in Computer Science, vol 892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0025897


  • DOI: https://doi.org/10.1007/BFb0025897

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58868-9

  • Online ISBN: 978-3-540-49134-7

