Optimizing for a multiprocessor: Balancing synchronization costs against parallelism in straight-line code
This paper reports on the status of a research project to develop compiler techniques to optimize programs for execution on an asynchronous multiprocessor. We adopt a simplified model of a multiprocessor, consisting of several identical processors, all sharing access to a common memory. Synchronization must be done explicitly, using two special operations that take a period of time comparable to the cost of data operations. Our treatment differs from other attempts to generate code for such machines because we treat the necessary synchronization overhead as an integral part of the cost of a parallel code sequence. We are particularly interested in heuristics that can be used to generate good code sequences, and local optimizations that can then be applied to improve them. Our current efforts are concentrated on generating straight-line code for high-level, algebraic languages.
We compare the code generated by two heuristics, and observe how local optimization schemes can gradually improve its quality. We are implementing our techniques in an experimental compiler that will generate code for Cm*, a real multiprocessor that shares several characteristics with our model computer.
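The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's cost model. It assumes unit-cost data operations, two synchronization operations (here called signal and wait, names chosen for illustration) that each cost as much as a data operation, one signal per producer whose result is used on another processor, and one wait per cross-processor dependency. The function name `execution_cost` and the example blocks are hypothetical.

```python
from collections import defaultdict

OP_COST = 1    # assumed cost of one data operation
SYNC_COST = 1  # assumed cost of one signal or one wait operation


def execution_cost(deps, order, assignment):
    """Finish time of a straight-line block under the assumed cost model.

    deps:       dict mapping an operation to the operations it depends on
    order:      all operations in a topological order (also the issue order)
    assignment: dict mapping each operation to a processor id
    """
    # A producer must signal if any consumer runs on a different processor.
    needs_signal = {d for op in order for d in deps.get(op, ())
                    if assignment[d] != assignment[op]}
    proc_free = defaultdict(int)  # time at which each processor is next idle
    done = {}                     # time a result is available locally
    signalled = {}                # time a producer's signal has been posted
    for op in order:
        p = assignment[op]
        t = proc_free[p]
        for d in deps.get(op, ()):
            if assignment[d] == p:
                t = max(t, done[d])
            else:
                # The wait cannot finish before the signal is posted.
                t = max(t, signalled[d]) + SYNC_COST
        t += OP_COST
        done[op] = t
        if op in needs_signal:
            t += SYNC_COST
            signalled[op] = t
        proc_free[p] = t
    return max(proc_free.values(), default=0)


# Small block: t3 = (a + b) * (c + d), with t1 = a + b and t2 = c + d.
small_deps = {"t3": ["t1", "t2"]}
small_order = ["t1", "t2", "t3"]
small_serial = execution_cost(small_deps, small_order,
                              {"t1": 0, "t2": 0, "t3": 0})
small_parallel = execution_cost(small_deps, small_order,
                                {"t1": 0, "t2": 1, "t3": 0})

# Larger block: two independent chains of four operations, then a combine.
big_deps = {"a2": ["a1"], "a3": ["a2"], "a4": ["a3"],
            "b2": ["b1"], "b3": ["b2"], "b4": ["b3"],
            "f": ["a4", "b4"]}
big_order = ["a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4", "f"]
big_serial = execution_cost(big_deps, big_order, {op: 0 for op in big_order})
big_parallel = execution_cost(
    big_deps, big_order,
    {op: (1 if op.startswith("b") else 0) for op in big_order})

print(small_serial, small_parallel)  # 3 4
print(big_serial, big_parallel)      # 9 7
```

Under these assumptions, splitting the small block across two processors is slower than running it serially, because the signal and wait cost more than the single operation saved. In the larger block the parallel chains are long enough to amortize the synchronization.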
Keywords: Basic Block, Dependency Graph, Execution Cost, Initial Allocation, Parallel Code