Abstract
This paper presents an overview of a parallelizing compiler that automatically generates efficient code for large-scale parallel architectures from sequential input programs. This research focuses on loop-level parallelism in dense matrix computations. We illustrate the basic techniques the compiler uses by describing the entire compilation process for a simple example.
Our compiler is organized into three major phases: analyzing array references, allocating the computation and data to the processors to optimize parallelism and locality, and generating code.
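The three-phase organization above can be sketched as a simple pipeline. This is an illustrative sketch only: every function name and data shape below is hypothetical, invented for exposition, and is not the actual interface of the compiler described in the paper.

```python
# Hypothetical sketch of the three-phase compiler pipeline described above.

def analyze_array_references(program):
    """Phase 1: array reference analysis -- for each read access instance,
    determine which write instance produced the value it reads."""
    return {"producers": {}}  # placeholder analysis result


def decompose(program, analysis):
    """Phase 2: assign loop iterations (computation) and array sections
    (data) to processors so that parallelism is exploited and
    communication overhead is minimized."""
    return {"computation_map": {}, "data_map": {}}


def generate_spmd_code(program, decomposition):
    """Phase 3: emit per-processor code that manages the multiple
    address spaces and communicates data across processors."""
    return f"/* SPMD code for {program} */"


def compile_for_parallel_machine(program):
    analysis = analyze_array_references(program)
    decomposition = decompose(program, analysis)
    return generate_spmd_code(program, decomposition)
```

The point of the sketch is only the phase ordering: reference analysis feeds the decomposition decision, which in turn drives code generation.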
An optimizing compiler for scalable parallel machines requires more sophisticated program analysis than traditional data dependence analysis. Our compiler uses a precise data-flow analysis technique to identify the producer of the value read by each instance of a read access. To allocate the computation and data to the processors, the compiler first transforms the program to expose loop-level parallelism in the computation. It then finds a decomposition of the computation and data such that parallelism is exploited and the communication overhead is minimized. The compiler will trade off extra degrees of parallelism to reduce or eliminate communication. Finally, the compiler generates code to manage the multiple address spaces and to communicate data across processors.
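The instance-wise precision of this analysis can be illustrated on a toy case. The sketch below is a brute-force simplification under stated assumptions (a single pair of affine accesses over one-dimensional loops); the actual technique handles general affine loop nests symbolically. The function name `producer_of_read` is hypothetical.

```python
# Toy instance-wise data-flow analysis: for each dynamic instance of a
# read access, find the write instance that produced the value it reads.
# Simplified to one write loop followed by one read loop over array A.

def producer_of_read(read_iter, read_index_fn, write_index_fn, write_range):
    """Return the last write iteration whose array index equals the index
    touched by this read instance, or None if the value is live-in."""
    target = read_index_fn(read_iter)
    candidates = [w for w in write_range if write_index_fn(w) == target]
    return max(candidates) if candidates else None


# Example program being analyzed:
#   for i in range(5):     A[i]   = ...     (write instances i = 0..4)
#   for j in range(1, 5):  ...    = A[j-1]  (read instances j = 1..4)
N = 5
for j in range(1, N):
    p = producer_of_read(j, lambda r: r - 1, lambda w: w, range(N))
    # The read in iteration j consumes the value written in iteration j-1.
    assert p == j - 1
```

Note that classical data dependence analysis would only report that the second loop depends on the first; the exact analysis identifies, per read instance, the specific producing iteration, which is what the decomposition and communication-generation phases need.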
This research was supported in part by DARPA contracts N00039-91-C-0138 and DABT63-91-K-0003, an NSF Young Investigator Award and fellowships from Digital Equipment Corporation's Western Research Laboratory and Intel Corporation.
© 1994 Springer-Verlag Berlin Heidelberg
Cite this paper
Amarasinghe, S.P., Anderson, J.M., Lam, M.S., Lim, A.W. (1994). An overview of a compiler for scalable parallel machines. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1993. Lecture Notes in Computer Science, vol 768. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57659-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57659-4
Online ISBN: 978-3-540-48308-3