Abstract
The proliferation of parallel platforms over the last ten years has been dramatic. Parallel platforms come in different flavors, including desktop multiprocessor PCs and workstations with a few processors, networks of PCs and workstations, and supercomputers with hundreds of processors or more. This diverse collection of parallel platforms provides not only computing cycles but also other resources important for scientific computing, such as large amounts of main memory and fast I/O capabilities. As a result of this proliferation, the “typical profile” of a potential user of such systems has changed considerably. The specialist user with a good understanding of the complexities of the target parallel system has been replaced by a user who is largely unfamiliar with the underlying system characteristics. While the specialist’s main concern is peak performance, the non-specialist user may be willing to trade off performance for ease of programming.
Recent languages such as High Performance Fortran (HPF) and SGI Parallel Fortran are a significant step towards making parallel platforms truly usable for a broadening user community. However, non-trivial user input is still required to produce efficient parallel programs. The main challenge for the user is to understand the performance implications of a specified data layout, which requires knowledge of issues such as the code generation and analysis strategies of the HPF compiler and its node compiler, as well as the performance characteristics of the target architecture. This paper discusses our preliminary experience with the design and implementation of Fortran RED, a tool that supports Fortran as a deterministic, sequential programming model on different parallel target systems. The tool is not part of a compiler. Fortran RED uses HPF as its intermediate program representation, since the language is portable across many parallel platforms and commercial and research HPF compilers are widely available. Fortran RED supports different target HPF compilers and target architectures, and allows multi-dimensional distributions as well as dynamic remapping. This paper focuses on the performance prediction component of the tool and reports preliminary results for a single scientific kernel on two target systems, namely PGI’s and IBM’s HPF compilers with IBM’s SP-2 as the target architecture.
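To illustrate the kind of data layout decisions discussed above, the following is a minimal HPF fragment (illustrative only, not taken from the paper) showing a multi-dimensional distribution and a dynamic remapping; the array name, sizes, and processor grid are hypothetical:

```fortran
! Hypothetical HPF fragment: multi-dimensional distribution
! and dynamic remapping of a 2-D array.
      REAL A(1024, 1024)
!HPF$ PROCESSORS P(4, 4)
!HPF$ DYNAMIC A
!HPF$ DISTRIBUTE A(BLOCK, BLOCK) ONTO P
! ... phase 1: computation suited to a blocked layout ...
!HPF$ REDISTRIBUTE A(CYCLIC, BLOCK) ONTO P
! ... phase 2: computation suited to a cyclic layout ...
```

Which distribution performs best depends on how the directives interact with the HPF compiler's code generation and with the target machine's communication costs; predicting this interaction is precisely what the tool's performance prediction component is intended to automate.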
This research was supported by DARPA contract DABT 63-93-C-0064 and experiments were conducted using resources at the Cornell Theory Center.
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kremer, U. (1999). Fortran RED — A Retargetable Environment for Automatic Data Layout. In: Chatterjee, S., et al. Languages and Compilers for Parallel Computing. LCPC 1998. Lecture Notes in Computer Science, vol 1656. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48319-5_10
Print ISBN: 978-3-540-66426-0
Online ISBN: 978-3-540-48319-9