Abstract
The proliferation of parallel platforms over the last ten years has been dramatic. Parallel platforms come in different flavors, including desktop multiprocessor PCs and workstations with a few processors, networks of PCs and workstations, and supercomputers with hundreds of processors or more. This diverse collection of parallel platforms provides not only computing cycles but also other resources important for scientific computing, such as large amounts of main memory and fast I/O capabilities. As a result of this proliferation, the “typical profile” of a potential user of such systems has changed considerably. The specialist user with a good understanding of the complexities of the target parallel system has been replaced by a user who is largely unfamiliar with the underlying system characteristics. While the specialist’s main concern is peak performance, the non-specialist user may be willing to trade off performance for ease of programming.
Recent languages such as High Performance Fortran (HPF) and SGI Parallel Fortran are a significant step towards making parallel platforms truly usable for a broadening user community. However, non-trivial user input is still required to produce efficient parallel programs. The main challenge for the user is to understand the performance implications of a specified data layout, which requires knowledge of issues such as the code generation and analysis strategies of the HPF compiler and its node compiler, as well as the performance characteristics of the target architecture. This paper discusses our preliminary experience with the design and implementation of Fortran RED, a tool that supports Fortran as a deterministic, sequential programming model on different parallel target systems. The tool is not part of a compiler. Fortran RED uses HPF as its intermediate program representation, since the language is portable across many parallel platforms and commercial and research HPF compilers are widely available. Fortran RED supports different target HPF compilers and target architectures, and allows multi-dimensional distributions as well as dynamic remapping. This paper focuses on the performance prediction component of the tool and reports preliminary results for a single scientific kernel on two target systems, namely PGI’s and IBM’s HPF compilers with IBM’s SP-2 as the target architecture.
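To illustrate the kind of data layout decisions discussed above, the following is a minimal HPF fragment (illustrative only, not taken from the paper) showing a multi-dimensional distribution and a dynamic remapping; the array name, sizes, and processor grid are hypothetical:

```fortran
! Hypothetical HPF fragment: multi-dimensional distribution
! and dynamic remapping of a 2-D array.
      REAL A(1024, 1024)
!HPF$ PROCESSORS P(4, 4)
!HPF$ DYNAMIC A
!HPF$ DISTRIBUTE A(BLOCK, BLOCK) ONTO P
! ... phase 1: computation suited to a blocked layout ...
!HPF$ REDISTRIBUTE A(CYCLIC, BLOCK) ONTO P
! ... phase 2: computation suited to a cyclic layout ...
```

Which distribution performs best depends on how the directives interact with the HPF compiler's code generation and with the target machine's communication costs; predicting this interaction is precisely what the tool's performance prediction component is intended to automate.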
This research was supported by DARPA contract DABT 63-93-C-0064 and experiments were conducted using resources at the Cornell Theory Center.
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kremer, U. (1999). Fortran RED — A Retargetable Environment for Automatic Data Layout. In: Chatterjee, S., et al. Languages and Compilers for Parallel Computing. LCPC 1998. Lecture Notes in Computer Science, vol 1656. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48319-5_10
Print ISBN: 978-3-540-66426-0
Online ISBN: 978-3-540-48319-9