
Fortran RED — A Retargetable Environment for Automatic Data Layout

Conference paper: Languages and Compilers for Parallel Computing (LCPC 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1656)

Abstract

The proliferation of parallel platforms over the last ten years has been dramatic. Parallel platforms come in different flavors, including desktop multiprocessor PCs and workstations with a few processors, networks of PCs and workstations, and supercomputers with hundreds of processors or more. This diverse collection of parallel platforms provides not only computing cycles but also other resources important for scientific computing, such as large amounts of main memory and fast I/O capabilities. As a result of this proliferation, the “typical profile” of a potential user of such systems has changed considerably. The specialist user who has a good understanding of the complexities of the target parallel system has been replaced by a user who is largely unfamiliar with the underlying system characteristics. While the specialist’s main concern is peak performance, the non-specialist user may be willing to trade off performance for ease of programming.

Recent languages such as High Performance Fortran (HPF) and SGI Parallel Fortran are a significant step towards making parallel platforms truly usable for a broadening user community. However, non-trivial user input is still required to produce efficient parallel programs. The main challenge for a user is to understand the performance implications of a specified data layout, which requires knowledge of issues such as the code generation and analysis strategies of the HPF compiler and its node compiler, and the performance characteristics of the target architecture. This paper discusses our preliminary experiences with the design and implementation of Fortran RED, a tool that supports Fortran as a deterministic, sequential programming model on different parallel target systems. The tool is not part of a compiler. Fortran RED uses HPF as its intermediate program representation, since the language is portable across many parallel platforms and commercial and research HPF compilers are widely available. Fortran RED is able to support different target HPF compilers and target architectures, and allows multi-dimensional distributions as well as dynamic remapping. This paper focuses on the performance prediction component of the tool and reports preliminary results for a single scientific kernel on two target systems, namely PGI’s and IBM’s HPF compilers with IBM’s SP-2 as the target architecture.
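
To make the data-layout question concrete, the fragment below is a minimal sketch of the kind of HPF directives whose performance implications a user must anticipate; the array names, sizes, and processor arrangement are hypothetical illustrations, not taken from the paper. It exercises the features mentioned above: a multi-dimensional distribution, alignment, and dynamic remapping.

          REAL A(1024, 1024), B(1024, 1024)
    !HPF$ PROCESSORS P(4, 4)                  ! hypothetical 4x4 processor grid
    !HPF$ DISTRIBUTE A(BLOCK, BLOCK) ONTO P   ! two-dimensional distribution
    !HPF$ ALIGN B(I, J) WITH A(I, J)          ! co-locate B's elements with A's
    !HPF$ DYNAMIC A                           ! A may be remapped at run time

          ! ... phase 1: computation that favors (BLOCK, BLOCK) ...

    !HPF$ REDISTRIBUTE A(CYCLIC, *)           ! dynamic remapping for phase 2

Whether (BLOCK, BLOCK) or (CYCLIC, *) is faster for a given phase, and whether the remapping pays for itself, depends on exactly the factors listed above: the HPF compiler’s code generation, the node compiler, and the target architecture. Estimating such costs is the job of the performance prediction component discussed in the paper.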

This research was supported by DARPA contract DABT 63-93-C-0064 and experiments were conducted using resources at the Cornell Theory Center.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kremer, U. (1999). Fortran RED — A Retargetable Environment for Automatic Data Layout. In: Chatterjee, S., et al. Languages and Compilers for Parallel Computing. LCPC 1998. Lecture Notes in Computer Science, vol 1656. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48319-5_10

  • DOI: https://doi.org/10.1007/3-540-48319-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66426-0

  • Online ISBN: 978-3-540-48319-9
