Implementation Issues of Loop-Level Speculative Run-Time Parallelization

  • Devang Patel
  • Lawrence Rauchwerger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1575)


Current parallelizing compilers fail to identify a significant fraction of parallelizable loops because these loops have access patterns that are complex or cannot be fully determined at compile time. We advocate a novel framework for identifying parallel loops: it speculatively executes a loop as a doall and applies a fully parallel data dependence test to check for any unsatisfied data dependences; if the test fails, the loop is re-executed serially. We present the principles of the design and implementation of a compiler that employs both run-time and static techniques to parallelize dynamic applications. Run-time optimizations always represent a tradeoff between a speculated potential benefit and an overhead that is certain to be paid. We introduce techniques that take advantage of classic compiler methods to reduce the cost of run-time optimization, thus tilting the outcome of speculation in favor of significant performance gains. Experimental results from the PERFECT, SPEC, and NCSA benchmark suites show that these techniques yield speedups not obtainable by any other known method.
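The speculate, test, and fall back cycle described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `speculative_doall`, the loop shape `a[write_idx[i]] = f(a[read_idx[i]])`, and the per-element reader and writer sets are assumptions made for illustration, and the test shown is a simplified dependence check without privatization or reduction handling. The speculative phase is simulated by running iterations in reverse order, standing in for an arbitrary parallel schedule.

```python
def speculative_doall(a, write_idx, read_idx, f):
    """Run the loop  a[write_idx[i]] = f(a[read_idx[i]])  speculatively.

    Returns True if the speculative run was valid, False if the loop
    had to be re-executed serially. The list `a` is updated in place.
    All names here are hypothetical, chosen only for this sketch.
    """
    n = len(write_idx)
    checkpoint = list(a)                  # saved state for rollback
    writers = [set() for _ in a]          # iterations that wrote each element
    readers = [set() for _ in a]          # iterations that read each element

    # Speculative phase: iterations run in an order other than the
    # serial one (reverse order stands in for a parallel schedule),
    # while every access is recorded in the shadow structures.
    for i in reversed(range(n)):
        r, w = read_idx[i], write_idx[i]
        readers[r].add(i)
        writers[w].add(i)
        a[w] = f(a[r])

    # Test phase: the run is valid if every written element has exactly
    # one writer and is read by no iteration other than that writer.
    passed = all(
        len(ws) <= 1 and (not ws or rs <= ws)
        for ws, rs in zip(writers, readers)
    )

    if not passed:
        # Speculation failed: restore the checkpoint and run serially.
        a[:] = checkpoint
        for i in range(n):
            a[write_idx[i]] = f(a[read_idx[i]])
    return passed


# Independent iterations: speculation succeeds.
x = [1, 2, 3, 4]
ok_x = speculative_doall(x, [0, 1, 2, 3], [0, 1, 2, 3], lambda v: v * 2)

# Each iteration reads what the previous one wrote: speculation fails
# and the serial re-execution produces the final values.
y = [1, 0, 0, 0]
ok_y = speculative_doall(y, [1, 2, 3], [0, 1, 2], lambda v: v + 1)
```

In this sketch the checkpoint copy and the shadow sets are the overhead that is always paid, while the benefit appears only when the test passes, which is the tradeoff the abstract refers to.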


Keywords: Data Dependence, Access Pattern, Parallel Loop, Speculative Execution, Loop Parallelization



Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Devang Patel (1)
  • Lawrence Rauchwerger (1)
  1. Dept. of Computer Science, Texas A&M University, College Station
