Automatic OpenMP Loop Scheduling: A Combined Compiler and Runtime Approach

  • Peter Thoman
  • Herbert Jordan
  • Simone Pellegrini
  • Thomas Fahringer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7312)


The scheduling of parallel loops in OpenMP has been a research topic for over a decade. While many methods have been proposed, most focus on adapting the loop schedule purely at runtime, without regard for the overall system state. We present a fully automatic loop scheduling policy that adapts to both the characteristics of the input program and the current runtime behaviour of the system, including external load. Using state-of-the-art polyhedral compiler analysis, we generate effort estimation functions that the runtime system then uses to derive the optimal loop schedule for a given loop, work group size, iteration range and system state. We demonstrate performance improvements of up to 82% compared to default scheduling in an unloaded scenario, and up to 471% in a scenario with external load. We further show that even in the worst case, the results achieved by our automated system stay within 3% of the performance of a manually tuned strategy.
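To make the idea of an effort estimation function concrete, the following is a minimal sketch, not the paper's implementation. It assumes a triangular loop nest in which outer iteration i runs n - i inner iterations, and it uses a hand-written estimate in place of one derived by polyhedral analysis. The names `balanced_chunks` and `estimate` are hypothetical, and the sketch ignores system state and external load.

```python
from typing import Callable, List, Tuple


def balanced_chunks(
    begin: int,
    end: int,
    workers: int,
    estimate: Callable[[int], int],
) -> List[Tuple[int, int]]:
    """Split [begin, end) into at most `workers` contiguous, non-empty
    chunks whose estimated effort is roughly equal.

    `estimate(i)` is a hypothetical effort estimate for iteration i.
    """
    total = sum(estimate(i) for i in range(begin, end))
    chunks: List[Tuple[int, int]] = []
    start = begin
    acc = 0  # cumulative estimated effort up to and including iteration i
    for i in range(begin, end):
        acc += estimate(i)
        k = len(chunks) + 1  # index of the chunk currently being filled
        # Close chunk k once the cumulative effort reaches k/workers of
        # the total; the last chunk is closed after the loop instead.
        if k < workers and acc * workers >= total * k:
            chunks.append((start, i + 1))
            start = i + 1
    if start < end:
        chunks.append((start, end))
    return chunks


n = 8
# Assumed triangular nest: iteration i costs n - i units of work.
print(balanced_chunks(0, n, 2, lambda i: n - i))  # [(0, 3), (3, 8)]
```

With this assumed estimate, an equal-count split into (0, 4) and (4, 8) would assign 26 and 10 units of estimated work, whereas the effort-based split assigns 21 and 15.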


Keywords: Schedule Policy, Runtime System, Chunk Size, Parallel Loop, Polyhedral Model





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Peter Thoman (1)
  • Herbert Jordan (1)
  • Simone Pellegrini (1)
  • Thomas Fahringer (1)
  1. Distributed and Parallel Systems Group, University of Innsbruck, Innsbruck, Austria
