International Journal of Parallel Programming, Volume 47, Issue 5–6, pp 874–906

PolyJIT: Polyhedral Optimization Just in Time

  • Andreas Simbürger
  • Sven Apel
  • Armin Größlinger
  • Christian Lengauer


While polyhedral optimization has appeared in mainstream compilers during the past decade, its profitability in scenarios outside its classic domain of linear-algebra programs has remained in question. Recent implementations, such as the LLVM plugin Polly, produce promising speedups, but the restriction to affine loop programs with control flow known at compile time continues to be a limiting factor. PolyJIT combines polyhedral optimization with multi-versioning at run time, when knowledge that enables polyhedral optimization, but is unavailable at compile time, can be exploited. By means of a fully fledged implementation of a lightweight just-in-time compiler and a series of experiments on a selection of real-world and benchmark programs, we demonstrate that taking run-time knowledge into account helps to tackle compile-time violations of affinity and, consequently, offers new opportunities for optimization at run time.
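To make the central idea concrete: a subscript such as `A[i*n + j]`, where the column count `n` is only a parameter, is non-affine at compile time because it multiplies the iterator `i` by the parameter `n`. Once `n` is known at run time (say, `n = 100`), substituting its value yields the affine subscript `100*i + j`. The Python sketch below illustrates this substitution together with a per-value variant cache in the spirit of multi-versioning. It is not PolyJIT's implementation or API: the names `Poly`, `is_affine`, `specialize`, and `MultiVersionCache` are hypothetical, the affinity test looks only at one subscript (not at loop bounds or dependences), and the "compilation" step is a placeholder.

```python
from typing import Dict, Tuple

# Hypothetical representation of a polynomial subscript: maps a monomial
# (sorted tuple of variable names; () is the constant term) to its coefficient.
Poly = Dict[Tuple[str, ...], int]


def is_affine(expr: Poly) -> bool:
    """Affine in the polyhedral sense: every monomial has degree <= 1,
    so products such as i*n (iterator times parameter) are rejected."""
    return all(len(mono) <= 1 for mono, coeff in expr.items() if coeff != 0)


def specialize(expr: Poly, values: Dict[str, int]) -> Poly:
    """Substitute known run-time parameter values and merge like monomials."""
    result: Poly = {}
    for mono, coeff in expr.items():
        rest = []
        for var in mono:
            if var in values:
                coeff *= values[var]  # fold the known value into the coefficient
            else:
                rest.append(var)      # variable stays symbolic
        key = tuple(rest)             # still sorted, since mono was sorted
        result[key] = result.get(key, 0) + coeff
    return {m: c for m, c in result.items() if c != 0}


class MultiVersionCache:
    """Hypothetical run-time dispatcher: one specialized variant per
    distinct tuple of parameter values, created on first use."""

    def __init__(self, subscript: Poly, params: Tuple[str, ...]):
        self.subscript = subscript
        self.params = params
        self.variants: Dict[Tuple[int, ...], Poly] = {}
        self.compilations = 0

    def lookup(self, values: Dict[str, int]) -> Tuple[Poly, bool]:
        key = tuple(values[p] for p in self.params)
        if key not in self.variants:
            # Cache miss: specialize. A real JIT would now run the polyhedral
            # optimizer and generate machine code for this variant.
            self.variants[key] = specialize(self.subscript, values)
            self.compilations += 1
        variant = self.variants[key]
        return variant, is_affine(variant)


# Usage: subscript of A[i*n + j] for a row-major array with n columns.
sub: Poly = {("i", "n"): 1, ("j",): 1}
print(is_affine(sub))                # False: i*n is a product of variables
cache = MultiVersionCache(sub, ("n",))
variant, ok = cache.lookup({"n": 100})
print(variant, ok)                   # {('i',): 100, ('j',): 1} True
cache.lookup({"n": 100})             # same value: served from the cache
print(cache.compilations)            # 1
```

Keying the cache on the exact parameter values keeps each variant valid only for the values it was specialized for; a new value of `n` triggers a new specialization rather than reusing an incorrect one.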


Keywords: JIT compilation · Loop parallelization · Polyhedron model



All four authors received financial support from the Deutsche Forschungsgemeinschaft (DFG). The respective projects are PolyJIT (LE 912/14), SafeSPL (AP 206/4), and SafeSPL++ (AP 206/6).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Passau, Passau, Germany
