
Fast Heuristic-Based GPU Compiler Sequence Specialization

  • Ricardo Nobre (email author)
  • Luís Reis
  • João M. P. Cardoso
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11339)

Abstract

Iterative compilation focused on specialized phase orders (i.e., custom selections of compiler passes and orderings for each program or function) can significantly improve the performance of compiled code. However, phase ordering specialization typically has to deal with a large solution space. A previous approach, evaluated on an x86 CPU, mitigates this issue by first using a training phase on reference codes to produce a small set of high-quality reusable phase orders. It then uses these phase orders to compile new codes, without any code analysis. In this paper, we evaluate the viability of using this approach to optimize the GPU execution performance of OpenCL kernels. In addition, we propose and evaluate a heuristic that further reduces the number of evaluated phase orders by comparing the speedups of the resulting binaries with those obtained in the training phase for each phase order. This information is used to predict which untested phase order is most likely to produce good results (e.g., highest speedup). We performed our measurements using the PolyBench/GPU OpenCL benchmark suite on an NVIDIA Pascal GPU. Without the heuristic, we achieve a geomean execution speedup of 1.64×, using cross-validation, with 5 non-standard phase orders. With the heuristic, we achieve the same speedup with only 3 non-standard phase orders. This is close to the geomean speedup achieved in our iterative compilation experiments exploring thousands of phase orders. Given the significant reduction in exploration time and other advantages of this approach, we believe that it is suitable for a wide range of compiler users concerned with performance.
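The abstract does not spell out the heuristic, so the following Python sketch is only one plausible reading of it, not the authors' implementation. It assumes a table of training-phase speedups per kernel and phase order, weights each training kernel by how closely its speedups match those measured so far on the new kernel, and predicts each untested phase order with a weighted geometric mean. The function name, the data layout, the log-space distance, and the example numbers are all hypothetical.

```python
import math


def next_phase_order(training, observed):
    """Suggest the untested phase order with the highest predicted speedup.

    training: {kernel: {order_id: speedup}} from the training phase
              (every kernel is assumed to have a speedup for every order).
    observed: {order_id: speedup} measured so far on the new kernel.
    Returns (best_order_or_None, {order_id: predicted_speedup}).
    """
    # Weight each training kernel by similarity to the new kernel,
    # using squared distance between log-speedups on the tested orders.
    weights = {}
    for kernel, speedups in training.items():
        dist = sum((math.log(speedups[o]) - math.log(s)) ** 2
                   for o, s in observed.items())
        weights[kernel] = 1.0 / (1e-9 + dist)
    total = sum(weights.values())

    # Predict each untested order as a weighted geometric mean of the
    # training speedups for that order.
    all_orders = {o for speedups in training.values() for o in speedups}
    predicted = {}
    for order in all_orders - set(observed):
        log_mean = sum(w * math.log(training[k][order])
                       for k, w in weights.items()) / total
        predicted[order] = math.exp(log_mean)

    if not predicted:
        return None, predicted
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(predicted), key=predicted.get), predicted


# Hypothetical training data: two reference kernels, three phase orders.
training = {
    "kernel_a": {"p1": 2.0, "p2": 1.0, "p3": 0.5},
    "kernel_b": {"p1": 1.0, "p2": 1.5, "p3": 3.0},
}
# The new kernel has only been compiled and timed with phase order p1.
observed = {"p1": 1.0}
best, predicted = next_phase_order(training, observed)
```

In this toy example the new kernel behaves like `kernel_b` on `p1`, so the sketch suggests trying `p3` next, the order that worked best for `kernel_b` during training. Repeating the call after each new measurement would give an evaluation order that can stop early once a satisfactory speedup is reached.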

Keywords

GPU, Phase ordering, Optimization

Notes

Acknowledgements

This work was partially supported by the TEC4Growth project, “NORTE-01-0145-FEDER-000020”, financed by the North Portugal Regional Operational Programme (NORTE 2020) under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF). Reis acknowledges the support by FCT through PD/BD/105804/2014.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Ricardo Nobre (1, email author)
  • Luís Reis (1)
  • João M. P. Cardoso (1)
  1. Faculty of Engineering, University of Porto, INESC TEC, Porto, Portugal
