
DQN-based OpenCL workload partition for performance optimization

  • Sanghyun Park
  • Taeweon Suh

Abstract

This paper proposes a deep Q network (DQN)-based method for the workload partition problem in OpenCL. The DQN, a reinforcement learning algorithm, optimizes the workload partition across processing units through self-training on performance data accumulated from the computing environment. Our experiments show that the DQN-based partition improves JPEG decoding performance by up to 62.2% and 6.9% over the LuxMark-based and target-based partitions, respectively. The DQN captures low-level contention in slave devices, such as cache and memory contention, as well as the communication bottleneck between devices, and reflects them in the workload partition ratio.
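The abstract describes the approach only at a high level. As a rough illustration, the sketch below shows how a DQN agent could learn a CPU/GPU split from measured OpenCL kernel times. It is a minimal sketch, not the paper's implementation: the environment hook run_partition(ratio), the three-feature state, the discretized set of GPU shares, and the runtime-based reward are all assumptions introduced here for illustration.

```python
# Minimal DQN sketch for choosing an OpenCL CPU/GPU workload split.
# Assumes a user-supplied run_partition(ratio) that runs the kernel with
# `ratio` of the work on the GPU and returns the measured time in seconds.
# All names and features here are illustrative, not the paper's.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

ACTIONS = [i / 10 for i in range(11)]   # candidate GPU shares: 0.0, 0.1, ..., 1.0
STATE_DIM = 3                           # e.g. workload size, last ratio, last runtime

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, len(ACTIONS)),
        )

    def forward(self, x):
        return self.net(x)

def train(run_partition, episodes=200, gamma=0.9, eps=0.2, batch=32):
    q, target = QNet(), QNet()
    target.load_state_dict(q.state_dict())
    opt = optim.Adam(q.parameters(), lr=1e-3)
    buf = deque(maxlen=1000)
    state = torch.zeros(STATE_DIM)      # placeholder initial state
    for ep in range(episodes):
        # epsilon-greedy choice of partition ratio
        if random.random() < eps:
            a = random.randrange(len(ACTIONS))
        else:
            with torch.no_grad():
                a = int(q(state).argmax())
        runtime = run_partition(ACTIONS[a])   # measured kernel time (s)
        reward = -runtime                     # faster runs earn higher reward
        next_state = torch.tensor([1.0, ACTIONS[a], runtime])
        buf.append((state, a, reward, next_state))
        state = next_state
        if len(buf) >= batch:
            s, acts, r, ns = zip(*random.sample(list(buf), batch))
            s, ns = torch.stack(s), torch.stack(ns)
            acts = torch.tensor(acts)
            r = torch.tensor(r, dtype=torch.float32)
            pred = q(s).gather(1, acts.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                tgt = r + gamma * target(ns).max(1).values
            loss = nn.functional.mse_loss(pred, tgt)
            opt.zero_grad(); loss.backward(); opt.step()
        if ep % 20 == 0:                      # periodically sync target network
            target.load_state_dict(q.state_dict())
    return q
```

In this setup the reward is simply the negative of the measured execution time, so partition ratios that avoid contention-induced slowdowns accumulate higher Q-values as training proceeds.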

Keywords

OpenCL · DQN · Workload partition


Acknowledgements

This work was partially supported by the National Research Foundation of Korea under Grant NRF-2017R1D1A1B03028926.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Hanwha Systems Co., Ltd., Seoul, Korea
  2. Computer Science and Engineering, Korea University, Seoul, Korea
