Abstract
Heterogeneous multi- and many-core systems are increasingly prevalent in the desktop and mobile domains. On these systems it is common for programs to compete with co-running programs for resources. While multi-task scheduling for CPUs is a well-studied area, how to partition and map computing tasks onto a heterogeneous system in the presence of GPU contention (i.e. when multiple programs compete for the GPU) remains an outstanding problem.
In this paper we consider the problem of partitioning OpenCL kernels on a CPU-GPU based system in the presence of contention on the GPU. We propose a machine learning-based approach that predicts the optimal partitioning of OpenCL kernels, explicitly taking GPU contention into account. Our predictive model achieves a speed-up of 1.92 over a scheme that always uses the GPU. When compared to two state-of-the-art dynamic approaches our model achieves speed-ups of 1.54 and 2.56 respectively.
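To make the notion of partitioning concrete, the following is a minimal sketch of how a predicted split could be applied to a one-dimensional kernel index space. It is not the authors' implementation: the function name `split_work`, the `gpu_fraction` parameter, and the default work-group size of 64 are all assumptions introduced for illustration, and the predictive model that would supply `gpu_fraction` is not shown.

```python
def split_work(global_size, gpu_fraction, work_group_size=64):
    """Split a 1-D index range [0, global_size) between GPU and CPU.

    Hypothetical helper: `gpu_fraction` stands in for the output of a
    predictive model. The split point is rounded to a whole number of
    work-groups so that each device receives complete work-groups.
    Returns two half-open (start, end) ranges: (gpu_range, cpu_range).
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must lie in [0, 1]")
    if global_size % work_group_size != 0:
        raise ValueError("global_size must be a multiple of work_group_size")

    total_groups = global_size // work_group_size
    gpu_groups = round(total_groups * gpu_fraction)
    split_point = gpu_groups * work_group_size

    gpu_range = (0, split_point)            # work-items assigned to the GPU
    cpu_range = (split_point, global_size)  # remaining work-items for the CPU
    return gpu_range, cpu_range


if __name__ == "__main__":
    # Example: a heavily contended GPU might be given only a quarter of the work.
    gpu_range, cpu_range = split_work(1024, 0.25)
    print("GPU:", gpu_range, "CPU:", cpu_range)
```

In an OpenCL host program, each range would typically be enqueued on its device's command queue using a global work offset and a global work size; a fraction of 0.0 or 1.0 corresponds to CPU-only or GPU-only execution.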
Notes
1. NVIDIA GPUs allow concurrent executions of kernels from the same application but not from different applications.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Grewe, D., Wang, Z., O’Boyle, M.F.P. (2014). OpenCL Task Partitioning in the Presence of GPU Contention. In: Cașcaval, C., Montesinos, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2013. Lecture Notes in Computer Science, vol 8664. Springer, Cham. https://doi.org/10.1007/978-3-319-09967-5_5
DOI: https://doi.org/10.1007/978-3-319-09967-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09966-8
Online ISBN: 978-3-319-09967-5
eBook Packages: Computer Science, Computer Science (R0)