OpenCL Task Partitioning in the Presence of GPU Contention

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8664)

Abstract

Heterogeneous multi- and many-core systems are increasingly prevalent in the desktop and mobile domains. On these systems it is common for programs to compete with co-running programs for resources. While multi-task scheduling for CPUs is a well-studied area, how to partition and map computing tasks onto a heterogeneous system in the presence of GPU contention (i.e. multiple programs competing for the GPU) remains an outstanding problem.

In this paper we consider the problem of partitioning OpenCL kernels on a CPU-GPU based system in the presence of contention on the GPU. We propose a machine learning-based approach that predicts the optimal partitioning of OpenCL kernels, explicitly taking GPU contention into account. Our predictive model achieves a speed-up of 1.92 over a scheme that always uses the GPU. When compared to two state-of-the-art dynamic approaches, our model achieves speed-ups of 1.54 and 2.56, respectively.
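To make the notion of a "partitioning" concrete, the following is a minimal sketch of how a predicted CPU/GPU split could be applied to a one-dimensional kernel index space. It is not the authors' implementation: the function name `split_ndrange`, the `gpu_fraction` parameter, the default work-group size, and the restriction to one dimension are all assumptions made for illustration. The abstract does not say how the predicted partitioning is encoded or applied.

```python
def split_ndrange(global_size, gpu_fraction, work_group_size=64):
    """Split a 1-D index space into a GPU chunk and a CPU chunk.

    Hypothetical helper: `gpu_fraction` stands in for whatever ratio a
    predictive model would output. Chunks are aligned to whole work-groups
    so that each device receives complete groups.

    Returns ((gpu_offset, gpu_count), (cpu_offset, cpu_count)).
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must lie in [0, 1]")
    if global_size % work_group_size != 0:
        raise ValueError("global_size must be a multiple of work_group_size")

    total_groups = global_size // work_group_size
    # Round to the nearest whole number of work-groups for the GPU.
    gpu_groups = round(total_groups * gpu_fraction)
    gpu_count = gpu_groups * work_group_size

    # The GPU takes the first chunk; the CPU takes the remainder.
    return (0, gpu_count), (gpu_count, global_size - gpu_count)


if __name__ == "__main__":
    # A contended GPU might be assigned only a quarter of the work-items.
    gpu_part, cpu_part = split_ndrange(1024, 0.25)
    print("GPU (offset, count):", gpu_part)
    print("CPU (offset, count):", cpu_part)
```

In an OpenCL host program, each (offset, count) pair would typically be passed as the global work offset and global work size when enqueuing the kernel on the corresponding device's command queue.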

Notes

  1. NVIDIA GPUs allow concurrent execution of kernels from the same application but not from different applications.

Author information

Corresponding author

Correspondence to Dominik Grewe.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Grewe, D., Wang, Z., O’Boyle, M.F.P. (2014). OpenCL Task Partitioning in the Presence of GPU Contention. In: Cașcaval, C., Montesinos, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2013. Lecture Notes in Computer Science, vol 8664. Springer, Cham. https://doi.org/10.1007/978-3-319-09967-5_5

  • DOI: https://doi.org/10.1007/978-3-319-09967-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09966-8

  • Online ISBN: 978-3-319-09967-5

  • eBook Packages: Computer Science, Computer Science (R0)
