
Adaptive Simultaneous Multi-tenancy for GPUs

Conference paper in: Job Scheduling Strategies for Parallel Processing (JSSPP 2018). Part of the book series: Lecture Notes in Computer Science, volume 11332.

Abstract

Graphics Processing Units (GPUs) are energy-efficient, massively parallel accelerators that are increasingly deployed in multi-tenant environments, such as data centers, for general-purpose computing as well as graphics applications. Using GPUs in multi-tenant setups requires an efficient, low-overhead method for sharing the device among multiple users that improves system throughput while adapting to changes in the workload. Achieving this requires mechanisms to control the resources allocated to each kernel, and an efficient policy for making allocation decisions.

In this paper, we propose adaptive simultaneous multi-tenancy to address these issues. Adaptive simultaneous multi-tenancy shares the GPU among multiple kernels, as opposed to single-kernel multi-tenancy, which runs only one kernel on the GPU at any given time, and static simultaneous multi-tenancy, which does not adapt to events in the system. Our proposed system dynamically adjusts the kernels' parameters at run-time when a new kernel arrives or a running kernel ends. Evaluations using our prototype implementation show that, compared to sequentially executing the kernels, system throughput is improved by an average of 9.8% (and up to 22.4%) for combinations of kernels that include at least one low-utilization kernel.
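The core idea of the abstract can be sketched as a small event-driven scheduler: whenever a kernel arrives or finishes, the resource shares of all co-running kernels are recomputed. The sketch below is hypothetical (not the authors' implementation); the `utilization` estimate, the proportional-share policy, and the SM count of 80 are all illustrative assumptions.

```python
# Hypothetical sketch of adaptive simultaneous multi-tenancy: on every
# kernel arrival or completion, re-partition the GPU's streaming
# multiprocessors (SMs) among the co-running kernels.

from dataclasses import dataclass

TOTAL_SMS = 80  # assumed GPU size, for illustration only


@dataclass
class Kernel:
    name: str
    utilization: float  # estimated fraction of the GPU this kernel can keep busy


class AdaptiveScheduler:
    def __init__(self, total_sms=TOTAL_SMS):
        self.total_sms = total_sms
        self.running = []

    def _repartition(self):
        # Give each kernel SMs proportional to its estimated utilization,
        # so a low-utilization kernel leaves room for its co-runners.
        total = sum(k.utilization for k in self.running)
        return {
            k.name: max(1, int(self.total_sms * k.utilization / total))
            for k in self.running
        }

    def arrive(self, kernel):
        # A new kernel triggers a re-partitioning of the whole GPU.
        self.running.append(kernel)
        return self._repartition()

    def finish(self, name):
        # A departing kernel frees its SMs for the remaining kernels.
        self.running = [k for k in self.running if k.name != name]
        return self._repartition() if self.running else {}
```

For example, a high-utilization kernel `A` (0.9) initially receives the whole GPU; when a low-utilization kernel `B` (0.3) arrives, the shares are rebalanced to 60 and 20 SMs, and when `A` finishes, `B` reclaims the full device.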


Notes

  1. The recently announced NVIDIA Volta architecture solves head-of-line blocking at the GPU block scheduler by dividing the GPU into smaller virtual GPUs, but it lacks the flexibility provided by persistent threads.

  2. Scratchpad memory in NVIDIA terminology is called shared memory.
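The persistent-threads style mentioned in Note 1 keeps a fixed pool of long-lived workers that repeatedly pull work items from a shared queue, so work can be redistributed without relaunching anything. The sketch below renders the pattern with Python threads rather than CUDA thread blocks; the squaring "kernel body" and the worker count are illustrative assumptions, not part of the paper.

```python
# Illustrative sketch of the persistent-threads pattern (Note 1), using
# Python threads as a stand-in for CUDA thread blocks: workers stay alive
# and drain a shared work queue instead of being launched per work item.

import queue
import threading


def persistent_worker(tasks, results, stop):
    # Each worker loops until retired, pulling items as they appear.
    while not stop.is_set():
        try:
            item = tasks.get(timeout=0.05)
        except queue.Empty:
            continue
        results.append(item * item)  # stand-in for the real kernel body
        tasks.task_done()


def run(items, num_workers=4):
    tasks, results, stop = queue.Queue(), [], threading.Event()
    workers = [
        threading.Thread(target=persistent_worker, args=(tasks, results, stop))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for it in items:
        tasks.put(it)
    tasks.join()  # wait until the queue is fully drained
    stop.set()    # retire the persistent workers
    for w in workers:
        w.join()
    return sorted(results)
```

The flexibility Note 1 attributes to persistent threads comes from this indirection: because workers fetch work rather than being bound to it at launch, the scheduler can resize or rebalance the pool at run-time, which a fixed hardware partitioning of the GPU cannot.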


Acknowledgments

This work is supported in part by the National Science Foundation (CCF-1335443) and equipment donations from NVIDIA.

Author information

Correspondence to Ramin Bashizade.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Bashizade, R., Li, Y., Lebeck, A.R. (2019). Adaptive Simultaneous Multi-tenancy for GPUs. In: Klusáček, D., Cirne, W., Desai, N. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2018. Lecture Notes in Computer Science(), vol 11332. Springer, Cham. https://doi.org/10.1007/978-3-030-10632-4_5

  • DOI: https://doi.org/10.1007/978-3-030-10632-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-10631-7

  • Online ISBN: 978-3-030-10632-4

