Memory Contention Aware Power Management for High Performance GPUs

  • Hong Jun Choi
  • Dong Oh Son
  • Cheol Hong Kim
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 931)


Improving GPU performance requires exploiting more parallelism and operating the GPU at a higher clock frequency. However, high parallelism and a high clock frequency cause serious memory contention, which results in significant power consumption and more idle cycles in the GPU. This paper proposes a memory contention aware (MC-aware) power management scheme that reduces GPU power consumption with little impact on performance. The proposed scheme monitors the degree of memory contention, since severe memory contention causes serious performance degradation. When serious memory contention occurs, the MC-aware scheme switches the SM (Streaming Multiprocessor) to a power saving mode with little performance degradation. The proposed GPU architecture includes an SM management unit that generates the control signals based on the estimated degree of memory contention. According to our simulation results, the proposed MC-aware scheme can increase power efficiency, measured as IPC per watt, by up to 31.4% compared to the conventional architecture.
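The control decision described above can be illustrated with a minimal sketch. The abstract does not specify how the degree of memory contention is estimated or which threshold triggers the power saving mode, so everything here is an assumption made for illustration: the memory-stall-ratio metric, the `CONTENTION_THRESHOLD` value, and the names `SMSample`, `contention_degree`, `select_mode` and `ipc_per_watt` are all hypothetical. Only the IPC-per-watt definition of power efficiency comes from the text.

```python
from dataclasses import dataclass

# Hypothetical threshold: the abstract gives no numeric value.
CONTENTION_THRESHOLD = 0.6


@dataclass
class SMSample:
    """Counters for one SM over a sampling window (assumed fields)."""
    memory_stall_cycles: int  # cycles the SM spent waiting on memory
    total_cycles: int         # length of the sampling window in cycles


def contention_degree(sample: SMSample) -> float:
    """Assumed estimate: fraction of cycles stalled on memory."""
    if sample.total_cycles == 0:
        return 0.0
    return sample.memory_stall_cycles / sample.total_cycles


def select_mode(sample: SMSample,
                threshold: float = CONTENTION_THRESHOLD) -> str:
    """Return the SM mode a management unit might signal for this window."""
    if contention_degree(sample) >= threshold:
        return "power_saving"
    return "normal"


def ipc_per_watt(instructions: int, cycles: int, watts: float) -> float:
    """Power efficiency as used in the abstract: IPC divided by power."""
    return (instructions / cycles) / watts


if __name__ == "__main__":
    congested = SMSample(memory_stall_cycles=800, total_cycles=1000)
    light = SMSample(memory_stall_cycles=100, total_cycles=1000)
    print(select_mode(congested))
    print(select_mode(light))
    print(ipc_per_watt(instructions=2000, cycles=1000, watts=100.0))
```

In this sketch an SM that spends most of a window stalled on memory is moved to the power saving mode, on the reasoning that it contributes little useful work while contention lasts. A real design would also need hysteresis and a wake-up policy, neither of which the abstract describes.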


GPU, Performance, Memory contention, Power efficiency, Streaming multiprocessor



This study was financially supported by Chonnam National University (Grant number: 2017-2727).



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. The Attached Institute of ETRI, Daejeon, Korea
  2. Avionics R&D Lab, LIG Nex1, Daejeon, Korea
  3. School of Electronics and Computer Engineering, Chonnam National University, Gwangju, Korea
