GPU Memory Management Solution Supporting Incomplete Pages

  • Li Shen
  • Shiqing Zhang
  • Yaohua Yang
  • Zhiying Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11276)

Abstract

Despite increasing investment in integrated GPUs and next-generation interconnect research, discrete GPUs connected by PCI Express still dominate the market, and the management of data communication between CPU and GPU continues to evolve. This paper analyzes the address translation overhead and migration latency introduced by paged memory management in CPU-GPU heterogeneous systems. Based on this analysis, a new memory management scheme is proposed: a paged memory management solution supporting incomplete pages, which limits both address translation overhead and migration delay. “Incomplete” refers to a page that has been only partially migrated. The new solution modifies the address translation and data migration processes with only minor hardware changes.
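The core idea described in the abstract — tracking which portions of a page have already been migrated so that a miss transfers only the missing portion rather than the whole page — can be illustrated with a minimal software sketch. The sub-block size, the `IncompletePage` class, and its methods are illustrative assumptions, not the paper's actual hardware design:

```python
PAGE_SIZE = 4096      # assumed 4 KB page
SUB_BLOCK = 64        # assumed 64 B migration granularity
NUM_SUB = PAGE_SIZE // SUB_BLOCK  # 64 sub-blocks per page

class IncompletePage:
    """A GPU-resident page whose sub-blocks may still live in CPU memory."""

    def __init__(self):
        # Bitmap: bit i set means sub-block i has already been migrated.
        self.present = 0

    def access(self, offset):
        """Translate an access; on a miss, migrate only the touched sub-block."""
        idx = offset // SUB_BLOCK
        if not (self.present >> idx) & 1:
            # Transfer one 64 B sub-block instead of the full 4 KB page,
            # bounding per-access migration latency.
            self.present |= 1 << idx
            return "migrated"
        return "hit"

    def complete(self):
        """True once every sub-block is resident (the page is no longer incomplete)."""
        return self.present == (1 << NUM_SUB) - 1
```

For example, two accesses that fall in the same 64 B sub-block trigger only one migration: `access(0)` migrates sub-block 0, after which `access(16)` is a hit. In hardware, the per-page presence bitmap would live alongside the TLB/page-table entry consulted during address translation, which is why the scheme needs only minor hardware changes.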

Copyright information

© IFIP International Federation for Information Processing 2018

Authors and Affiliations

  • Li Shen (1)
  • Shiqing Zhang (1)
  • Yaohua Yang (1)
  • Zhiying Wang (1)
  1. National University of Defense Technology, Changsha, China