Demand look-ahead memory access scheduling for 3D graphics processing units

Multimedia Tools and Applications

Abstract

With the rapidly growing complexity of 3D applications, the memory subsystem has become the most bandwidth-constrained bottleneck in a Graphics Processing Unit (GPU). To produce realistic images, tens to hundreds of thousands of primitives are used. Furthermore, each primitive generates thousands of pixels, and these pixels are computed by shaders that apply special effects and may blend multiple texture pixels fetched from external memory to obtain the final color. To hide the long latency of texture operations, shaders are usually highly multithreaded to increase their throughput. However, conventional memory scheduling mechanisms are unaware of the producer-consumer relationship between primitives and pixels: they either assume that all initiators are independent or use a fixed priority scheme. This paper proposes Demand Look-Ahead (DLA) memory access scheduling, which uses the status of each unit in the GPU to dynamically generate priorities for the memory request scheduler. By considering the producer-consumer relationship, the proposed mechanism reschedules the most urgent requests so that they are serviced first. Experimental results show that DLA improves FPS and IPC by 1.47 % and 1.44 %, respectively, over First-Ready First-Come-First-Serve (FR-FCFS). When DLA is integrated with Bank-level Parallelism Awareness (BPA), DLA-BPA improves FPS and IPC by 7.28 % and 6.55 %, respectively. Furthermore, with DLA-BPA, shader thread performance improves by 22.06 % and the attainable bandwidth increases by 5.91 %.
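
The contrast between the FR-FCFS baseline and a status-driven priority can be illustrated with a small sketch. This is not the paper's DLA algorithm, whose details are not given in the abstract. It is a toy model under stated assumptions: the `Request` fields, the unit names, and the `out_fill` urgency measure (how full the buffer between a unit and its consumer is) are all hypothetical. `fr_fcfs` implements the standard rule of preferring row-buffer hits and then the oldest request; `demand_first` puts an assumed urgency in front of that rule.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: int  # cycle at which the request entered the queue
    bank: int     # DRAM bank addressed
    row: int      # DRAM row addressed
    unit: str     # issuing GPU unit (hypothetical names)


def fr_fcfs(queue, open_rows):
    """FR-FCFS: prefer row-buffer hits, then the oldest request."""
    # False (hit) sorts before True (miss), so hits win; ties go to the oldest.
    return min(queue, key=lambda r: (open_rows.get(r.bank) != r.row, r.arrival))


def demand_first(queue, open_rows, out_fill):
    """Toy demand-driven pick (an assumption, not the paper's DLA).

    out_fill[unit] is the assumed fill level (0.0 to 1.0) of the buffer
    between a unit and its consumer. A nearly empty buffer means the
    consumer is about to starve, so that unit's requests go first.
    Remaining ties fall back to the FR-FCFS ordering.
    """
    return min(
        queue,
        key=lambda r: (out_fill[r.unit], open_rows.get(r.bank) != r.row, r.arrival),
    )


# Example state: bank 0 has row 5 open, bank 1 has row 3 open.
queue = [
    Request(arrival=0, bank=0, row=5, unit="texture"),
    Request(arrival=1, bank=1, row=7, unit="vertex"),
    Request(arrival=2, bank=0, row=9, unit="texture"),
]
open_rows = {0: 5, 1: 3}
out_fill = {"texture": 0.9, "vertex": 0.1}

baseline_pick = fr_fcfs(queue, open_rows)
demand_pick = demand_first(queue, open_rows, out_fill)
```

In this example the baseline picks the oldest row hit from the texture unit, while the demand-driven variant picks the vertex request even though it misses the row buffer, because its consumer's buffer is assumed to be nearly empty.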

Notes

  1. In the following sections, the term “rendering batch” refers to a batch used in graphics rendering. Otherwise, “batch” refers to a group of memory requests used for memory access scheduling.

Acknowledgments

This work was supported in part by the National Science Council of the Republic of China, Taiwan, under Grant NSC 101-2221-E-033-049.

Corresponding author

Correspondence to Chih-Chieh Hsiao.

About this article

Cite this article

Hsiao, CC., Lo, MJ. & Chu, SL. Demand look-ahead memory access scheduling for 3D graphics processing units. Multimed Tools Appl 73, 1391–1416 (2014). https://doi.org/10.1007/s11042-013-1639-x

  • DOI: https://doi.org/10.1007/s11042-013-1639-x
