Demand look-ahead memory access scheduling for 3D graphics processing units

Hsiao, Chih-Chieh; Lo, Min-Jen; Chu, Slo-Li

doi:10.1007/s11042-013-1639-x

Published: 06 August 2013

Volume 73, pages 1391–1416 (2014)

Multimedia Tools and Applications

Abstract

With the rapidly growing complexity of 3D applications, the memory subsystem has become the most bandwidth-exhausting bottleneck in a Graphics Processing Unit (GPU). To produce realistic images, tens to hundreds of thousands of primitives are used. Furthermore, each primitive generates thousands of pixels, and these pixels are computed by shaders that apply special effects and may blend multiple texture pixels fetched from external memory to obtain a final color. To hide the long latency of texture operations, shaders are usually highly multithreaded to increase their throughput. However, conventional memory scheduling mechanisms are unaware of the producer-consumer relationship between primitives and pixels: they either assume that all initiators are independent or use a fixed priority scheme. This paper proposes Demand Look-Ahead (DLA) memory access scheduling, which dynamically generates priorities for the memory request scheduler based on the status of each unit in the GPU. By considering the producer-consumer relationship, the proposed mechanism reschedules the most urgent requests to be serviced first. Experimental results show that DLA improves FPS and IPC by 1.47 % and 1.44 %, respectively, over First-Ready First-Come-First-Serve (FR-FCFS). When DLA is integrated with Bank-level Parallelism Awareness (BPA), DLA-BPA improves FPS and IPC by 7.28 % and 6.55 %, respectively. Furthermore, DLA-BPA improves shader thread performance by 22.06 % and increases the attainable bandwidth by 5.91 %.
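To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows the well-known FR-FCFS rule (row-buffer hits first, then oldest first) next to a variant that places a demand score ahead of that rule. The `Request` type, the `urgency` field, and both function names are hypothetical, introduced only for illustration; how DLA actually derives its priorities from unit status is described in the full article, not here.

```python
from dataclasses import dataclass

@dataclass
class Request:
    arrival: int      # cycle at which the request entered the queue
    bank: int         # DRAM bank targeted by the request
    row: int          # DRAM row targeted by the request
    urgency: int = 0  # ASSUMPTION: hypothetical demand score, higher = more urgent

def is_row_hit(req, open_rows):
    """True if the request targets the row currently open in its bank."""
    return open_rows.get(req.bank) == req.row

def pick_fr_fcfs(queue, open_rows):
    """FR-FCFS: prefer row-buffer hits, break ties by oldest arrival."""
    return min(queue, key=lambda r: (not is_row_hit(r, open_rows), r.arrival))

def pick_demand_aware(queue, open_rows):
    """Illustrative variant: highest urgency first, then the FR-FCFS order."""
    return min(
        queue,
        key=lambda r: (-r.urgency, not is_row_hit(r, open_rows), r.arrival),
    )

# Three pending requests; bank 0 has row 7 open, bank 1 has row 9 open.
queue = [
    Request(arrival=0, bank=0, row=5),
    Request(arrival=1, bank=0, row=7),             # row hit
    Request(arrival=2, bank=1, row=3, urgency=2),  # urgent, but a row miss
]
open_rows = {0: 7, 1: 9}

baseline_choice = pick_fr_fcfs(queue, open_rows)
demand_choice = pick_demand_aware(queue, open_rows)
print(baseline_choice.arrival, demand_choice.arrival)
```

In this toy queue the baseline services the row hit, while the demand-aware variant services the urgent request even though it misses the open row. When every request has the same urgency, the two functions choose identically.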

Notes

In the following sections, the term “rendering batch” refers to a batch used in graphics rendering; otherwise, “batch” denotes a group of memory requests in memory access scheduling.

Acknowledgments

This work is supported in part by the National Science Council of the Republic of China, Taiwan, under Grant NSC 101-2221-E-033-049.

Author information

Authors and Affiliations

Department of Information and Computer Engineering, Chung Yuan Christian University, 200, Chung Pei Rd., Chung Li, 32023, Taiwan
Chih-Chieh Hsiao, Min-Jen Lo & Slo-Li Chu

Corresponding author

Correspondence to Chih-Chieh Hsiao.

About this article

Cite this article

Hsiao, CC., Lo, MJ. & Chu, SL. Demand look-ahead memory access scheduling for 3D graphics processing units. Multimed Tools Appl 73, 1391–1416 (2014). https://doi.org/10.1007/s11042-013-1639-x

Issue Date: December 2014
DOI: https://doi.org/10.1007/s11042-013-1639-x
