Abstract
General-purpose graphics processing units (GPGPUs) employ several levels of memory to execute hundreds of threads concurrently. The L1 and L2 caches are critical to GPGPU performance, but they are extremely power hungry because of the large number of cores they must serve. This paper focuses on the power consumption of the L1 data caches and the L2 cache in GPGPUs and proposes two optimization techniques. The first technique places idle cache blocks into a drowsy state to reduce leakage power. Our evaluations show that cache blocks remain idle for long intervals, and putting them into drowsy mode immediately after each access reduces leakage power dramatically with negligible impact on performance. The second technique reduces the dynamic power of the caches. In GPGPU applications, many warps contain inactive threads due to branch divergence, yet existing GPGPU architectures access cache blocks on behalf of both active and inactive threads, wasting cache power. We use the active mask of the GPGPU to access only the portion of a cache block that is required by the active threads. By dynamically disabling unnecessary sections of cache blocks, we reduce the dynamic power of the caches significantly.
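The two techniques in the abstract can be illustrated with a toy cache-line model. This is only a sketch under stated assumptions, not the authors' simulator: the drowsy wake-up penalty (`DROWSY_WAKE_CYCLES`), the sector size (`SECTOR_BYTES`), and the `CacheLine` interface are all illustrative choices, with a 32-bit active mask carrying one bit per warp lane.

```python
# Toy model of the paper's two cache-power optimizations.
# Assumptions (not from the paper): 1-cycle drowsy wake-up,
# 32-byte gated sectors, 32 lanes per warp, 4 bytes per thread.

DROWSY_WAKE_CYCLES = 1   # assumed latency to restore full voltage
SECTOR_BYTES = 32        # assumed granularity of per-access gating

class CacheLine:
    def __init__(self, size=128):
        self.size = size
        self.drowsy = True   # leakage optimization: start in low-voltage state

    def access(self, active_mask, bytes_per_thread=4):
        """Return (wake_penalty, sectors_read) for one warp access."""
        # Leakage optimization: a drowsy line pays a small wake-up
        # penalty, and re-enters drowsy mode right after the access.
        penalty = DROWSY_WAKE_CYCLES if self.drowsy else 0
        # Dynamic-power optimization: read only the sectors that the
        # active lanes (set bits in the 32-bit mask) actually need.
        active_lanes = bin(active_mask & 0xFFFFFFFF).count("1")
        needed = active_lanes * bytes_per_thread
        sectors = -(-needed // SECTOR_BYTES)   # ceiling division
        self.drowsy = True   # drowsy immediately after each access
        return penalty, sectors

line = CacheLine()
print(line.access(0xFFFFFFFF))  # fully active warp: (1, 4) -> all 4 sectors
print(line.access(0x000000FF))  # 8 active lanes:    (1, 1) -> 1 sector
```

In this model a fully active warp reads all four 32-byte sectors of a 128-byte line, while a divergent warp with eight active threads touches only one, which is the source of the dynamic-power savings the abstract describes.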
© 2014 Springer International Publishing Switzerland
Cite this paper
Atoofian, E., Manzak, A. (2014). Power-Aware L1 and L2 Caches for GPGPUs. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_30
DOI: https://doi.org/10.1007/978-3-319-09873-9_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science (R0)