An energy-efficient 3D-stacked STT-RAM cache architecture for cloud processors: the effect on emerging scale-out workloads

Nasri, Adnan; Fathy, Mahmood; Broumandnia, Ali

doi:10.1007/s11227-017-2180-x

Published: 04 December 2017

Volume 74, pages 1547–1561 (2018)

The Journal of Supercomputing

Adnan Nasri¹,
Mahmood Fathy² &
Ali Broumandnia³

Abstract

This paper focuses on energy consumption, a major problem in the dark silicon era. As energy consumption becomes a key issue in the operation and maintenance of cloud data centers, it is a growing concern for cloud computing providers. We show how spin-transfer torque random access memory (STT-RAM) can be used as an on-chip L2 cache to consume less energy than a conventional SRAM L2 cache. High density, fast read access and non-volatility make STT-RAM a promising technology for on-chip memories. Previous studies have mainly examined specific schemes using common applications and do not provide a thorough analysis of emerging scale-out applications across multiple design options. We evaluate both performance and energy efficiency in cloud processors running emerging scale-out workloads. Experimental results on the CloudSuite benchmarks show that, compared with the SRAM baseline, the proposed method reduces energy by 51% and improves the energy-delay product by 37%, at the cost of a 22% degradation in instructions per cycle (IPC); all three figures are averages.
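
The three averages reported in the abstract can be cross-checked with a short calculation. The sketch below is not taken from the paper. It assumes a fixed instruction count and clock frequency, so that execution delay scales as 1/IPC, and it treats the reported averages as if they composed directly, which per-benchmark averages need not do. The function name `edp_ratio` is hypothetical.

```python
def edp_ratio(energy_reduction: float, ipc_degradation: float) -> float:
    """Return EDP(proposed) / EDP(baseline) under the stated assumptions.

    Assumption (not from the paper): instruction count and clock frequency
    are unchanged, so delay is inversely proportional to IPC.
    """
    energy_ratio = 1.0 - energy_reduction        # e.g. 0.49 for a 51% reduction
    delay_ratio = 1.0 / (1.0 - ipc_degradation)  # lower IPC means longer delay
    return energy_ratio * delay_ratio


# Figures quoted in the abstract: 51% energy reduction, 22% IPC degradation.
ratio = edp_ratio(0.51, 0.22)
improvement = 1.0 - ratio
print(f"EDP ratio: {ratio:.3f}, EDP improvement: {improvement:.1%}")
```

Under these assumptions the estimated improvement in energy-delay product is roughly 37%, which is consistent with the figure reported in the abstract.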



Author information

Authors and Affiliations

Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
Adnan Nasri
Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Mahmood Fathy
Department of Computer Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran
Ali Broumandnia

Corresponding author

Correspondence to Mahmood Fathy.

About this article

Cite this article

Nasri, A., Fathy, M. & Broumandnia, A. An energy-efficient 3D-stacked STT-RAM cache architecture for cloud processors: the effect on emerging scale-out workloads. J Supercomput 74, 1547–1561 (2018). https://doi.org/10.1007/s11227-017-2180-x

Issue Date: April 2018
