The Journal of Supercomputing, Volume 74, Issue 4, pp 1547–1561

An energy-efficient 3D-stacked STT-RAM cache architecture for cloud processors: the effect on emerging scale-out workloads

  • Adnan Nasri
  • Mahmood Fathy
  • Ali Broumandnia


This paper focuses on energy consumption, a major problem in the dark silicon era. As energy consumption becomes a key issue in operating and maintaining cloud data centers, it is a growing concern for cloud computing providers. We show how spin-transfer torque random access memory (STT-RAM) can be used as an on-chip L2 cache to achieve lower energy consumption than conventional SRAM L2 caches. High density, fast read access, and non-volatility make STT-RAM a promising technology for on-chip memories. Previous studies have mainly examined specific schemes based on common applications and do not provide a thorough analysis of emerging scale-out applications across multiple design options. We examine the performance and energy efficiency of cloud processors running emerging scale-out workloads. Experimental results on the CloudSuite benchmarks show that the proposed method reduces energy by 51% on average and improves the energy-delay product by 37% on average, at the cost of an average instructions-per-cycle (IPC) degradation of only 22% compared with the SRAM baseline.
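
As a rough sanity check on how the three reported averages relate, the sketch below recomputes a relative energy-delay product from the energy and IPC figures. It assumes that EDP is energy multiplied by execution time and that execution time scales as 1/IPC at a fixed clock frequency and instruction count; the function name `relative_edp` is hypothetical, and this is not the authors' evaluation method. Because the reported numbers are averages over benchmarks, combining them this way is only approximate.

```python
def relative_edp(energy_reduction: float, ipc_degradation: float) -> float:
    """Return EDP relative to the baseline (1.0 means no change).

    Assumptions (not taken from the paper): EDP = energy * delay, and
    delay scales as 1 / IPC at fixed clock frequency and instruction count.
    """
    relative_energy = 1.0 - energy_reduction        # e.g. 0.49 of baseline energy
    relative_delay = 1.0 / (1.0 - ipc_degradation)  # lower IPC means longer runtime
    return relative_energy * relative_delay


# Average figures reported in the abstract: 51% energy reduction, 22% IPC loss.
edp = relative_edp(energy_reduction=0.51, ipc_degradation=0.22)
edp_improvement = 1.0 - edp
print(f"relative EDP: {edp:.3f}, improvement: {edp_improvement:.1%}")
# prints "relative EDP: 0.628, improvement: 37.2%"
```

Under these assumptions the result is close to the 37% EDP improvement reported above, which suggests the three averages are mutually consistent.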


Keywords: Scale-out workloads; Data center; Cloud processors; Nonvolatile memory; Energy efficient



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
  2. Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
  3. Department of Computer Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran
