Exploiting Reconfigurable Vector Processing for Energy-Efficient Computation in 3D-Stacked Memories

  • João Paulo C. de Lima
  • Paulo C. Santos
  • Rafael F. de Moura
  • Marco A. Z. Alves
  • Antonio C. S. Beck
  • Luigi Carro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11444)

Abstract

Although Processing-in-Memory (PIM) architectures have helped to reduce the effect of the memory wall, the logic placed inside 3D-stacked memories still faces a large disparity between the speed of DRAM operations and that of CMOS logic. As a result, for a broad range of emerging data-intensive applications, the Functional Units (FUs) are often underutilized, especially when the application exhibits poor temporal locality. Because applications impose irregular processing demands across the different phases of their execution, this behavior can be exploited to reconfigure energy-reduction techniques, either by scaling frequency or by power-gating functional units. In this paper, we present the application-dependent characteristics that enable the dynamic use of energy-reduction techniques without performance degradation in highly constrained PIM designs. The experimental results show that exploiting such a reconfiguration mechanism can improve PIM system energy efficiency by 5\(\times \) and can effectively benefit both memory-intensive and compute-intensive applications.
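The reconfiguration idea in the abstract can be sketched as a phase-based controller: each monitoring interval, hardware counters classify the current application phase and select an energy-reduction mode for the in-memory vector units. The following is a minimal illustrative sketch, not the paper's implementation; the function name, counter inputs, and thresholds are all hypothetical.

```python
# Hypothetical sketch of a phase-based energy-reduction policy for PIM
# vector FUs: power-gate idle units, scale frequency when DRAM-bound,
# run at full speed when compute-bound. Thresholds are illustrative only.

def choose_mode(fu_utilization, mem_bandwidth_ratio):
    """Pick an energy-reduction mode from per-interval counters.

    fu_utilization      -- fraction of cycles the vector FUs were busy (0..1)
    mem_bandwidth_ratio -- fraction of peak DRAM bandwidth consumed (0..1)
    """
    if fu_utilization < 0.1:
        # FUs mostly idle (poor temporal locality): power-gate unused lanes.
        return "power-gate"
    if mem_bandwidth_ratio > 0.8:
        # Bandwidth-saturated phase: FUs stall on DRAM, so lower frequency.
        return "scale-frequency"
    # Compute-bound phase: keep full frequency to avoid slowdown.
    return "full-speed"

# Example intervals: (FU utilization, bandwidth ratio) per monitoring window.
phases = [(0.05, 0.9), (0.7, 0.85), (0.9, 0.3)]
print([choose_mode(u, b) for u, b in phases])
# → ['power-gate', 'scale-frequency', 'full-speed']
```

Because the mode is revisited every interval, the same application can move between modes as it alternates between memory-intensive and compute-intensive phases, which is how both classes of workload can benefit.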

Keywords

Processing-in-Memory · Reconfigurable vector architectures · Energy efficiency

Notes

Acknowledgment

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, and by the Serrapilheira Institute (grant number Serra-1709-16621).


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • João Paulo C. de Lima (1)
  • Paulo C. Santos (1)
  • Rafael F. de Moura (1)
  • Marco A. Z. Alves (2)
  • Antonio C. S. Beck (1)
  • Luigi Carro (1)
  1. Informatics Institute, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
  2. Department of Informatics, Federal University of Paraná, Curitiba, Brazil