Towards Application-Centric Parallel Memories

  • Giulio Stramondo (corresponding author)
  • Cătălin Bogdan Ciobanu
  • Ana Lucia Varbanescu
  • Cees de Laat
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11339)


Many applications running on parallel processors and accelerators are bandwidth bound. In this work, we explore the benefits of parallel (scratch-pad) memories to further accelerate such applications. To this end, we propose a comprehensive approach to designing and implementing application-centric parallel memories based on the polymorphic memory model called PolyMem. Our approach enables the acceleration of a memory-bound region of an application by (1) analyzing the memory accesses to extract parallel accesses, (2) configuring PolyMem to deliver maximum speed-up for the detected accesses, and (3) building an actual FPGA-based parallel-memory accelerator for this region, with predictable performance. We validate our approach on 10 instances of Sparse-STREAM (a STREAM benchmark adaptation with sparse memory accesses), for which we design and benchmark the corresponding parallel-memory accelerators in hardware. Our results demonstrate that building parallel-memory accelerators is feasible and leads to performance gains, but their efficient integration in heterogeneous platforms remains a challenge.


Keywords: Polymorphic parallel memory · Memory bandwidth improvement · Parallel-memory accelerator
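Step (2) of the approach summarized above rests on a classic idea: the mapping from data elements to memory banks determines whether a p-element parallel access is conflict-free or serialized. The Python sketch below is illustrative only; the two mapping schemes are simplified textbook examples (low-order interleaving and diagonal skewing), not PolyMem's actual mapping functions.

```python
# Illustrative sketch: a toy model of bank conflicts in a multi-bank
# (parallel) scratch-pad memory. The schemes below are textbook examples,
# not PolyMem's actual mapping logic.

def bank(r, c, p, scheme):
    """Map 2D element (r, c) to one of p memory banks."""
    if scheme == "row-interleaved":
        return c % p            # consecutive row elements hit distinct banks
    if scheme == "diagonal-skewed":
        return (r + c) % p      # skewing spreads both rows and columns
    raise ValueError(f"unknown scheme: {scheme}")

def cycles(accesses, p, scheme):
    """Cycles needed to serve one parallel access: the most-loaded bank
    serializes the request (1 cycle means conflict-free)."""
    load = {}
    for r, c in accesses:
        b = bank(r, c, p, scheme)
        load[b] = load.get(b, 0) + 1
    return max(load.values())

p = 4
row    = [(0, c) for c in range(p)]   # one matrix row
column = [(r, 0) for r in range(p)]   # one matrix column

for scheme in ("row-interleaved", "diagonal-skewed"):
    print(scheme, cycles(row, p, scheme), cycles(column, p, scheme))
# row-interleaved serves rows in 1 cycle but columns in 4 (fully serialized);
# diagonal skewing serves both access patterns conflict-free (1 cycle each).
```

Choosing the mapping scheme that makes an application's dominant access patterns conflict-free is exactly what gives a configurable parallel memory its speed-up over a single-ported one.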



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Giulio Stramondo (1), corresponding author
  • Cătălin Bogdan Ciobanu (1)
  • Ana Lucia Varbanescu (1)
  • Cees de Laat (1)
  1. University of Amsterdam, Amsterdam, The Netherlands
