Programmable HSA Accelerators for Zynq UltraScale+ MPSoC Systems

  • Wolfgang Bauer
  • Philipp Holzinger
  • Marc Reichenbach
  • Steffen Vaas
  • Paul Hartke
  • Dietmar Fey
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11339)


Modern algorithms for virtual reality, machine learning, and big data are finding their way into more and more application fields, resulting in stricter performance-per-watt requirements. This challenges traditional homogeneous computing concepts and drives the development of new, heterogeneous architectures. One idea for balancing high data throughput with flexibility is to combine GPU-like soft-core processors with general-purpose CPUs as hosts. However, the approaches proposed in recent years still fall short in their integration into a shared hardware environment and a unified software stack. The HSA Foundation's approach provides a complete communication definition for heterogeneous systems but lacks FPGA accelerator support. Our work presents a methodology for making soft-core processors HSA compliant within MPSoC systems. This enables high-level software programming and therefore makes soft-core FPGA accelerators more accessible. Furthermore, the integration effort is kept low by fully utilizing the HSA Foundation standards and toolchains.


Heterogeneous system architecture, FPGA, Programmable accelerator, HSA Foundation, Zynq UltraScale+, Nyuzi processor



We thank Xilinx and Fidus Systems for providing the Zynq hardware platforms used to conduct our research.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Wolfgang Bauer (1)
  • Philipp Holzinger (1)
  • Marc Reichenbach (1)
  • Steffen Vaas (1)
  • Paul Hartke (2)
  • Dietmar Fey (1)
  1. Department of Computer Science, Chair of Computer Architecture, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Erlangen, Germany
  2. Xilinx, Inc., San Jose, USA
