OpenMP Target Device Offloading for the SX-Aurora TSUBASA Vector Engine

  • Tim Cramer
  • Manoel Römmer
  • Boris Kosmynin
  • Erich Focht
  • Matthias S. Müller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12043)


Driven by the trend toward heterogeneity in modern supercomputers, OpenMP has supported heterogeneous systems since 2013. A single programming model for all kinds of accelerator-based systems reduces the burden of porting code to different device types. Acceptance of this heterogeneous paradigm requires OpenMP compilers and runtime environments that support different target device architectures. The LLVM/Clang infrastructure is designed so that its offloading features can be extended to new target platforms. However, this presupposes a compatible compiler backend for the target architecture. To overcome this limitation, we present a source-to-source code transformation technique that outlines the OpenMP code regions for the target device. By combining this technique with a corresponding communication layer, we enable OpenMP target offloading to the NEC SX-Aurora TSUBASA vector engine, which represents the new generation of vector computing.
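As a rough illustration of what "outlining" a target region means, the following C sketch shows a loop marked with a standard OpenMP target construct next to a hand-written function containing the same loop body. The function names (saxpy_target, saxpy_outlined_region) are hypothetical and chosen only for this example; the outlined form is an assumption about the general shape of such a transformation, not the output of the tool described in the paper. The communication layer that would transfer x and y to the device is not shown.

```c
#include <stddef.h>

/* Original form: the loop is marked for offloading with an OpenMP target
 * construct. The map clauses describe which data moves to and from the device.
 * If the compiler has no OpenMP or offloading support, the pragma is ignored
 * and the loop simply runs on the host. */
void saxpy_target(size_t n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical outlined form: the body of the target region is moved into a
 * standalone function. A function like this could be placed in a separate
 * source file and built with the device's native compiler, so no LLVM backend
 * for the device is needed. Variables used inside the region become
 * parameters of the function. */
void saxpy_outlined_region(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result. The difference is only in who compiles the loop: in the first form the host compiler must generate device code itself, while in the second form the extracted function can be handed to any compiler that targets the device.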


Keywords: HPC, OpenMP, Offloading, Vector computing, SIMD



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Tim Cramer (1)
  • Manoel Römmer (1)
  • Boris Kosmynin (1)
  • Erich Focht (2)
  • Matthias S. Müller (1)

  1. IT Center, RWTH Aachen University, Aachen, Germany
  2. NEC Cooperation, Stuttgart, Germany
