OpenACC to Intel Offload: Automatic Translation and Optimization

  • Cheng Chen
  • Canqun Yang
  • Tao Tang
  • Qiang Wu
  • Pengfei Zhang
Part of the Communications in Computer and Information Science book series (CCIS, volume 396)


Heterogeneous architectures that combine conventional CPUs with coprocessors have become popular in the design of High Performance Computing systems, and the problem of programming such architectures is frequently studied. The OpenACC standard was proposed to tackle this problem through directive-based, high-level programming for coprocessors. In this paper, we use OpenACC to program the new Intel MIC coprocessor by automatically translating OpenACC source code to Intel Offload code. Two optimizations, a communication optimization and a SIMD optimization, are employed. Two kernels, matrix multiplication and Jacobi, are studied on a MIC-based platform (one Knights Corner card) and a GPU-based platform (one NVIDIA Tesla K20c card). Performance evaluation shows that both kernels achieve a speedup of approximately 3 on one Knights Corner card over one eight-core Intel Xeon E5-2670 CPU. Moreover, the two kernels perform better on the MIC-based platform than on the GPU-based one.
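
As a rough illustration of the kind of source-to-source mapping the abstract describes, the sketch below places a matrix multiplication kernel annotated with OpenACC directives next to a hand-written Intel Offload counterpart. It is not the output of the authors' translator: the function names matmul_acc and matmul_offload, the choice of clauses, and the clause-to-clause correspondence are assumptions made for illustration only. A compiler without OpenACC or Intel Offload support ignores the pragmas and runs both functions on the host.

```c
#include <stddef.h>

/* OpenACC input (illustrative): the kernels directive marks the loop nest
   for offloading, and the data clauses describe what is copied to and
   from the device. */
void matmul_acc(float *a, float *b, float *c, int n)
{
    #pragma acc kernels copyin(a[0:n*n], b[0:n*n]) copyout(c[0:n*n])
    {
        #pragma acc loop independent
        for (int i = 0; i < n; i++) {
            #pragma acc loop independent
            for (int j = 0; j < n; j++) {
                float sum = 0.0f;
                for (int k = 0; k < n; k++)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
        }
    }
}

/* Possible Intel Offload counterpart (an assumption, not the paper's
   generated code): the offload pragma carries the data transfers, OpenMP
   spreads the outer loop over the coprocessor cores, and the simd pragma
   hints that the inner reduction loop can be vectorized. */
void matmul_offload(float *a, float *b, float *c, int n)
{
    #pragma offload target(mic) in(a, b : length(n * n)) out(c : length(n * n))
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0.0f;
                #pragma simd reduction(+:sum)
                for (int k = 0; k < n; k++)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
        }
    }
}
```

In this sketch the OpenACC copyin and copyout clauses correspond to the in and out clauses of the offload pragma, and the inner loop carries a SIMD hint. A real translator would also have to decide when device buffers can be kept alive across offload regions to avoid repeated transfers.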


Keywords: OpenACC, Intel Offload, Source to Source, MIC, GPU





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Cheng Chen (1)
  • Canqun Yang (1)
  • Tao Tang (1)
  • Qiang Wu (1)
  • Pengfei Zhang (1)
  1. School of Computer Science, National University of Defense Technology, Changsha, China
