Generating GPU Code from a High-Level Representation for Image Processing Kernels

  • Richard Membarth
  • Anton Lokhmotov
  • Jürgen Teich
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7155)


We present a framework for representing image processing kernels based on decoupled access/execute metadata, which allow the programmer to specify both the execution constraints and the memory access pattern of a kernel. The framework performs source-to-source translation of kernels expressed in high-level, framework-specific C++ classes into low-level CUDA or OpenCL code, applying effective device-dependent optimizations such as global memory padding for memory coalescing and optimal memory bandwidth utilization. We evaluate the framework on several image filters, comparing the generated code against highly optimized CPU and GPU versions in the popular OpenCV library.







Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Richard Membarth (1)
  • Anton Lokhmotov (2)
  • Jürgen Teich (1)
  1. Hardware/Software Co-Design, Department of Computer Science, University of Erlangen-Nuremberg, Germany
  2. Media Processing Division, ARM, Cambridge, United Kingdom
