Offload – Automating Code Migration to Heterogeneous Multicore Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5952)

Abstract

We present Offload, a programming model for offloading parts of a C++ application to run on accelerator cores in a heterogeneous multicore system. Code to be offloaded is enclosed in an offload scope; all functions called indirectly from an offload scope are compiled for the accelerator cores. Data defined inside an offload scope resides in accelerator memory, data defined outside resides in host memory, and the compiler automatically generates the code that moves data between the two memory spaces. This is achieved by distinguishing between host and accelerator pointers at the type level, and by compiling multiple versions of each function, one per configuration of its pointer parameters, using automatic call-graph duplication. We discuss solutions to several challenging issues related to call-graph duplication, and present an implementation of Offload for the Cell BE processor, evaluated using a number of benchmarks.
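
The idea of separating host and accelerator pointers at the type level, and of compiling one version of a function per pointer configuration, can be loosely illustrated in standard C++. The sketch below is only an analogy and is not Offload syntax or the Offload compiler's mechanism: the names SpacePtr, HostSpace, LocalSpace, load, sum, and transfer_count are hypothetical, templates stand in for automatic call-graph duplication, and a counter stands in for real data transfers between memory spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag types standing in for the two memory spaces.
struct HostSpace {};
struct LocalSpace {};

// A pointer whose type records which memory space it refers to.
template <typename T, typename Space>
struct SpacePtr {
    T* raw;
};

// Counts simulated host-to-accelerator transfers so the effect is observable.
inline int& transfer_count() {
    static int n = 0;
    return n;
}

// Reading through a local pointer is a plain memory access.
template <typename T>
T load(SpacePtr<T, LocalSpace> p, std::size_t i) {
    return p.raw[i];
}

// Reading through a host pointer goes through a (simulated) transfer.
template <typename T>
T load(SpacePtr<T, HostSpace> p, std::size_t i) {
    ++transfer_count();  // assumption: stands in for a real data-movement call
    return p.raw[i];
}

// One source function; a separate version is instantiated for each
// memory space of its pointer parameter.
template <typename Space>
int sum(SpacePtr<int, Space> data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += load(data, i);
    }
    return total;
}

// Calls sum once with host data and once with local data.
int demo_host_and_local() {
    std::vector<int> host_data{1, 2, 3, 4};
    int local_data[4] = {10, 20, 30, 40};
    assert(host_data.size() == 4);
    int a = sum(SpacePtr<int, HostSpace>{host_data.data()}, 4);
    int b = sum(SpacePtr<int, LocalSpace>{local_data}, 4);
    return a + b;
}
```

In this sketch the two calls to sum produce two distinct instantiations from a single definition, and only the host-pointer version pays for transfers. In Offload, by contrast, the abstract states that the compiler performs this duplication automatically, so the programmer does not write the template machinery by hand.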

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cooper, P., Dolinsky, U., Donaldson, A.F., Richards, A., Riley, C., Russell, G. (2010). Offload – Automating Code Migration to Heterogeneous Multicore Systems. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_25

  • DOI: https://doi.org/10.1007/978-3-642-11515-8_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11514-1

  • Online ISBN: 978-3-642-11515-8

  • eBook Packages: Computer Science, Computer Science (R0)
