Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8664)

Abstract

General purpose computing on GPUs (GPGPU) can enable significant performance and energy improvements for certain classes of applications. However, current GPGPU programming models, such as CUDA and OpenCL, are only accessible by systems experts through low-level C/C++ APIs. In contrast, large numbers of programmers use high-level languages, such as Java, due to their productivity advantages of type safety, managed runtimes and precise exception semantics. Current approaches to enabling GPGPU computing in Java and other managed languages involve low-level interfaces to native code that compromise the semantic guarantees of managed languages, and are not readily accessible to mainstream programmers.

In this paper, we propose compile-time and runtime techniques for accelerating Java programs with automatic generation of OpenCL while preserving precise exception semantics. Our approach includes (1) automatic generation of OpenCL kernels and JNI glue code from a Java-based parallel-loop construct (forall), (2) speculative execution of OpenCL kernels on GPUs, and (3) automatic generation of optimized and parallel exception-checking code for execution on the CPU. A key insight in supporting our speculative execution is that the GPU’s device memory is separate from the CPU’s main memory, so that, in the case of a mis-speculation (exception), any side effects in a GPU kernel can be ignored by simply not communicating results back to the CPU.
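The speculation scheme described above can be illustrated with a minimal plain-Java sketch. It is not the authors' implementation: no OpenCL or JNI is involved, the "device" is simulated by a background task that works on private copies of the arrays, and the class name `SpeculativeForall`, its method names, the example loop body, and the fallback of re-running the loop sequentially on the CPU are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.stream.IntStream;

// Hypothetical illustration; models the loop
//   forall (i : 0..n-1) out[i] = in[idx[i]] * 2;
// which in Java throws ArrayIndexOutOfBoundsException when idx[i] is out of range.
final class SpeculativeForall {

    // Stand-in for the generated kernel: it only touches private "device" copies
    // and performs no Java bounds checks of its own.
    static int[] kernelOnDeviceCopy(int[] in, int[] idx) {
        int[] dIn = in.clone();
        int[] dIdx = idx.clone();
        int[] dOut = new int[dIdx.length];
        for (int i = 0; i < dIdx.length; i++) {
            int j = dIdx[i];
            // An unchecked device access would read garbage; 0 stands in for that here.
            dOut[i] = (j >= 0 && j < dIn.length) ? dIn[j] * 2 : 0;
        }
        return dOut;
    }

    // Stand-in for the exception-checking code: only the checks, run in parallel on the CPU.
    static boolean wouldThrow(int[] in, int[] idx) {
        return IntStream.range(0, idx.length).parallel()
                .anyMatch(i -> idx[i] < 0 || idx[i] >= in.length);
    }

    // Original sequential loop with ordinary Java exception semantics.
    static void sequential(int[] in, int[] idx, int[] out) {
        for (int i = 0; i < idx.length; i++) {
            out[i] = in[idx[i]] * 2;
        }
    }

    static void run(int[] in, int[] idx, int[] out) {
        if (idx.length != out.length) {
            throw new IllegalArgumentException("sketch assumes idx.length == out.length");
        }
        // Launch the kernel speculatively while the CPU runs the checks.
        CompletableFuture<int[]> device =
                CompletableFuture.supplyAsync(() -> kernelOnDeviceCopy(in, idx));
        boolean misSpeculated = wouldThrow(in, idx);
        int[] deviceOut = device.join(); // the kernel is awaited, not cancelled
        if (!misSpeculated) {
            // Speculation succeeded: copy the results back to host memory.
            System.arraycopy(deviceOut, 0, out, 0, out.length);
        } else {
            // Mis-speculation: deviceOut is dropped, host memory is untouched so far.
            // Assumed fallback: re-run on the CPU so the exception is raised precisely.
            sequential(in, idx, out);
        }
    }

    public static void main(String[] args) {
        int[] in = {1, 2, 3, 4};
        int[] out = new int[4];
        run(in, new int[] {3, 2, 1, 0}, out);
        System.out.println(Arrays.toString(out));
    }
}
```

Because the simulated kernel never writes to `out` directly, discarding `deviceOut` is enough to undo a mis-speculated run, which mirrors the role that separate device memory plays in the paragraph above.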

We demonstrate the efficiency of our approach using eight Java benchmarks on two GPU-equipped platforms. Experimental results show that our approach can significantly accelerate certain classes of Java programs on GPUs while maintaining precise exception semantics.

Notes

  1. The OpenCL runtime must wait for the kernel to complete even when an exception occurs, because OpenCL currently provides no API to terminate a kernel running on the device.

  2. Theoretically this is unlikely; it is due to variation in timing.

Author information

Corresponding author

Correspondence to Akihiro Hayashi.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hayashi, A., Grossman, M., Zhao, J., Shirako, J., Sarkar, V. (2014). Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs. In: Cașcaval, C., Montesinos, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2013. Lecture Notes in Computer Science, vol 8664. Springer, Cham. https://doi.org/10.1007/978-3-319-09967-5_20

  • DOI: https://doi.org/10.1007/978-3-319-09967-5_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09966-8

  • Online ISBN: 978-3-319-09967-5

  • eBook Packages: Computer Science, Computer Science (R0)
