Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs

Hayashi, Akihiro; Grossman, Max; Zhao, Jisheng; Shirako, Jun; Sarkar, Vivek

doi:10.1007/978-3-319-09967-5_20

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8664)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

General purpose computing on GPUs (GPGPU) can enable significant performance and energy improvements for certain classes of applications. However, current GPGPU programming models, such as CUDA and OpenCL, are only accessible by systems experts through low-level C/C++ APIs. In contrast, large numbers of programmers use high-level languages, such as Java, due to their productivity advantages of type safety, managed runtimes and precise exception semantics. Current approaches to enabling GPGPU computing in Java and other managed languages involve low-level interfaces to native code that compromise the semantic guarantees of managed languages, and are not readily accessible to mainstream programmers.

In this paper, we propose compile-time and runtime techniques for accelerating Java programs with automatic generation of OpenCL while preserving precise exception semantics. Our approach includes (1) automatic generation of OpenCL kernels and JNI glue code from a Java-based parallel-loop construct (forall), (2) speculative execution of OpenCL kernels on GPUs, and (3) automatic generation of optimized and parallel exception-checking code for execution on the CPU. A key insight in supporting our speculative execution is that the GPU’s device memory is separate from the CPU’s main memory, so that, in the case of a mis-speculation (exception), any side effects in a GPU kernel can be ignored by simply not communicating results back to the CPU.

We demonstrate the efficiency of our approach using eight Java benchmarks on two GPU-equipped platforms. Experimental results show that our approach can significantly accelerate certain classes of Java programs on GPUs while maintaining precise exception semantics.
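
The speculation scheme outlined in the abstract can be illustrated with a minimal, CPU-only sketch. It is not the authors' implementation and uses no OpenCL: the class `SpeculativeForall` and its methods are hypothetical names, the "device" is simulated by cloned arrays and a background task, and the fallback of re-running the original sequential loop on a mis-speculation is an assumption made for illustration. The sketch shows the kernel running without bounds checks on separate buffers while a parallel check runs on the host, with results copied back only if the check finds no exception.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.stream.IntStream;

public class SpeculativeForall {

    // Stand-in for a generated GPU kernel (hypothetical). It touches only the
    // "device" copies and, like OpenCL C, performs no Java-style bounds checks:
    // an out-of-range index just produces a meaningless value.
    static int[] kernel(int[] devIn, int[] devIdx) {
        int[] devOut = new int[devIdx.length];
        for (int i = 0; i < devIdx.length; i++) {
            int j = devIdx[i];
            devOut[i] = (j >= 0 && j < devIn.length) ? devIn[j] * 2 : 0;
        }
        return devOut;
    }

    // Stand-in for the generated exception-checking code: a parallel CPU pass
    // that reports whether any iteration would raise an exception.
    static boolean wouldThrow(int[] in, int[] idx) {
        return IntStream.range(0, idx.length).parallel()
                .anyMatch(i -> idx[i] < 0 || idx[i] >= in.length);
    }

    // The original loop body, run on the CPU with ordinary Java semantics.
    static void sequential(int[] in, int[] idx, int[] out) {
        for (int i = 0; i < idx.length; i++) {
            out[i] = in[idx[i]] * 2;
        }
    }

    // Computes out[i] = in[idx[i]] * 2; out must have the same length as idx.
    static void run(int[] in, int[] idx, int[] out) {
        // Simulated host-to-device transfer: the kernel sees only copies.
        int[] devIn = in.clone();
        int[] devIdx = idx.clone();

        // Launch the kernel speculatively while the CPU checks for exceptions.
        CompletableFuture<int[]> gpu =
                CompletableFuture.supplyAsync(() -> kernel(devIn, devIdx));
        boolean misSpeculated = wouldThrow(in, idx);

        // Wait for the kernel even on mis-speculation (it is not cancelled).
        int[] devOut = gpu.join();

        if (!misSpeculated) {
            // Commit: simulated device-to-host transfer of the results.
            System.arraycopy(devOut, 0, out, 0, out.length);
        } else {
            // Discard devOut by never copying it back, then re-run the original
            // loop (assumed fallback) so the exception surfaces as usual.
            sequential(in, idx, out);
        }
    }

    public static void main(String[] args) {
        int[] in = {10, 20, 30};
        int[] out = new int[3];
        run(in, new int[] {2, 0, 1}, out);
        System.out.println(Arrays.toString(out));
        try {
            run(in, new int[] {0, 7, 1}, new int[3]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("exception raised by the CPU fallback");
        }
    }
}
```

In this sketch the host array `out` is never written by the simulated kernel, which mirrors the point that device-side effects become visible only through an explicit copy back to the CPU.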

Notes

1.
The OpenCL runtime has to wait for the kernel execution to complete even in the event of an exception, because there is currently no OpenCL API to terminate a kernel running on the device.
2.
Theoretically this is unlikely; it is due to variation in timing.

Author information

Authors and Affiliations

Department of Computer Science, Rice University, Houston, TX, USA
Akihiro Hayashi, Max Grossman, Jisheng Zhao, Jun Shirako & Vivek Sarkar

Corresponding author

Correspondence to Akihiro Hayashi.

Editor information

Editors and Affiliations

Silicon Valley, Qualcomm Research, San Jose, California, USA
Călin Cașcaval
Silicon Valley, Qualcomm Research, San Jose, California, USA
Pablo Montesinos

About this paper

Cite this paper

Hayashi, A., Grossman, M., Zhao, J., Shirako, J., Sarkar, V. (2014). Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs. In: Cașcaval, C., Montesinos, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2013. Lecture Notes in Computer Science, vol 8664. Springer, Cham. https://doi.org/10.1007/978-3-319-09967-5_20

DOI: https://doi.org/10.1007/978-3-319-09967-5_20
Published: 01 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09966-8
Online ISBN: 978-3-319-09967-5
eBook Packages: Computer Science, Computer Science (R0)
