Abstract
We present speculative parallelization techniques that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. While the first two phases enable the exploitation of data parallelism, the latter three represent the overhead of speculation. We perform the misspeculation check on the GPU to minimize its cost, and we perform result committing and misspeculation recovery on the CPU to reduce the result-copying and recovery overhead. The scheduling policies are designed to reduce the misspeculation rate. Our programming model provides an API through which programmers can give hints about potential misspeculations to reduce their detection cost. Our experiments yielded speedups of 3.62x-13.76x on an NVIDIA Tesla C1060 hosted in an Intel(R) Xeon(R) E5540 machine.
This work is supported by NSF grants CNS-1157377 and CCF-0905509 to UCR.
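To make the five-phase flow concrete, the following CUDA sketch shows one plausible realization of it: a loop with an indirect write is executed speculatively on the GPU (computation), a second kernel detects cross-iteration write conflicts (misspeculation check, on the GPU as described above), and the host copies results back and sequentially re-executes the conflicting iterations (result committing and misspeculation recovery, on the CPU). The kernel names, write-log layout, quadratic conflict scan, and the a[idx[i]] access pattern are illustrative assumptions, not the paper's actual runtime or API; the fixed one-iteration-per-thread mapping merely stands in for the scheduling phase.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 1024
#define THREADS 256

// Computation phase: each GPU thread speculatively executes one iteration of
//   for (i = 0; i < N; ++i) a[idx[i]] = a[idx[i]] * 0.5f + i;
// The indirect index idx[i] may create cross-iteration dependences that cannot
// be ruled out statically, so each thread also logs the element it wrote.
__global__ void compute_kernel(float *a, const int *idx, int *write_log) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        int w = idx[i];
        a[w] = a[w] * 0.5f + i;   // speculative update
        write_log[i] = w;         // record the write for the check phase
    }
}

// Misspeculation-check phase, run on the GPU: an iteration misspeculates if
// any other iteration wrote the same location. The quadratic scan is only for
// illustration; a real runtime would use a cheaper conflict-detection scheme.
__global__ void check_kernel(const int *write_log, int *misspec) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        misspec[i] = 0;
        for (int j = 0; j < N; ++j)
            if (j != i && write_log[j] == write_log[i]) { misspec[i] = 1; break; }
    }
}

int main() {
    float a[N], committed[N];      // a[] keeps the pre-speculation state
    int idx[N], misspec[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; idx[i] = i; }
    idx[7] = 3;                    // inject one cross-iteration dependence

    float *d_a;  int *d_idx, *d_log, *d_misspec;
    cudaMalloc(&d_a, sizeof(a));
    cudaMalloc(&d_idx, sizeof(idx));
    cudaMalloc(&d_log, sizeof(int) * N);
    cudaMalloc(&d_misspec, sizeof(int) * N);
    cudaMemcpy(d_a, a, sizeof(a), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, idx, sizeof(idx), cudaMemcpyHostToDevice);

    compute_kernel<<<(N + THREADS - 1) / THREADS, THREADS>>>(d_a, d_idx, d_log);
    check_kernel<<<(N + THREADS - 1) / THREADS, THREADS>>>(d_log, d_misspec);

    // Result-committing phase on the CPU: copy back the speculative results.
    cudaMemcpy(committed, d_a, sizeof(committed), cudaMemcpyDeviceToHost);
    cudaMemcpy(misspec, d_misspec, sizeof(misspec), cudaMemcpyDeviceToHost);

    // Misspeculation-recovery phase on the CPU: restore the conflicting
    // locations from the pre-speculation state, then re-execute the
    // misspeculated iterations in original order to regain sequential semantics.
    for (int i = 0; i < N; ++i)
        if (misspec[i]) committed[idx[i]] = a[idx[i]];
    for (int i = 0; i < N; ++i)
        if (misspec[i]) committed[idx[i]] = committed[idx[i]] * 0.5f + i;

    printf("committed[3] = %f (sequential result: 9.25)\n", committed[3]);
    cudaFree(d_a); cudaFree(d_idx); cudaFree(d_log); cudaFree(d_misspec);
    return 0;
}
```

In this sketch only iterations 3 and 7 conflict, so the CPU re-executes just those two; all other iterations commit directly, which is what makes speculation profitable when the misspeculation rate is low.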