Abstract
We present speculative parallelization techniques that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. While the first two phases enable the exploitation of data parallelism, the latter three represent the overhead of speculation. We perform the misspeculation check on the GPU to minimize its cost, and we perform result committing and misspeculation recovery on the CPU to reduce the result-copying and recovery overhead. The scheduling policies are designed to reduce the misspeculation rate. Our programming model provides an API through which programmers can give hints about potential misspeculations to reduce their detection cost. Our experiments yielded speedups of 3.62x-13.76x on an NVIDIA Tesla C1060 hosted in an Intel(R) Xeon(R) E5540 machine.
This work is supported by NSF grants CNS-1157377 and CCF-0905509 to UCR.
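To make the five-phase flow concrete, the following CUDA sketch shows one plausible realization of it: a loop with an indirect write is executed speculatively on the GPU (computation), a second kernel detects cross-iteration write conflicts (misspeculation check, on the GPU as described above), and the host copies results back and sequentially re-executes the conflicting iterations (result committing and misspeculation recovery, on the CPU). The kernel names, write-log layout, quadratic conflict scan, and the a[idx[i]] access pattern are illustrative assumptions, not the paper's actual runtime or API; the fixed one-iteration-per-thread mapping merely stands in for the scheduling phase.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 1024
#define THREADS 256

// Computation phase: each GPU thread speculatively executes one iteration of
//   for (i = 0; i < N; ++i) a[idx[i]] = a[idx[i]] * 0.5f + i;
// The indirect index idx[i] may create cross-iteration dependences that cannot
// be ruled out statically, so each thread also logs the element it wrote.
__global__ void compute_kernel(float *a, const int *idx, int *write_log) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        int w = idx[i];
        a[w] = a[w] * 0.5f + i;   // speculative update
        write_log[i] = w;         // record the write for the check phase
    }
}

// Misspeculation-check phase, run on the GPU: an iteration misspeculates if
// any other iteration wrote the same location. The quadratic scan is only for
// illustration; a real runtime would use a cheaper conflict-detection scheme.
__global__ void check_kernel(const int *write_log, int *misspec) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        misspec[i] = 0;
        for (int j = 0; j < N; ++j)
            if (j != i && write_log[j] == write_log[i]) { misspec[i] = 1; break; }
    }
}

int main() {
    float a[N], committed[N];      // a[] keeps the pre-speculation state
    int idx[N], misspec[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; idx[i] = i; }
    idx[7] = 3;                    // inject one cross-iteration dependence

    float *d_a;  int *d_idx, *d_log, *d_misspec;
    cudaMalloc(&d_a, sizeof(a));
    cudaMalloc(&d_idx, sizeof(idx));
    cudaMalloc(&d_log, sizeof(int) * N);
    cudaMalloc(&d_misspec, sizeof(int) * N);
    cudaMemcpy(d_a, a, sizeof(a), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, idx, sizeof(idx), cudaMemcpyHostToDevice);

    compute_kernel<<<(N + THREADS - 1) / THREADS, THREADS>>>(d_a, d_idx, d_log);
    check_kernel<<<(N + THREADS - 1) / THREADS, THREADS>>>(d_log, d_misspec);

    // Result-committing phase on the CPU: copy back the speculative results.
    cudaMemcpy(committed, d_a, sizeof(committed), cudaMemcpyDeviceToHost);
    cudaMemcpy(misspec, d_misspec, sizeof(misspec), cudaMemcpyDeviceToHost);

    // Misspeculation-recovery phase on the CPU: restore the conflicting
    // locations from the pre-speculation state, then re-execute the
    // misspeculated iterations in original order to regain sequential semantics.
    for (int i = 0; i < N; ++i)
        if (misspec[i]) committed[idx[i]] = a[idx[i]];
    for (int i = 0; i < N; ++i)
        if (misspec[i]) committed[idx[i]] = committed[idx[i]] * 0.5f + i;

    printf("committed[3] = %f (sequential result: 9.25)\n", committed[3]);
    cudaFree(d_a); cudaFree(d_idx); cudaFree(d_log); cudaFree(d_misspec);
    return 0;
}
```

In this sketch only iterations 3 and 7 conflict, so the CPU re-executes just those two; all other iterations commit directly, which is what makes speculation profitable when the misspeculation rate is low.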