Abstract
GPUs are becoming a primary resource of computing power. They use a single instruction, multiple threads (SIMT) execution model that executes batches of threads in lockstep. If the control flow of threads within the same batch diverges, the different execution paths are scheduled sequentially; once the control flows reconverge, all threads are executed in lockstep again. Several thread batching mechanisms have been proposed, albeit without establishing their semantic validity or their scheduling properties. To increase the level of confidence in the correctness of GPU-accelerated programs, we formalize the SIMT execution model for a stack-based reconvergence mechanism in an operational semantics and prove its correctness by constructing a simulation between the SIMT semantics and a standard interleaved multi-thread semantics. We also demonstrate that the SIMT execution model produces unfair schedules in some cases. We discuss the problem of unfairness for different batching mechanisms like dynamic warp formation and a stack-less reconvergence strategy.
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
AMD. Evergreen Family Instruction Set Architecture, Reference Guide (2011)
Barnat, J., Brim, L., Ceska, M., Lamr, T.: CUDA Accelerated LTL Model Checking. In: Proc. 15th Int. Conf. Parallel and Distributed Systems (ICPADS 2009), pp. 34–41 (2009)
Bošnački, D., Edelkamp, S., Sulewski, D., Wijs, A.: GPU-PRISM: An Extension of PRISM for General Purpose Graphics Processing Units. In: Proc. 9th Int. Wsh. Parallel and Distributed Methods in Verification (PDMV 2010), pp. 17–19 (2010)
Collange, S.: Stack-less SIMT Reconvergence at Low Cost. Technical Report HAL-00622654, INRIA (2011)
Coon, B.W., Nickolls, J.R., Nyland, L., Mills, P.C., Lindholm, J.E.: Indirect Function Call Instructions in a Synchronous Parallel Thread Processor, United States Patent Application #2009/0240931 (2009)
Fung, W.W.L., Aamodt, T.M.: Thread Block Compaction for Efficient SIMT Control Flow. In: Proc. 17th IEEE Int. Symp. High Performance Computer Architecture (HPCA 2011), pp. 25–36 (2011)
Fung, W.W.L., Sham, I., Yuan, G., Aamodt, T.M.: Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. In: Proc. 40th Ann. IEEE/ACM Int. Symp. Microarchitecture (MICRO 2007), pp. 407–420 (2007)
Garland, M., Le Grand, S., Nickolls, J., Anderson, J., Hardwick, J., Morton, S., Phillips, E., Zhang, Y., Volkov, V.: Parallel Computing Experiences with CUDA. IEEE Micro 28(4), 13–27 (2008)
Habermaier, A.: The Model of Computation of CUDA and its Formal Semantics. Technical Report 2011-14, University of Augsburg (2011)
Habermaier, A., Knapp, A.: On the Correctness of the SIMT Execution Model of GPUs. Technical Report 2012-1, University of Augsburg (2012)
Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach, 5th edn. Elsevier Science & Technology (2011)
Khronos Group Inc. The OpenGL Shading Language 4.20, Revision 6 (2011)
Khronos OpenCL Working Group. The OpenCL Specification 1.2, Revision 15 (2011)
Levinthal, A., Porter, T.: Chap – A SIMD Graphics Processor. SIGGRAPH Comput. Graph. 18, 77–82 (1984)
The LLVM Compiler Infrastructure, http://www.llvm.org/ (01/04/2012)
Mantor, M., Houston, M.: AMD Graphic Core Next: Low Power High Performance Graphics & Parallel Compute. Presentation at the AMD Fusion Developer Summit (2011)
Mark, W.: Future Graphics Architectures. ACM Queue 6, 54–64 (2008)
Meng, J., Tarjan, D., Skadron, K.: Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance. In: Proc. 37th Ann. Int. Symp. Computer Architecture (ISCA 2010), pp. 235–246 (2010)
Moy, S., Lindholm, J.E.: Method and System for Programmable Pipelined Graphics Processing with Branching Instructions, United States Patent #6,947,047 (2005)
Muchnick, S.S.: Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers Inc. (1997)
Nickolls, J.R., Dally, W.: The GPU Computing Era. IEEE Micro 30(2), 56–69 (2010)
NVIDIA. DirectCompute Programming Guide 3.2 (2010)
NVIDIA. cuobjdump. CUDA Toolkit 4.1 (2011)
NVIDIA. NVIDIA CUDA C Programming Guide 4.1 (2011)
NVIDIA. NVIDIA Opens Up CUDA Platform by Releasing Compiler Source Code (2011), http://tiny.cc/NvidiaLLVM (01/04/2012)
Reynolds, J.C.: Theories of Programming Languages. Cambridge University Press (1998)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Habermaier, A., Knapp, A. (2012). On the Correctness of the SIMT Execution Model of GPUs. In: Seidl, H. (eds) Programming Languages and Systems. ESOP 2012. Lecture Notes in Computer Science, vol 7211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28869-2_16
Download citation
DOI: https://doi.org/10.1007/978-3-642-28869-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28868-5
Online ISBN: 978-3-642-28869-2
eBook Packages: Computer ScienceComputer Science (R0)