On the Correctness of the SIMT Execution Model of GPUs

Habermaier, Axel; Knapp, Alexander

doi:10.1007/978-3-642-28869-2_16

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7211)

Abstract

GPUs are becoming a primary resource of computing power. They use a single instruction, multiple threads (SIMT) execution model that executes batches of threads in lockstep. If the control flow of threads within the same batch diverges, the different execution paths are scheduled sequentially; once the control flows reconverge, all threads are executed in lockstep again. Several thread batching mechanisms have been proposed, albeit without establishing their semantic validity or their scheduling properties. To increase the level of confidence in the correctness of GPU-accelerated programs, we formalize the SIMT execution model for a stack-based reconvergence mechanism in an operational semantics and prove its correctness by constructing a simulation between the SIMT semantics and a standard interleaved multi-thread semantics. We also demonstrate that the SIMT execution model produces unfair schedules in some cases. We discuss the problem of unfairness for different batching mechanisms like dynamic warp formation and a stack-less reconvergence strategy.
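
The divergence and reconvergence behaviour described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the formal semantics from the paper: the toy branch program, the function names `run_interleaved` and `run_simt`, and the labels `then`, `else` and `join` are all hypothetical and chosen for illustration. It runs one batch of threads through a single if/else, once thread by thread and once in lockstep with a reconvergence stack of (label, active mask) entries, and shows that both yield the same final values.

```python
from typing import FrozenSet, List, Tuple


def run_interleaved(xs: List[int]) -> List[int]:
    """Reference behaviour: every thread executes the branch on its own."""
    out = []
    for x in xs:
        if x % 2 == 0:
            out.append(x * 2)   # 'then' path
        else:
            out.append(x + 1)   # 'else' path
    return out


def run_simt(xs: List[int]) -> Tuple[List[int], List[Tuple[str, List[int]]]]:
    """Lockstep execution of one batch using a reconvergence stack.

    Hypothetical toy model: each stack entry is (label, active mask),
    and the entry on top of the stack is the one that executes next.
    """
    regs = list(xs)
    full: FrozenSet[int] = frozenset(range(len(xs)))
    taken = frozenset(i for i in full if regs[i] % 2 == 0)
    not_taken = full - taken

    # Push the reconvergence point first, so it runs only after both
    # diverged paths have finished; then push the two paths themselves.
    stack: List[Tuple[str, FrozenSet[int]]] = [("join", full)]
    if not_taken:
        stack.append(("else", not_taken))
    if taken:
        stack.append(("then", taken))

    trace = []
    while stack:
        label, mask = stack.pop()
        trace.append((label, sorted(mask)))
        for i in mask:              # all active threads step together
            if label == "then":
                regs[i] *= 2
            elif label == "else":
                regs[i] += 1
            # 'join': nothing to compute in this toy program
    return regs, trace


if __name__ == "__main__":
    values = [0, 1, 2, 3]
    result, schedule = run_simt(values)
    print(result == run_interleaved(values))
    for step in schedule:
        print(step)
```

In this sketch the two paths are serialized (the `then` threads always run before the `else` threads) and the full mask is restored at `join`. That fixed ordering hints at why scheduling fairness becomes a concern once one path contains a loop that waits on threads of the other path.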

Author information

Authors and Affiliations

Institute for Software and Systems Engineering, University of Augsburg, Germany
Axel Habermaier & Alexander Knapp

Editor information

Editors and Affiliations

Technische Universität München, Boltzmannstrasse 3, 85748, Garching, Germany
Helmut Seidl

About this paper

Cite this paper

Habermaier, A., Knapp, A. (2012). On the Correctness of the SIMT Execution Model of GPUs. In: Seidl, H. (ed.) Programming Languages and Systems. ESOP 2012. Lecture Notes in Computer Science, vol 7211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28869-2_16

DOI: https://doi.org/10.1007/978-3-642-28869-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28868-5
Online ISBN: 978-3-642-28869-2
eBook Packages: Computer Science, Computer Science (R0)
