The Analysis of Generic SIMT Scheduling Model Extracted from GPU

  • Yuanxu Xu
  • Mingyan Yu
  • Chao Zhang
  • Bing Yang
Part of the Communications in Computer and Information Science book series (CCIS, volume 396)


To improve processor performance, a growing number of companies in industry are building single instruction multi-threads (SIMT) scheduling into their processor architectures. SIMT scheduling raises the multi-thread parallel performance of multicore processors by strengthening the processor's ability to process many threads in parallel. To support research on and development of SIMT technology, this article extracts a generic SIMT scheduling model from the Graphics Processing Unit (GPU), a kind of processor used in the field of high-performance computing. By analyzing the performance of this scheduling model, the article describes the model's attributes and can serve as a reference for using and optimizing the model in other processors.


Multicore processor; Multi-thread parallel processing; Single instruction multi-threads; Scheduling model; Performance analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Yuanxu Xu (1)
  • Mingyan Yu (1)
  • Chao Zhang (1)
  • Bing Yang (1)
  1. Department of Electronic Information and Technology, Harbin Institute of Technology, Harbin, China
