Programmable and Scalable Architecture for Graphics Processing Units

  • Carlos S. de La Lama
  • Pekka Jääskeläinen
  • Jarmo Takala
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5657)


Graphics processing is an application area with a high level of parallelism at both the data level and the task level. Graphics processing units (GPUs) are therefore often implemented as multiprocessing systems that combine high-performance floating-point processing with application-specific hardware stages to maximize graphics throughput.

In this paper we evaluate the suitability of Transport Triggered Architectures (TTA) as a basis for implementing GPUs. TTA scales better than traditional VLIW-style architectures, which makes it attractive for computationally intensive applications. We show that TTA provides high floating-point processing performance while allowing more programming freedom than vector processors.

Finally, one of the main features of the presented TTA-based GPU design is its fully programmable architecture, which makes it a suitable target for the general-purpose GPU computing APIs that have become popular in recent years.







Copyright information

© IFIP International Federation for Information Processing 2009

Authors and Affiliations

  • Carlos S. de La Lama (1)
  • Pekka Jääskeläinen (2)
  • Jarmo Takala (2)
  1. Department of Computer Architecture, Computer Science and Artificial Intelligence, Universidad Rey Juan Carlos, Madrid, Spain
  2. Department of Computer Systems, Tampere University of Technology, Tampere, Finland
