Programmable and Scalable Architecture for Graphics Processing Units

  • Chapter

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 11225)

Abstract

Graphics processing is an application area with a high level of parallelism at both the data level and the task level. Therefore, graphics processing units (GPUs) are often implemented as multiprocessing systems with high-performance floating-point processing and application-specific hardware stages that maximize graphics throughput.

In this paper, we evaluate the suitability of Transport Triggered Architectures (TTA) as a basis for implementing GPUs. TTA improves scalability over traditional VLIW-style architectures, making it interesting for computationally intensive applications. We show that TTA provides high floating-point processing performance while allowing more programming freedom than vector processors.

Finally, one of the main features of the presented TTA-based GPU design is its fully programmable architecture, making it a suitable target for the general-purpose GPU computing APIs that have become popular in recent years.

Acknowledgments

This research was partially funded by the Academy of Finland, the Nokia Foundation, and the Finnish Center for International Mobility (CIMO).

Author information

Corresponding author

Correspondence to Carlos S. de La Lama.

Copyright information

© 2019 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

de La Lama, C.S., Jääskeläinen, P., Kultala, H., Takala, J. (2019). Programmable and Scalable Architecture for Graphics Processing Units. In: Silvano, C., Bertels, K., Schulte, M. (eds) Transactions on High-Performance Embedded Architectures and Compilers V. Lecture Notes in Computer Science, vol 11225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58834-5_2

  • DOI: https://doi.org/10.1007/978-3-662-58834-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-58833-8

  • Online ISBN: 978-3-662-58834-5

  • eBook Packages: Computer Science, Computer Science (R0)
