Programmable and Scalable Architecture for Graphics Processing Units

de La Lama, Carlos S.; Jääskeläinen, Pekka; Kultala, Heikki; Takala, Jarmo

doi:10.1007/978-3-662-58834-5_2

Chapter
First Online: 23 February 2019

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 11225)

Abstract

Graphics processing is an application area with a high level of parallelism at both the data level and the task level. Therefore, graphics processing units (GPUs) are often implemented as multiprocessing systems with high-performance floating-point processing and application-specific hardware stages that maximize graphics throughput.

In this paper, we evaluate the suitability of Transport Triggered Architectures (TTAs) as a basis for implementing GPUs. TTA improves scalability over traditional VLIW-style architectures, making it attractive for computationally intensive applications. We show that TTA provides high floating-point processing performance while allowing more programming freedom than vector processors.
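The scalability argument for TTA rests on how such a processor is programmed. As a minimal sketch, assuming a toy machine that is not the authors' GPU design, the Python model below treats each instruction as a set of parallel data moves: writing a function unit's operand port ("o") only latches a value, writing its trigger port ("t") fires the operation, and the result port ("r") can be moved straight into another unit without passing through the register file (software bypassing). The names `FunctionUnit`, `Machine`, `new_machine` and the port suffixes are hypothetical, and real TTAs have multi-cycle latencies and a limited number of buses that this model ignores. The model counts register-file accesses because reduced register-file port pressure is one commonly cited reason TTAs scale better than VLIW designs.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Source = Union[str, float]   # register name, "fu.r" result port, or an immediate
Move = Tuple[Source, str]    # (source, destination)


@dataclass
class FunctionUnit:
    """Toy FU with an operand port "o", a trigger port "t" and a result port "r"."""
    op: Callable[[float, float], float]
    operand: float = 0.0
    result: float = 0.0

    def write(self, port: str, value: float) -> None:
        if port == "o":      # operand port only latches the value
            self.operand = value
        elif port == "t":    # writing the trigger port fires the operation
            self.result = self.op(self.operand, value)
        else:
            raise ValueError(f"unknown input port {port!r}")


@dataclass
class Machine:
    regs: Dict[str, float]
    units: Dict[str, FunctionUnit]
    reg_reads: int = 0
    reg_writes: int = 0

    def read(self, src: Source) -> float:
        if not isinstance(src, str):   # immediate value
            return float(src)
        unit, _, port = src.partition(".")
        if port == "r":                # bypass: read an FU result port directly
            return self.units[unit].result
        self.reg_reads += 1            # anything else is a register-file read
        return self.regs[src]

    def write(self, dst: str, value: float) -> None:
        unit, _, port = dst.partition(".")
        if port:
            self.units[unit].write(port, value)
        else:
            self.reg_writes += 1
            self.regs[dst] = value

    def run(self, program: List[List[Move]]) -> None:
        for instruction in program:
            # Moves in one instruction use parallel buses: read every source first.
            pending = [(dst, self.read(src)) for src, dst in instruction]
            # Deliver trigger writes last so operands moved in the same cycle are used.
            pending.sort(key=lambda move: move[0].endswith(".t"))
            for dst, value in pending:
                self.write(dst, value)


def new_machine() -> Machine:
    return Machine(regs={"a": 2.0, "b": 3.0, "c": 4.0},
                   units={"add": FunctionUnit(operator.add),
                          "mul": FunctionUnit(operator.mul)})


# y = (a + b) * c, moving the adder result straight into the multiplier.
bypassed = [
    [("a", "add.o"), ("b", "add.t")],
    [("add.r", "mul.o"), ("c", "mul.t")],
    [("mul.r", "y")],
]

# The same computation routed through a temporary register t.
via_register = [
    [("a", "add.o"), ("b", "add.t")],
    [("add.r", "t")],
    [("t", "mul.o"), ("c", "mul.t")],
    [("mul.r", "y")],
]

for name, prog in [("bypassed", bypassed), ("via_register", via_register)]:
    m = new_machine()
    m.run(prog)
    print(name, m.regs["y"], m.reg_reads, m.reg_writes)
```

Running it prints `bypassed 20.0 3 1` and `via_register 20.0 4 2`: because the data transports are visible to the compiler, the bypassed schedule produces the same result with one fewer register read and one fewer register write.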

Finally, one of the main features of the presented TTA-based GPU design is its fully programmable architecture, which makes it a suitable target for the general-purpose computing on GPU (GPGPU) APIs that have become popular in recent years.

Acknowledgments

This research was partially funded by the Academy of Finland, the Nokia Foundation, and the Finnish Center for International Mobility (CIMO).

Author information

Authors and Affiliations

Department of Computer Architecture, Computer Science and Artificial Intelligence, Universidad Rey Juan Carlos, Móstoles, Spain
Carlos S. de La Lama
Tampere University, Tampere, Finland
Pekka Jääskeläinen, Heikki Kultala & Jarmo Takala

Corresponding author

Correspondence to Carlos S. de La Lama.

Editor information

Editors and Affiliations

Politecnico di Milano, Milan, Italy
Cristina Silvano
Delft University of Technology, Delft, The Netherlands
Koen Bertels
University of Wisconsin–Madison, Madison, WI, USA
Michael Schulte

About this chapter

Cite this chapter

de La Lama, C.S., Jääskeläinen, P., Kultala, H., Takala, J. (2019). Programmable and Scalable Architecture for Graphics Processing Units. In: Silvano, C., Bertels, K., Schulte, M. (eds) Transactions on High-Performance Embedded Architectures and Compilers V. Lecture Notes in Computer Science, vol 11225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58834-5_2

DOI: https://doi.org/10.1007/978-3-662-58834-5_2
Published: 23 February 2019
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58833-8
Online ISBN: 978-3-662-58834-5
eBook Packages: Computer Science, Computer Science (R0)