Abstract
Graphics processing is an application area with a high level of parallelism at both the data level and the task level. Therefore, graphics processing units (GPUs) are often implemented as multiprocessing systems with high-performance floating-point processing and application-specific hardware stages that maximize graphics throughput.
In this paper, we evaluate the suitability of Transport Triggered Architectures (TTAs) as a basis for implementing GPUs. TTAs improve scalability over traditional VLIW-style architectures, which makes them attractive for computationally intensive applications. We show that a TTA provides high floating-point performance while allowing more programming freedom than vector processors.
Finally, a key feature of the presented TTA-based GPU design is its fully programmable architecture, which makes it a suitable target for the general-purpose computing on GPU (GPGPU) APIs that have become popular in recent years.
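As background for the scalability claim above, the following is a minimal Python sketch of the transport-triggered programming model. It rests on simplifying assumptions that are not taken from the paper. Each instruction is a list of data moves executed in one cycle. Every function unit has one operand port (`.o`), one trigger port (`.t`) and one result port (`.r`). Every operation has a latency of one cycle. The number of moves per instruction is unbounded, whereas a real TTA is limited by its transport buses. All class, port and register names are hypothetical and are not the API of the TCE toolset. Writing a value to a trigger port starts the operation. A result can then be moved straight from one unit's result port into another unit's operand port (software bypassing), so it never has to be written to the register file and read back.

```python
from typing import Callable, Dict, List, Tuple, Union

# A move copies a value from a source (immediate int, register name, or
# "<fu>.r" result port) to a destination (register name, "<fu>.o" operand
# port, or "<fu>.t" trigger port). This port naming is a hypothetical convention.
Move = Tuple[Union[int, str], str]


class FunctionUnit:
    """Two-input function unit: operand port, trigger port, result port."""

    def __init__(self, op: Callable[[int, int], int]) -> None:
        self.op = op
        self.operand = 0
        self.result = 0

    def trigger(self, value: int) -> None:
        # Writing the trigger port starts the operation on (operand, value).
        self.result = self.op(self.operand, value)


class TTASketch:
    """Toy TTA core with an add unit, a mul unit and a register file."""

    def __init__(self) -> None:
        self.regs: Dict[str, int] = {}
        self.fus: Dict[str, FunctionUnit] = {
            "add": FunctionUnit(lambda a, b: a + b),
            "mul": FunctionUnit(lambda a, b: a * b),
        }
        self.rf_reads = 0  # register-file reads, to show the effect of bypassing

    def read(self, src: Union[int, str]) -> int:
        if isinstance(src, int):
            return src  # immediate value
        if src.endswith(".r"):
            return self.fus[src[:-2]].result  # bypass: read a result port directly
        self.rf_reads += 1
        return self.regs[src]

    def step(self, moves: List[Move]) -> None:
        # All sources are read before any destination is written, so a result
        # triggered in this cycle becomes readable in the next one.
        values = [(self.read(src), dst) for src, dst in moves]
        # Operand and register writes come first, so an operation triggered
        # in this cycle sees an operand moved in the same cycle.
        for value, dst in values:
            if dst.endswith(".o"):
                self.fus[dst[:-2]].operand = value
            elif not dst.endswith(".t"):
                self.regs[dst] = value
        for value, dst in values:
            if dst.endswith(".t"):
                self.fus[dst[:-2]].trigger(value)

    def run(self, program: List[List[Move]]) -> None:
        for moves in program:
            self.step(moves)


# Compute r4 = (r1 + r2) * r3 with r1 = 2, r2 = 3, r3 = 4.
cpu = TTASketch()
cpu.regs.update({"r1": 2, "r2": 3, "r3": 4})
program: List[List[Move]] = [
    [("r1", "add.o"), ("r2", "add.t")],     # cycle 0: start r1 + r2
    [("add.r", "mul.o"), ("r3", "mul.t")],  # cycle 1: bypass the sum into mul
    [("mul.r", "r4")],                      # cycle 2: store the product
]
cpu.run(program)
print(cpu.regs["r4"], cpu.rf_reads)  # prints: 20 3
```

In this example, the sum produced by `add` is bypassed into `mul` without touching the register file, so only three register-file reads occur. On a conventional VLIW, the same computation would typically write the sum to a register and read it back. This is one reason register-file port requirements tend to grow faster there as function units are added.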
Acknowledgments
This research was partially funded by the Academy of Finland, the Nokia Foundation, and Finnish Center for International Mobility (CIMO).
Copyright information
© 2019 Springer-Verlag GmbH Germany, part of Springer Nature
About this chapter
Cite this chapter
de La Lama, C.S., Jääskeläinen, P., Kultala, H., Takala, J. (2019). Programmable and Scalable Architecture for Graphics Processing Units. In: Silvano, C., Bertels, K., Schulte, M. (eds) Transactions on High-Performance Embedded Architectures and Compilers V. Lecture Notes in Computer Science, vol. 11225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58834-5_2
DOI: https://doi.org/10.1007/978-3-662-58834-5_2
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58833-8
Online ISBN: 978-3-662-58834-5
eBook Packages: Computer Science, Computer Science (R0)