CUDA-Lite: Reducing GPU Programming Complexity

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5335)

Abstract

The computer industry has transitioned into multi-core and many-core parallel systems. The CUDA programming environment from NVIDIA is an attempt to make programming many-core GPUs more accessible to programmers. However, the programmer still carries many burdens when trying to maximize performance with CUDA. One such burden is dealing with the complex memory hierarchy. Efficient and correct usage of the various memories is essential, making a difference of 2x to 17x in performance. Currently, the task of determining the appropriate memory to use and of coding the data transfers between memories is still left to the programmer. We believe that this task can be better performed by automated tools. We present CUDA-Lite, an enhancement to CUDA, as one such tool. We leverage programmer knowledge via annotations to perform transformations and show preliminary results that indicate auto-generated code can have performance comparable to hand-written code.
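To make the burden described in the abstract concrete, the following is a minimal CPU-side sketch in C++ of the kind of data staging a programmer writes by hand. It is not CUDA-Lite output and does not show the paper's annotation syntax. The function names, the 3-point stencil, and the constants `kBlockSize` and `kRadius` are all assumptions chosen for illustration. The large input vector stands in for GPU global memory and the small `tile` array stands in for on-chip shared memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizes, chosen only for illustration.
constexpr std::size_t kBlockSize = 4;  // "threads" per block
constexpr std::size_t kRadius = 1;     // stencil reach on each side

// Direct version: every output element reads its neighbours straight
// from the large array (the stand-in for global memory).
std::vector<float> stencil_direct(const std::vector<float>& in) {
    const std::size_t n = in.size();
    std::vector<float> out(n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (std::size_t k = 0; k <= 2 * kRadius; ++k) {
            // Neighbour index is i + k - kRadius; skip it if out of range.
            if (i + k < kRadius || i + k - kRadius >= n) continue;
            sum += in[i + k - kRadius];
        }
        out[i] = sum;
    }
    return out;
}

// Staged version: each block first copies its tile plus halo into a small
// buffer (the stand-in for shared memory), then computes only from it.
std::vector<float> stencil_staged(const std::vector<float>& in) {
    const std::size_t n = in.size();
    std::vector<float> out(n, 0.0f);
    const std::size_t num_blocks = (n + kBlockSize - 1) / kBlockSize;
    for (std::size_t block = 0; block < num_blocks; ++block) {
        const std::size_t base = block * kBlockSize;
        float tile[kBlockSize + 2 * kRadius];
        // Step 1: the hand-written transfer. tile[t] holds
        // in[base + t - kRadius], or zero when that index is out of range.
        for (std::size_t t = 0; t < kBlockSize + 2 * kRadius; ++t) {
            const bool in_range = base + t >= kRadius && base + t - kRadius < n;
            tile[t] = in_range ? in[base + t - kRadius] : 0.0f;
        }
        // On a GPU, a barrier such as __syncthreads() would be needed here
        // so that no thread reads the tile before it is fully loaded.
        // Step 2: each "thread" computes one output from the staged tile.
        for (std::size_t thread = 0; thread < kBlockSize; ++thread) {
            const std::size_t i = base + thread;
            if (i >= n) break;
            float sum = 0.0f;
            for (std::size_t k = 0; k <= 2 * kRadius; ++k) sum += tile[thread + k];
            out[i] = sum;
        }
    }
    assert(out.size() == in.size());
    return out;
}
```

Both functions compute the same result; the staged version differs only in the extra index arithmetic, halo handling, and synchronization point that the programmer must get right. That bookkeeping is the kind of work the abstract proposes handing to an automated tool.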

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ueng, SZ., Lathara, M., Baghsorkhi, S.S., Hwu, Wm.W. (2008). CUDA-Lite: Reducing GPU Programming Complexity. In: Amaral, J.N. (ed.) Languages and Compilers for Parallel Computing. LCPC 2008. Lecture Notes in Computer Science, vol 5335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89740-8_1

  • DOI: https://doi.org/10.1007/978-3-540-89740-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89739-2

  • Online ISBN: 978-3-540-89740-8

  • eBook Packages: Computer Science, Computer Science (R0)
