
Cudagrind: Memory-Usage Checking for CUDA

Conference paper, Tools for High Performance Computing 2013

Abstract

The Memcheck tool, built on top of the popular Valgrind framework, offers a reliable way to perform memory-correctness checking in arbitrary x86 programs. At runtime, Valgrind dynamically transforms the program into intermediate code, which is then executed on an emulated CPU. Because it keeps a full shadow copy of the memory used by each program, Memcheck can perform a variety of bit-precise runtime checks. These include, but are not limited to, detecting memory leaks, checking the validity of memory accesses, and tracking the definedness of memory regions. Despite the wide applicability of this approach, it is bound to fail when accelerator-based programming models are involved. Kernels running on such devices (e.g., NVIDIA's Tesla series) are completely separated from the host. Device memory is accessible only through an API provided by the driver or from inside the kernels. Because of this indirection, Valgrind is unable to understand, instrument, or even recognize memory operations executed on the device. Freeing Valgrind from this limitation has been the focus of the work presented here. A set of wrappers for a subset of the CUDA driver API has been introduced. These wrappers track the allocation and deallocation of device memory regions, as well as the memory copy operations needed to place data in and retrieve data from device memory. This makes it possible to check whether device memory is fully allocated during a transfer and, thanks to the host memory checking performed by Valgrind, whether the memory transferred to the device is fully defined and addressable on the host. This technique allows the detection of a number of common programming mistakes, many of which can be rather difficult to debug by other means. The combination of these wrappers with Valgrind's Memcheck tool is called Cudagrind.
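The bookkeeping described in the abstract (tracking device allocations and frees, then checking that a transfer stays inside allocated device memory) can be sketched in plain C. This is a minimal illustration under stated assumptions, not Cudagrind's implementation: the table and the names `dev_ptr_t`, `track_alloc`, `track_free`, `range_is_allocated` and `check_device_copy` are hypothetical, device addresses are modeled as plain 64-bit integers, and no CUDA or Valgrind calls are made. In a real tool, a wrapper around a driver call such as `cuMemAlloc` would record the region after the real call succeeds, and a wrapper around `cuMemcpyHtoD` would check the destination range and additionally ask Memcheck whether the host source buffer is defined (for example via the `VALGRIND_CHECK_MEM_IS_DEFINED` client request), which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a device address (e.g. a CUdeviceptr). */
typedef uint64_t dev_ptr_t;

#define MAX_ALLOCS 64

/* One tracked device allocation covering [base, base + size). */
typedef struct {
    dev_ptr_t base;
    size_t size;
    bool live;
} dev_alloc_t;

static dev_alloc_t allocs[MAX_ALLOCS];

/* Record a new device allocation; returns false if the table is full.
   A wrapper would call this after the real allocation call succeeds. */
static bool track_alloc(dev_ptr_t base, size_t size) {
    for (size_t i = 0; i < MAX_ALLOCS; i++) {
        if (!allocs[i].live) {
            allocs[i] = (dev_alloc_t){ base, size, true };
            return true;
        }
    }
    return false;
}

/* Forget a device allocation; returns false for an address that is not
   currently allocated (never allocated, or already freed). */
static bool track_free(dev_ptr_t base) {
    for (size_t i = 0; i < MAX_ALLOCS; i++) {
        if (allocs[i].live && allocs[i].base == base) {
            allocs[i].live = false;
            return true;
        }
    }
    return false;
}

/* True if [ptr, ptr + len) lies entirely inside one live allocation.
   Written to avoid unsigned overflow in the bounds arithmetic. */
static bool range_is_allocated(dev_ptr_t ptr, size_t len) {
    for (size_t i = 0; i < MAX_ALLOCS; i++) {
        const dev_alloc_t *a = &allocs[i];
        if (!a->live || ptr < a->base)
            continue;
        uint64_t off = ptr - a->base;
        if (off <= a->size && len <= a->size - off)
            return true;
    }
    return false;
}

/* Hypothetical copy-wrapper check: report a transfer that touches
   device memory outside any tracked allocation. */
static bool check_device_copy(const char *fn, dev_ptr_t dev, size_t len) {
    if (!range_is_allocated(dev, len)) {
        fprintf(stderr, "%s: device range [0x%llx, +%zu) is not fully allocated\n",
                fn, (unsigned long long)dev, len);
        return false;
    }
    return true;
}
```

As a simplifying assumption, the sketch requires a transfer to fall inside a single allocation, so a copy that spans two adjacent allocations is reported; a linear scan is used only to keep the example short.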


Notes

  1. http://docs.nvidia.com/cuda/cuda-driver-api/index.html
  2. This might not be an error in cases where the host memory is accessed in a strided way.
  3. See the Valgrind manual that comes with your installation, or the Valgrind homepage at http://valgrind.org/, for more information about suppression files.
  4. https://www.hlrs.de/systems/platforms/nec-cluster-laki-laki2/
  5. https://www.hlrs.de/organization/av/spmt/research/cudagrind/


Acknowledgements

This work was supported by the H4H project funded by the German Federal Ministry for Education and Research (grant number 01IS10036B) within the ITEA2 framework (grant number 09011).

Author information

Correspondence to Thomas M. Baumann.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Baumann, T.M., Gracia, J. (2014). Cudagrind: Memory-Usage Checking for CUDA. In: Knüpfer, A., Gracia, J., Nagel, W., Resch, M. (eds) Tools for High Performance Computing 2013. Springer, Cham. https://doi.org/10.1007/978-3-319-08144-1_6
