Cudagrind: Memory-Usage Checking for CUDA

Baumann, Thomas M.; Gracia, José

doi:10.1007/978-3-319-08144-1_6

Thomas M. Baumann⁵ & José Gracia⁵

Abstract

The Memcheck tool, built on top of the popular Valgrind framework, offers a reliable way to perform memory correctness checking in arbitrary x86 programs. At runtime, Valgrind dynamically transforms the program into intermediate code, which is then executed on an emulated CPU. Because it keeps a full shadow copy of the memory used by each program, Memcheck can perform various bit-precise runtime checks. These include, but are not limited to, detecting memory leaks, checking the validity of memory accesses, and tracking the definedness of memory regions. Despite the wide applicability of this approach, it is bound to fail when accelerator-based programming models are involved. Kernels running on such devices, like NVIDIA’s Tesla series, are completely separated from the host. The memory on the device is accessible only through an API provided by the driver or from inside the kernels. Because of this indirection, Valgrind is unable to understand, instrument, or even recognize memory operations executed on the device. Freeing Valgrind from this limitation has been the focus of the work presented here. A set of wrappers for a subset of the CUDA driver API has been introduced. These allow tracking of (de-)allocation of memory regions on the device as well as the memory copy operations needed to place data in and retrieve data from device memory. This provides the ability to check whether memory is fully allocated during a transfer and, thanks to the host memory checking performed by Valgrind, whether the memory transferred to the device is fully defined and addressable on the host. This technique allows detection of a number of common programming mistakes, many of which can be rather difficult to debug by other means. These wrappers, combined with Valgrind’s Memcheck tool, are called Cudagrind.
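The bookkeeping described in the abstract (tracking device allocations and checking that a transfer stays inside allocated memory) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cudagrind's implementation: the real tool wraps CUDA driver API calls inside Valgrind, whereas the class and method names below (`DeviceAllocationTracker`, `on_alloc`, `on_free`, `check_copy`, `leaks`) are hypothetical, device addresses are modeled as plain integers, and the host-side definedness check that Memcheck contributes is not modeled at all.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceAllocationTracker:
    """Toy model of the state a wrapper layer could keep for device memory.

    All names here are hypothetical; they are not taken from Cudagrind.
    """

    # Maps the base address of each live device allocation to its size in bytes.
    live: dict = field(default_factory=dict)

    def on_alloc(self, base: int, size: int) -> None:
        # Would be called from a wrapper around a device allocation call.
        self.live[base] = size

    def on_free(self, base: int) -> bool:
        # Returns False when the address is not a live allocation
        # (never allocated, or already freed), which a checker would report.
        return self.live.pop(base, None) is not None

    def check_copy(self, dst: int, nbytes: int) -> bool:
        # A transfer is accepted only if [dst, dst + nbytes) lies entirely
        # inside a single live allocation.
        for base, size in self.live.items():
            if base <= dst and dst + nbytes <= base + size:
                return True
        return False

    def leaks(self) -> dict:
        # Allocations still live at program exit are candidate leaks.
        return dict(self.live)


tracker = DeviceAllocationTracker()
tracker.on_alloc(0x1000, 256)
print(tracker.check_copy(0x1000, 256))  # True: the copy fits exactly
print(tracker.check_copy(0x1080, 256))  # False: the copy runs past the end
```

Keeping the table keyed by base address makes the free and leak checks trivial; the linear scan in `check_copy` is the simplest way to handle copies that start in the middle of an allocation and would likely be replaced by an interval structure in a real tool.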

Notes

1. http://docs.nvidia.com/cuda/cuda-driver-api/index.html
2. This might not be an error in cases where the host memory is accessed in a strided way.
3. See the Valgrind manual that comes with your installation or at the Valgrind homepage at http://valgrind.org/ for more information about suppression files.
4. https://www.hlrs.de/systems/platforms/nec-cluster-laki-laki2/
5. https://www.hlrs.de/organization/av/spmt/research/cudagrind/

References

Diamos, G.F., Kerr, A.R., Yalamanchili, S., Clark, N.: Ocelot: a dynamic optimization framework for Bulk-synchronous applications in heterogeneous systems. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT ’10, Vienna, pp. 353–364. ACM, New York (2010). http://doi.acm.org/10.1145/1854273.1854318
Farooqui, N., Kerr, A., Diamos, G., Yalamanchili, S., Schwan, K.: A framework for dynamically instrumenting GPU compute applications within GPU Ocelot. In: Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, Newport Beach, pp. 9:1–9:9. ACM, New York (2011). http://doi.acm.org/10.1145/1964179.1964192
Johnson, S.C.: Lint: A C Program Checker. Computing Science Technical Report 65, Bell Laboratories (1977)
Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary instrumentation. SIGPLAN Not. 42(6), 89–100 (2007). http://doi.acm.org/10.1145/1273442.1250746
Seward, J., Nethercote, N.: Using valgrind to detect undefined value errors with bit-precision. In: Proceedings of the Annual Conference on USENIX Annual Technical Conference, ATEC ’05, Anaheim, pp. 17–30. USENIX Association, Berkeley (2005)

Acknowledgements

This work was supported by the H4H project funded by the German Federal Ministry for Education and Research (grant number 01IS10036B) within the ITEA2 framework (grant number 09011).

Author information

Authors and Affiliations

High Performance Computing Center Stuttgart, Nobelstr. 19, 70565, Stuttgart, Germany
Thomas M. Baumann & José Gracia

Authors

Thomas M. Baumann
José Gracia

Corresponding author

Correspondence to Thomas M. Baumann.

Editor information

Editors and Affiliations

Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), Technische Universität Dresden, Dresden, Germany
Andreas Knüpfer
Höchstleistungsrechenzentrum Stuttgart (HLRS), Universität Stuttgart, Stuttgart, Germany
José Gracia
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), Technische Universität Dresden, Dresden, Germany
Wolfgang E. Nagel
Höchstleistungsrechenzentrum Stuttgart (HLRS), Universität Stuttgart, Stuttgart, Germany
Michael M. Resch

About this paper

Cite this paper

Baumann, T.M., Gracia, J. (2014). Cudagrind: Memory-Usage Checking for CUDA. In: Knüpfer, A., Gracia, J., Nagel, W., Resch, M. (eds) Tools for High Performance Computing 2013. Springer, Cham. https://doi.org/10.1007/978-3-319-08144-1_6

DOI: https://doi.org/10.1007/978-3-319-08144-1_6
Published: 02 September 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08143-4
Online ISBN: 978-3-319-08144-1
eBook Packages: Mathematics and Statistics (R0)
