Accelerating Data Race Detection Utilizing On-Chip Data-Parallel Cores

Mekkat, Vineeth; Holey, Anup; Zhai, Antonia

doi:10.1007/978-3-642-40787-1_12


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8174)

Included in the following conference series:

International Conference on Runtime Verification


Abstract

Programmers increasingly exploit on-chip parallelism to meet the rising performance demands of diverse applications. Tools that help detect incorrect program execution when concurrent threads are involved are critical to this evolution. Many concurrency bugs manifest as some form of data race, and detecting them at runtime is inherently difficult because of the high overhead of the required memory trace comparisons. Various software and hardware tools have been proposed to detect concurrency bugs at runtime. However, software-based schemes incur significant performance overhead, while hardware-based schemes require significant hardware modifications. To enable cost-efficient data race detectors, it is desirable to use available on-chip resources. The recent integration of CPU cores with data-parallel accelerator cores, such as GPUs, provides an opportunity to offload data race detection to these accelerator cores. In this paper, we explore this opportunity by designing a GPU Accelerated Data Race Detector (GUARD) that uses GPU cores to process memory traces and detect data races in parallel applications executing on the CPU cores. GUARD further explores optimization techniques for (i) reducing the size of memory traces by employing signatures and (ii) improving the accuracy of signatures using coherence-based filtering. Overall, GUARD achieves the performance of hardware-based data race detection mechanisms with minimal hardware modifications.
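
To make the idea of signature-based trace reduction concrete, the following is a minimal sketch, not GUARD's actual design. It assumes a Bloom-filter-style signature split into banks, with one hash per bank, and it assumes the two access sets come from code regions that synchronization does not order. The names `NUM_BANKS`, `BANK_BITS`, `bank_bit`, `make_signature`, `intersects`, and `may_race` are hypothetical, as are the signature sizes and the use of SHA-256 as a stand-in hash.

```python
import hashlib

NUM_BANKS = 2      # hypothetical: one hash function per bank
BANK_BITS = 128    # hypothetical: number of bits in each bank


def bank_bit(addr: int, bank: int) -> int:
    """Return the bit index that `addr` sets in the given bank."""
    digest = hashlib.sha256(f"{bank}:{addr}".encode()).digest()
    return int.from_bytes(digest[:4], "little") % BANK_BITS


def make_signature(addresses) -> tuple:
    """Compress a collection of addresses into a fixed-size banked signature."""
    banks = [0] * NUM_BANKS
    for addr in addresses:
        for b in range(NUM_BANKS):
            banks[b] |= 1 << bank_bit(addr, b)
    return tuple(banks)


def intersects(sig_a: tuple, sig_b: tuple) -> bool:
    """True if the signatures may share an address.

    A shared address sets the same bit in every bank of both signatures,
    so this test has no false negatives; hash collisions can still cause
    false positives.
    """
    return all(a & b for a, b in zip(sig_a, sig_b))


def may_race(reads_a, writes_a, reads_b, writes_b) -> bool:
    """Conservatively flag a possible race between two unordered regions.

    A race needs at least one write, so read/read overlap is ignored.
    """
    ra, wa = make_signature(reads_a), make_signature(writes_a)
    rb, wb = make_signature(reads_b), make_signature(writes_b)
    return intersects(wa, wb) or intersects(wa, rb) or intersects(ra, wb)
```

For example, `may_race([], [0x1000], [0x1000], [])` reports a possible race because one region writes an address the other reads. Signatures trade accuracy for size: comparing two regions costs a few bitwise operations regardless of how many addresses they touched, but collisions can produce false positives, which is the kind of inaccuracy the abstract's coherence-based filtering is meant to reduce.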



Author information

Authors and Affiliations

Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN, 55455, USA
Vineeth Mekkat, Anup Holey & Antonia Zhai


Editor information

Editors and Affiliations

Inria Rennes, Campus de Beaulieu, 263 Avenue Général Leclerc, 35042, Rennes, France
Axel Legay
VERIMAG Centre Équation, Université Joseph Fourier, Avenue de Vignate 2, 38618, Gières, France
Saddek Bensalem


Cite this paper

Mekkat, V., Holey, A., Zhai, A. (2013). Accelerating Data Race Detection Utilizing On-Chip Data-Parallel Cores. In: Legay, A., Bensalem, S. (eds) Runtime Verification. RV 2013. Lecture Notes in Computer Science, vol 8174. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40787-1_12


DOI: https://doi.org/10.1007/978-3-642-40787-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40786-4
Online ISBN: 978-3-642-40787-1
eBook Packages: Computer Science, Computer Science (R0)
