Abstract
Programmers are taking advantage of the increasing availability of on-chip parallelism to meet the rising performance demands of diverse applications. Support for tools that can detect incorrect program execution involving concurrent threads is critical to this evolution. Many concurrency bugs manifest as some form of data race, and detecting them at runtime is inherently difficult because of the high overhead of the required memory trace comparisons. Various software and hardware tools have been proposed to detect concurrency bugs at runtime. However, software-based schemes incur significant performance overhead, while hardware-based schemes require significant hardware modifications. To enable cost-efficient data race detectors, it is desirable to utilize on-chip resources that are already available. The recent integration of CPU cores with data-parallel accelerator cores, such as GPUs, provides the opportunity to offload data race detection to these accelerator cores. In this paper, we explore this opportunity by designing a GPU Accelerated Data Race Detector (GUARD) that utilizes GPU cores to process memory traces and detect data races in parallel applications executing on the CPU cores. GUARD further explores optimization techniques for: (i) reducing the size of memory traces by employing signatures; and (ii) improving the accuracy of signatures using coherence-based filtering. Overall, GUARD achieves the performance of hardware-based data race detection mechanisms with minimal hardware modifications.
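The first optimization above, compressing memory traces into signatures, can be illustrated with a Bloom-filter-style bit vector. The following is a minimal sketch under stated assumptions, not GUARD's implementation. The signature width, the two multiplicative hash functions, the 64-byte line granularity, and the names `Signature`, `EpochAccesses`, and `may_race` are all hypothetical. It shows why signatures shrink traces: each epoch's read and write sets become fixed-size bit vectors instead of address lists. It also shows why signatures lose accuracy: hash aliasing can report a conflict that never happened. Those false positives are what a filtering step such as the coherence-based scheme is meant to reduce.

```python
from dataclasses import dataclass, field

SIG_BITS = 1024  # hypothetical signature width in bits
# Hypothetical odd multipliers for simple multiplicative hashing (2 hash functions).
_MULTIPLIERS = (0x9E3779B1, 0x85EBCA77)


def _bit_positions(addr: int) -> list[int]:
    """Map an address to one bit position per hash function, at cache-line granularity."""
    line = addr >> 6  # assumption: 64-byte cache lines
    return [((line * m) & 0xFFFFFFFF) % SIG_BITS for m in _MULTIPLIERS]


@dataclass
class Signature:
    bits: int = 0  # a Python int used as a SIG_BITS-wide bit vector

    def insert(self, addr: int) -> None:
        for pos in _bit_positions(addr):
            self.bits |= 1 << pos

    def may_contain(self, addr: int) -> bool:
        # True for every inserted line; may also be True for lines never inserted.
        return all((self.bits >> pos) & 1 for pos in _bit_positions(addr))

    def may_intersect(self, other: "Signature") -> bool:
        # A shared address sets the same bits in both vectors, so a zero AND proves
        # the sets are disjoint. A nonzero AND only means they *may* overlap.
        return (self.bits & other.bits) != 0


@dataclass
class EpochAccesses:
    """Read and write signatures recorded by one thread during one epoch."""
    reads: Signature = field(default_factory=Signature)
    writes: Signature = field(default_factory=Signature)


def may_race(a: EpochAccesses, b: EpochAccesses) -> bool:
    """Two concurrent epochs may race if either one writes what the other reads or writes."""
    return (a.writes.may_intersect(b.writes)
            or a.writes.may_intersect(b.reads)
            or a.reads.may_intersect(b.writes))


if __name__ == "__main__":
    t0, t1 = EpochAccesses(), EpochAccesses()
    t0.writes.insert(0x1000)  # thread 0 writes a line
    t1.reads.insert(0x1000)   # thread 1 reads the same line concurrently
    print(may_race(t0, t1))   # True
```

Each check reduces to bitwise ANDs over fixed-width vectors. Work of that shape is regular and data-parallel, which is one plausible reason trace comparison suits GPU cores. That is an observation about this sketch, not a description of GUARD's kernels. Two epochs that only read shared data never conflict here, because their write signatures stay empty.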
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mekkat, V., Holey, A., Zhai, A. (2013). Accelerating Data Race Detection Utilizing On-Chip Data-Parallel Cores. In: Legay, A., Bensalem, S. (eds) Runtime Verification. RV 2013. Lecture Notes in Computer Science, vol 8174. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40787-1_12
DOI: https://doi.org/10.1007/978-3-642-40787-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40786-4
Online ISBN: 978-3-642-40787-1
eBook Packages: Computer Science, Computer Science (R0)