Accelerating Data Race Detection Utilizing On-Chip Data-Parallel Cores

  • Conference paper
Runtime Verification (RV 2013)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8174)

Abstract

Programmers are taking advantage of the increasing availability of on-chip parallelism to meet the rising performance demands of diverse applications. Tools that help detect incorrect program execution when concurrent threads are involved are critical to this evolution. Many concurrency bugs manifest as some form of data race, and detecting them at runtime is inherently difficult because of the high overhead of the required memory trace comparisons. Various software and hardware tools have been proposed to detect concurrency bugs at runtime. However, software-based schemes incur significant performance overhead, while hardware-based schemes require significant hardware modifications. To enable cost-efficient design of data race detectors, it is desirable to utilize available on-chip resources. The recent integration of CPU cores with data-parallel accelerator cores, such as GPUs, provides the opportunity to offload data race detection to these accelerator cores. In this paper, we explore this opportunity by designing a GPU Accelerated Data Race Detector (GUARD) that uses GPU cores to process memory traces and detect data races in parallel applications executing on the CPU cores. GUARD further explores optimization techniques for (i) reducing the size of memory traces by employing signatures and (ii) improving the accuracy of signatures using coherence-based filtering. Overall, GUARD achieves the performance of hardware-based data race detection mechanisms with minimal hardware modifications.
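To make the idea of signatures concrete, the following is a minimal sketch of how memory accesses might be compressed into fixed-size, Bloom-filter-style signatures and compared for potential conflicts. It is an illustration under stated assumptions, not GUARD's actual design: the bank count, bank width, hash function, and the names `Signature`, `EpochTrace`, and `may_race` are all hypothetical, the code runs on the CPU rather than on GPU cores, and it assumes the two traces being compared come from different threads and are not ordered by synchronization.

```python
import hashlib

NUM_BANKS = 2    # hypothetical: number of signature banks (one hash per bank)
BANK_BITS = 256  # hypothetical: width of each bank in bits


def bank_bits(addr: int) -> list[int]:
    """Hash an address to one bit per bank (illustrative hash choice)."""
    digest = hashlib.blake2b(
        addr.to_bytes(8, "little"), digest_size=4 * NUM_BANKS
    ).digest()
    return [
        1 << (int.from_bytes(digest[4 * i:4 * i + 4], "little") % BANK_BITS)
        for i in range(NUM_BANKS)
    ]


class Signature:
    """Fixed-size summary of a set of addresses; may report false positives."""

    def __init__(self) -> None:
        self.banks = [0] * NUM_BANKS

    def insert(self, addr: int) -> None:
        for i, bit in enumerate(bank_bits(addr)):
            self.banks[i] |= bit

    def intersects(self, other: "Signature") -> bool:
        # A shared address sets a common bit in every bank, so an empty
        # intersection in any bank proves the two sets are disjoint.
        return all(a & b for a, b in zip(self.banks, other.banks))


class EpochTrace:
    """Read and write signatures for one thread between synchronizations."""

    def __init__(self) -> None:
        self.reads = Signature()
        self.writes = Signature()

    def record(self, addr: int, is_write: bool) -> None:
        (self.writes if is_write else self.reads).insert(addr)


def may_race(a: EpochTrace, b: EpochTrace) -> bool:
    """Conservative check: True if a write in one trace may overlap
    a read or write in the other (assumes the traces are concurrent)."""
    return (
        a.writes.intersects(b.writes)
        or a.writes.intersects(b.reads)
        or a.reads.intersects(b.writes)
    )
```

Because distinct addresses can hash to the same bits, a check of this kind can flag races that did not occur but never misses a true overlap. That imprecision is the accuracy concern the abstract says coherence-based filtering is meant to address; the filtering itself is not modeled here.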

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mekkat, V., Holey, A., Zhai, A. (2013). Accelerating Data Race Detection Utilizing On-Chip Data-Parallel Cores. In: Legay, A., Bensalem, S. (eds) Runtime Verification. RV 2013. Lecture Notes in Computer Science, vol 8174. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40787-1_12

  • DOI: https://doi.org/10.1007/978-3-642-40787-1_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40786-4

  • Online ISBN: 978-3-642-40787-1

  • eBook Packages: Computer Science, Computer Science (R0)