
The Journal of Supercomputing, Volume 75, Issue 12, pp 7872–7894

Tuning lock-based multicore program based on sliding windows to tolerate data race

  • Suxia Zhu
  • Zhigang Chen (corresponding author)
  • Guanglu Sun

Abstract

Because in-house debugging and testing cannot easily uncover every potential data race in a multicore program, it is necessary to tolerate the remaining races during production runs to ensure correct execution. However, existing tolerance methods handle only certain kinds of data races. This paper proposes a new data-race tolerance approach that detects and adjusts data races whether or not the accesses involved are protected by a critical section, improving the correctness of multicore programs. It uses sliding windows to hold the memory instructions within a critical section, or the most recent memory instructions that lack protection, and detects the potential data races that are more likely to cause errors. Then, by delaying the critical reversion points, it adjusts the data races to reduce the probability of software failure. Implementing the approach requires no change to the multicore processor's existing cache coherence protocol and only a small amount of additional hardware. Simulation results show that it incurs low hardware overhead, low bandwidth overhead, and negligible slowdown.
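
To make the sliding-window idea concrete, the following is a minimal software sketch, not the hardware mechanism the paper proposes. It assumes that each thread keeps a bounded window of its most recent memory accesses and that an incoming access from another thread is flagged when it touches the same address and at least one of the two accesses is a write (the standard conflict condition for a data race). The names `Access`, `SlidingWindow`, and `find_conflicts`, and the window capacity, are hypothetical and chosen only for illustration. The delaying step described in the abstract is not modeled here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One memory access in a trace (hypothetical record layout)."""
    thread: int
    addr: int
    is_write: bool


class SlidingWindow:
    """Bounded record of one thread's most recent memory accesses."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which gives the "sliding" behaviour.
        self.entries = deque(maxlen=capacity)

    def record(self, access):
        self.entries.append(access)

    def conflicts_with(self, remote):
        # Conflict: same address, different thread, at least one write.
        return [
            a for a in self.entries
            if a.addr == remote.addr
            and a.thread != remote.thread
            and (a.is_write or remote.is_write)
        ]


def find_conflicts(trace, capacity=4):
    """Return (earlier, later) pairs of conflicting accesses in a trace."""
    windows = {}  # thread id -> SlidingWindow
    found = []
    for acc in trace:
        # Check the new access against every other thread's window.
        for tid, win in windows.items():
            if tid != acc.thread:
                found.extend((old, acc) for old in win.conflicts_with(acc))
        # Then record it in the issuing thread's own window.
        windows.setdefault(acc.thread, SlidingWindow(capacity)).record(acc)
    return found
```

In this sketch a smaller window forgets older accesses sooner, so fewer candidate races are reported. This mirrors, under the assumptions above, the trade-off between window size and the set of races that can be observed.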

Keywords

Multicore program; Data-race detection and tolerance; Concurrency bug; Sliding window

Notes

Acknowledgements

This research was supported by the National Natural Youth Science Foundation of China (61502123), the Heilongjiang Provincial Youth Science Foundation (QC2015084), and the National Key R&D Plan of China (2017YFB1302701). We thank the anonymous reviewers and our group members for their comments.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
  2. School of Mechatronics Engineering, Harbin Institute of Technology, Harbin, China
