Testing for Race Conditions in Distributed Systems via SMT Solving

  • João Carlos PereiraEmail author
  • Nuno Machado
  • Jorge Sousa Pinto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12165)


Data races, a condition where two memory accesses to the same memory location occur concurrently, have been shown to be a major source of concurrency bugs in distributed systems. Unfortunately, data races are often triggered by non-deterministic event orderings that are hard to detect when testing complex distributed systems.

In this paper, we propose Spider, an automated tool for identifying data races in distributed system traces. Spider encodes the causal relations between the events in the trace as a symbolic constraint model, which is then fed into an SMT solver to check for the presence of conflicting concurrent accesses. To reduce the constraint solving time, Spider employs a pruning technique aimed at removing redundant portions of the trace.

Our experiments with multiple benchmarks show that Spider is effective in detecting data races in distributed executions in a practical amount of time, providing evidence of its usefulness as a testing tool.



This work is financed by the ERDF - European Regional Development Fund through the North Portugal Regional Operational Programme - NORTE2020 Programme and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia within project NORTE-01-0145-FEDER-028550-PTDC/EEI-COM/28550/2017.


  1. 1.
  2. 2.
  3. 3.
    Cook, S.A.: The complexity of theorem-proving procedures. In: STOC 1971. ACM (1971)Google Scholar
  4. 4.
    The Coq proof assistant.
  5. 5.
    Elwakil, M., Yang, Z., Wang, L., Chen, Q.: Message race detection for web services by an SMT-based analysis. In: Xie, B., Branke, J., Sadjadi, S.M., Zhang, D., Zhou, X. (eds.) ATC 2010. LNCS, vol. 6407, pp. 182–194. Springer, Heidelberg (2010). Scholar
  6. 6.
    Feng, Y., Martins, R., Bastani, O., Dillig, I.: Program synthesis using conflict-driven learning. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018. ACM (2018)Google Scholar
  7. 7.
    Flanagan, C., Freund, S.N.: FastTrack: efficient and precise dynamic race detection. In: PLDI 2009 (2009)Google Scholar
  8. 8.
    Guo, H., Wu, M., Zhou, L., Hu, G., Yang, J., Zhang, L.: Practical software model checking via dynamic interface reduction. In: SOSP 2011 (2011)Google Scholar
  9. 9.
    Hawblitzel, C., et al.: IronFleet: proving practical distributed systems correct. In: SOSP 2015. ACM (2015)Google Scholar
  10. 10.
    Huang, J.: Stateless model checking concurrent programs with maximal causality reduction. In: PLDI 2015. ACM (2015)Google Scholar
  11. 11.
    Huang, J., Rajagopalan, A.K.: What’s the optimal performance of precise dynamic race detection? - a redundancy perspective. In: ECOOP (2017)Google Scholar
  12. 12.
    Huang, J., Zhang, C., Dolby, J.: CLAP: recording local executions to reproduce concurrency failures. In: PLDI 2013. ACM (2013)Google Scholar
  13. 13.
    Kasikci, B., Zamfir, C., Candea, G.: Data races vs. data race bugs: telling the difference with portend. In: ASPLOS 2012. ACM (2012)Google Scholar
  14. 14.
    Killian, C., Anderson, J.W., Jhala, R., Vahdat, A.: Life, death, and the critical transition: finding liveness bugs in systems code. In: NSDI 2007. USENIX Association (2007)Google Scholar
  15. 15.
    Lamport, L.: The TLA+ home page. Accessed 10 Oct 2019
  16. 16.
    Lamport, L.: Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21(7), 558–565 (1978)CrossRefGoogle Scholar
  17. 17.
    Leesatapornwongsa, T., Hao, M., Joshi, P., Lukman, J.F., Gunawi, H.S.: SAMC: semantic-aware model checking for fast discovery of deep bugs in cloud systems. In: OSDI 2014. USENIX Association (2014)Google Scholar
  18. 18.
    Leesatapornwongsa, T., Lukman, J.F., Lu, S., Gunawi, H.S.: TaxDC: a taxonomy of non-deterministic concurrency bugs in datacenter distributed systems. In: ASPLOS 2016. ACM (2016)Google Scholar
  19. 19.
    Li, G., Lu, S., Musuvathi, M., Nath, S., Padhye, R.: Efficient scalable thread-safety-violation detection: finding thousands of concurrency bugs during testing. In: SOSP 2019. ACM (2019)Google Scholar
  20. 20.
    Liu, H., et al.: DCatch: automatically detecting distributed concurrency bugs in cloud systems. SIGOPS Oper. Syst. Rev. 51(2), 677–691 (2017)CrossRefGoogle Scholar
  21. 21.
    Machado, N., Maia, F., Neves, F., Coelho, F., Pereira, J.: Minha: large-scale distributed systems testing made practical. In: OPODIS 2019. Leibniz International Proceedings in Informatics (LIPIcs) (2019)Google Scholar
  22. 22.
    Machado, N.: TaxDC Micro-benchmarks Repository (2018).
  23. 23.
    Machado, N., Lucia, B., Rodrigues, L.: Concurrency debugging with differential schedule projections. In: PLDI 2015. ACM (2015)Google Scholar
  24. 24.
    Machado, N., Lucia, B., Rodrigues, L.: Production-guided concurrency debugging. In: PPoPP 2016. ACM (2016)Google Scholar
  25. 25.
    de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). Scholar
  26. 26.
    Neves, F., Machado, N., Pereira, J.: Falcon: a practical log-based analysis tool for distributed systems. In: DSN 2018 (2018)Google Scholar
  27. 27.
    Padon, O., McMillan, K.L., Panda, A., Sagiv, M., Shoham, S.: Ivy: safety verification by interactive generalization. SIGPLAN Not. 51(6), 614–630 (2016). Scholar
  28. 28.
    Popper, N.: The stock market bell rings, computers fail, wall street cringes (2015). Accessed 06 Aug 2019
  29. 29.
    Simsa, J., Bryant, R., Gibson, G.: DBug: Systematic evaluation of distributed systems. In: Proceedings of the 5th International Conference on Systems Software Verification, SSV 2010, p. 3. USENIX Association, Berkeley (2010)Google Scholar
  30. 30.
  31. 31.
    Terra-Neves, M., Machado, N., Lynce, I., Manquinho, V.: Concurrency debugging with MaxSMT. In: AAAI 2019. AAAI Press (2019)Google Scholar
  32. 32.
    Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region (2011). Accessed 03 Mar 2019
  33. 33.
    Tekken Valapil, V., Yingchareonthawornchai, S., Kulkarni, S., Torng, E., Demirbas, M.: Monitoring partially synchronous distributed systems using SMT solvers. In: Lahiri, S., Reger, G. (eds.) RV 2017. LNCS, vol. 10548, pp. 277–293. Springer, Cham (2017). Scholar
  34. 34.
    Voulgaris, S., Gavidia, D., van Steen, M.: CYCLON: inexpensive membership management for unstructured P2P overlays. J. Netw. Syst. Manag. 13(2), 197–217 (2005)CrossRefGoogle Scholar
  35. 35.
    Wilcox, J.R., et al.: Verdi: A framework for implementing and formally verifying distributed systems. In: PLDI 2015. ACM (2015)Google Scholar
  36. 36.
    Woos, D., Wilcox, J.R., Anton, S., Tatlock, Z., Ernst, M.D., Anderson, T.: Planning for change in a formal verification of the Raft consensus protocol. In: CPP 2016. ACM (2016)Google Scholar
  37. 37.
    Yang, J., et al.: MODIST: transparent model checking of unmodified distributed systems. In: NSDI 2009. USENIX Association (2009)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.HASLab - INESC TEC & U. MinhoBragaPortugal
  2. 2.Teradata IberiaMadridSpain

Personalised recommendations