Skip to main content

Testing for Race Conditions in Distributed Systems via SMT Solving

  • Conference paper
  • First Online:
Tests and Proofs (TAP 2020)

Abstract

Data races, a condition where two memory accesses to the same memory location occur concurrently, have been shown to be a major source of concurrency bugs in distributed systems. Unfortunately, data races are often triggered by non-deterministic event orderings that are hard to detect when testing complex distributed systems.

In this paper, we propose Spider, an automated tool for identifying data races in distributed system traces. Spider encodes the causal relations between the events in the trace as a symbolic constraint model, which is then fed into an SMT solver to check for the presence of conflicting concurrent accesses. To reduce the constraint solving time, Spider employs a pruning technique aimed at removing redundant portions of the trace.

Our experiments with multiple benchmarks show that Spider is effective in detecting data races in distributed executions in a practical amount of time, providing evidence of its usefulness as a testing tool.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    The original ReX algorithm does not take into consideration the existence of fork and join events signaling the creation and joining of threads.

  2. 2.

    Note that the second pass is performed after ReX, so any memory access existing in the block is guaranteed to be non-redundant. Thus, condition i) is automatically satisfied when block \(\beta \) does not contain any memory accesses.

References

  1. Apache Hadoop. http://hadoop.apache.org

  2. Apache HBase. http://hbase.apache.org

  3. Cook, S.A.: The complexity of theorem-proving procedures. In: STOC 1971. ACM (1971)

    Google Scholar 

  4. The Coq proof assistant. https://coq.inria.fr/

  5. Elwakil, M., Yang, Z., Wang, L., Chen, Q.: Message race detection for web services by an SMT-based analysis. In: Xie, B., Branke, J., Sadjadi, S.M., Zhang, D., Zhou, X. (eds.) ATC 2010. LNCS, vol. 6407, pp. 182–194. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16576-4_13

    Chapter  Google Scholar 

  6. Feng, Y., Martins, R., Bastani, O., Dillig, I.: Program synthesis using conflict-driven learning. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018. ACM (2018)

    Google Scholar 

  7. Flanagan, C., Freund, S.N.: FastTrack: efficient and precise dynamic race detection. In: PLDI 2009 (2009)

    Google Scholar 

  8. Guo, H., Wu, M., Zhou, L., Hu, G., Yang, J., Zhang, L.: Practical software model checking via dynamic interface reduction. In: SOSP 2011 (2011)

    Google Scholar 

  9. Hawblitzel, C., et al.: IronFleet: proving practical distributed systems correct. In: SOSP 2015. ACM (2015)

    Google Scholar 

  10. Huang, J.: Stateless model checking concurrent programs with maximal causality reduction. In: PLDI 2015. ACM (2015)

    Google Scholar 

  11. Huang, J., Rajagopalan, A.K.: What’s the optimal performance of precise dynamic race detection? - a redundancy perspective. In: ECOOP (2017)

    Google Scholar 

  12. Huang, J., Zhang, C., Dolby, J.: CLAP: recording local executions to reproduce concurrency failures. In: PLDI 2013. ACM (2013)

    Google Scholar 

  13. Kasikci, B., Zamfir, C., Candea, G.: Data races vs. data race bugs: telling the difference with portend. In: ASPLOS 2012. ACM (2012)

    Google Scholar 

  14. Killian, C., Anderson, J.W., Jhala, R., Vahdat, A.: Life, death, and the critical transition: finding liveness bugs in systems code. In: NSDI 2007. USENIX Association (2007)

    Google Scholar 

  15. Lamport, L.: The TLA+ home page. https://lamport.azurewebsites.net/tla/tla.html. Accessed 10 Oct 2019

  16. Lamport, L.: Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21(7), 558–565 (1978)

    Article  Google Scholar 

  17. Leesatapornwongsa, T., Hao, M., Joshi, P., Lukman, J.F., Gunawi, H.S.: SAMC: semantic-aware model checking for fast discovery of deep bugs in cloud systems. In: OSDI 2014. USENIX Association (2014)

    Google Scholar 

  18. Leesatapornwongsa, T., Lukman, J.F., Lu, S., Gunawi, H.S.: TaxDC: a taxonomy of non-deterministic concurrency bugs in datacenter distributed systems. In: ASPLOS 2016. ACM (2016)

    Google Scholar 

  19. Li, G., Lu, S., Musuvathi, M., Nath, S., Padhye, R.: Efficient scalable thread-safety-violation detection: finding thousands of concurrency bugs during testing. In: SOSP 2019. ACM (2019)

    Google Scholar 

  20. Liu, H., et al.: DCatch: automatically detecting distributed concurrency bugs in cloud systems. SIGOPS Oper. Syst. Rev. 51(2), 677–691 (2017)

    Article  Google Scholar 

  21. Machado, N., Maia, F., Neves, F., Coelho, F., Pereira, J.: Minha: large-scale distributed systems testing made practical. In: OPODIS 2019. Leibniz International Proceedings in Informatics (LIPIcs) (2019)

    Google Scholar 

  22. Machado, N.: TaxDC Micro-benchmarks Repository (2018). https://github.com/jcp19/micro-benchmarks

  23. Machado, N., Lucia, B., Rodrigues, L.: Concurrency debugging with differential schedule projections. In: PLDI 2015. ACM (2015)

    Google Scholar 

  24. Machado, N., Lucia, B., Rodrigues, L.: Production-guided concurrency debugging. In: PPoPP 2016. ACM (2016)

    Google Scholar 

  25. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24

    Chapter  Google Scholar 

  26. Neves, F., Machado, N., Pereira, J.: Falcon: a practical log-based analysis tool for distributed systems. In: DSN 2018 (2018)

    Google Scholar 

  27. Padon, O., McMillan, K.L., Panda, A., Sagiv, M., Shoham, S.: Ivy: safety verification by interactive generalization. SIGPLAN Not. 51(6), 614–630 (2016). https://doi.org/10.1145/2980983.2908118

    Article  Google Scholar 

  28. Popper, N.: The stock market bell rings, computers fail, wall street cringes (2015). https://www.nytimes.com/2015/07/09/business/dealbook/new-york-stock-exchange-suspends-trading.html. Accessed 06 Aug 2019

  29. Simsa, J., Bryant, R., Gibson, G.: DBug: Systematic evaluation of distributed systems. In: Proceedings of the 5th International Conference on Systems Software Verification, SSV 2010, p. 3. USENIX Association, Berkeley (2010)

    Google Scholar 

  30. SMT-LIB: Logics. http://smtlib.cs.uiowa.edu/logics.shtml

  31. Terra-Neves, M., Machado, N., Lynce, I., Manquinho, V.: Concurrency debugging with MaxSMT. In: AAAI 2019. AAAI Press (2019)

    Google Scholar 

  32. Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region (2011). https://aws.amazon.com/message/65648/. Accessed 03 Mar 2019

  33. Tekken Valapil, V., Yingchareonthawornchai, S., Kulkarni, S., Torng, E., Demirbas, M.: Monitoring partially synchronous distributed systems using SMT solvers. In: Lahiri, S., Reger, G. (eds.) RV 2017. LNCS, vol. 10548, pp. 277–293. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67531-2_17

    Chapter  Google Scholar 

  34. Voulgaris, S., Gavidia, D., van Steen, M.: CYCLON: inexpensive membership management for unstructured P2P overlays. J. Netw. Syst. Manag. 13(2), 197–217 (2005)

    Article  Google Scholar 

  35. Wilcox, J.R., et al.: Verdi: A framework for implementing and formally verifying distributed systems. In: PLDI 2015. ACM (2015)

    Google Scholar 

  36. Woos, D., Wilcox, J.R., Anton, S., Tatlock, Z., Ernst, M.D., Anderson, T.: Planning for change in a formal verification of the Raft consensus protocol. In: CPP 2016. ACM (2016)

    Google Scholar 

  37. Yang, J., et al.: MODIST: transparent model checking of unmodified distributed systems. In: NSDI 2009. USENIX Association (2009)

    Google Scholar 

Download references

Acknowledgements

This work is financed by the ERDF - European Regional Development Fund through the North Portugal Regional Operational Programme - NORTE2020 Programme and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia within project NORTE-01-0145-FEDER-028550-PTDC/EEI-COM/28550/2017.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to João Carlos Pereira .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Pereira, J.C., Machado, N., Sousa Pinto, J. (2020). Testing for Race Conditions in Distributed Systems via SMT Solving. In: Ahrendt, W., Wehrheim, H. (eds) Tests and Proofs. TAP 2020. Lecture Notes in Computer Science(), vol 12165. Springer, Cham. https://doi.org/10.1007/978-3-030-50995-8_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-50995-8_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-50994-1

  • Online ISBN: 978-3-030-50995-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics