Abstract
Data races, a condition where two memory accesses to the same memory location occur concurrently, have been shown to be a major source of concurrency bugs in distributed systems. Unfortunately, data races are often triggered by non-deterministic event orderings that are hard to detect when testing complex distributed systems.
In this paper, we propose Spider, an automated tool for identifying data races in distributed system traces. Spider encodes the causal relations between the events in the trace as a symbolic constraint model, which is then fed into an SMT solver to check for the presence of conflicting concurrent accesses. To reduce the constraint solving time, Spider employs a pruning technique aimed at removing redundant portions of the trace.
Our experiments with multiple benchmarks show that Spider is effective in detecting data races in distributed executions in a practical amount of time, providing evidence of its usefulness as a testing tool.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The original ReX algorithm does not take into consideration the existence of fork and join events signaling the creation and joining of threads.
- 2.
Note that the second pass is performed after ReX, so any memory access existing in the block is guaranteed to be non-redundant. Thus, condition i) is automatically satisfied when block \(\beta \) does not contain any memory accesses.
References
Apache Hadoop. http://hadoop.apache.org
Apache HBase. http://hbase.apache.org
Cook, S.A.: The complexity of theorem-proving procedures. In: STOC 1971. ACM (1971)
The Coq proof assistant. https://coq.inria.fr/
Elwakil, M., Yang, Z., Wang, L., Chen, Q.: Message race detection for web services by an SMT-based analysis. In: Xie, B., Branke, J., Sadjadi, S.M., Zhang, D., Zhou, X. (eds.) ATC 2010. LNCS, vol. 6407, pp. 182–194. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16576-4_13
Feng, Y., Martins, R., Bastani, O., Dillig, I.: Program synthesis using conflict-driven learning. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018. ACM (2018)
Flanagan, C., Freund, S.N.: FastTrack: efficient and precise dynamic race detection. In: PLDI 2009 (2009)
Guo, H., Wu, M., Zhou, L., Hu, G., Yang, J., Zhang, L.: Practical software model checking via dynamic interface reduction. In: SOSP 2011 (2011)
Hawblitzel, C., et al.: IronFleet: proving practical distributed systems correct. In: SOSP 2015. ACM (2015)
Huang, J.: Stateless model checking concurrent programs with maximal causality reduction. In: PLDI 2015. ACM (2015)
Huang, J., Rajagopalan, A.K.: What’s the optimal performance of precise dynamic race detection? - a redundancy perspective. In: ECOOP (2017)
Huang, J., Zhang, C., Dolby, J.: CLAP: recording local executions to reproduce concurrency failures. In: PLDI 2013. ACM (2013)
Kasikci, B., Zamfir, C., Candea, G.: Data races vs. data race bugs: telling the difference with portend. In: ASPLOS 2012. ACM (2012)
Killian, C., Anderson, J.W., Jhala, R., Vahdat, A.: Life, death, and the critical transition: finding liveness bugs in systems code. In: NSDI 2007. USENIX Association (2007)
Lamport, L.: The TLA+ home page. https://lamport.azurewebsites.net/tla/tla.html. Accessed 10 Oct 2019
Lamport, L.: Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21(7), 558–565 (1978)
Leesatapornwongsa, T., Hao, M., Joshi, P., Lukman, J.F., Gunawi, H.S.: SAMC: semantic-aware model checking for fast discovery of deep bugs in cloud systems. In: OSDI 2014. USENIX Association (2014)
Leesatapornwongsa, T., Lukman, J.F., Lu, S., Gunawi, H.S.: TaxDC: a taxonomy of non-deterministic concurrency bugs in datacenter distributed systems. In: ASPLOS 2016. ACM (2016)
Li, G., Lu, S., Musuvathi, M., Nath, S., Padhye, R.: Efficient scalable thread-safety-violation detection: finding thousands of concurrency bugs during testing. In: SOSP 2019. ACM (2019)
Liu, H., et al.: DCatch: automatically detecting distributed concurrency bugs in cloud systems. SIGOPS Oper. Syst. Rev. 51(2), 677–691 (2017)
Machado, N., Maia, F., Neves, F., Coelho, F., Pereira, J.: Minha: large-scale distributed systems testing made practical. In: OPODIS 2019. Leibniz International Proceedings in Informatics (LIPIcs) (2019)
Machado, N.: TaxDC Micro-benchmarks Repository (2018). https://github.com/jcp19/micro-benchmarks
Machado, N., Lucia, B., Rodrigues, L.: Concurrency debugging with differential schedule projections. In: PLDI 2015. ACM (2015)
Machado, N., Lucia, B., Rodrigues, L.: Production-guided concurrency debugging. In: PPoPP 2016. ACM (2016)
de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24
Neves, F., Machado, N., Pereira, J.: Falcon: a practical log-based analysis tool for distributed systems. In: DSN 2018 (2018)
Padon, O., McMillan, K.L., Panda, A., Sagiv, M., Shoham, S.: Ivy: safety verification by interactive generalization. SIGPLAN Not. 51(6), 614–630 (2016). https://doi.org/10.1145/2980983.2908118
Popper, N.: The stock market bell rings, computers fail, wall street cringes (2015). https://www.nytimes.com/2015/07/09/business/dealbook/new-york-stock-exchange-suspends-trading.html. Accessed 06 Aug 2019
Simsa, J., Bryant, R., Gibson, G.: DBug: Systematic evaluation of distributed systems. In: Proceedings of the 5th International Conference on Systems Software Verification, SSV 2010, p. 3. USENIX Association, Berkeley (2010)
SMT-LIB: Logics. http://smtlib.cs.uiowa.edu/logics.shtml
Terra-Neves, M., Machado, N., Lynce, I., Manquinho, V.: Concurrency debugging with MaxSMT. In: AAAI 2019. AAAI Press (2019)
Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region (2011). https://aws.amazon.com/message/65648/. Accessed 03 Mar 2019
Tekken Valapil, V., Yingchareonthawornchai, S., Kulkarni, S., Torng, E., Demirbas, M.: Monitoring partially synchronous distributed systems using SMT solvers. In: Lahiri, S., Reger, G. (eds.) RV 2017. LNCS, vol. 10548, pp. 277–293. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67531-2_17
Voulgaris, S., Gavidia, D., van Steen, M.: CYCLON: inexpensive membership management for unstructured P2P overlays. J. Netw. Syst. Manag. 13(2), 197–217 (2005)
Wilcox, J.R., et al.: Verdi: A framework for implementing and formally verifying distributed systems. In: PLDI 2015. ACM (2015)
Woos, D., Wilcox, J.R., Anton, S., Tatlock, Z., Ernst, M.D., Anderson, T.: Planning for change in a formal verification of the Raft consensus protocol. In: CPP 2016. ACM (2016)
Yang, J., et al.: MODIST: transparent model checking of unmodified distributed systems. In: NSDI 2009. USENIX Association (2009)
Acknowledgements
This work is financed by the ERDF - European Regional Development Fund through the North Portugal Regional Operational Programme - NORTE2020 Programme and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia within project NORTE-01-0145-FEDER-028550-PTDC/EEI-COM/28550/2017.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Pereira, J.C., Machado, N., Sousa Pinto, J. (2020). Testing for Race Conditions in Distributed Systems via SMT Solving. In: Ahrendt, W., Wehrheim, H. (eds) Tests and Proofs. TAP 2020. Lecture Notes in Computer Science(), vol 12165. Springer, Cham. https://doi.org/10.1007/978-3-030-50995-8_7
Download citation
DOI: https://doi.org/10.1007/978-3-030-50995-8_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50994-1
Online ISBN: 978-3-030-50995-8
eBook Packages: Computer ScienceComputer Science (R0)