Abstract
When distributed applications are replicated for fault tolerance, the presence of even a single nondeterministic service can lead to emergent system-wide nondeterminism that compromises replica consistency. Our approach, Midas, identifies and addresses multiple sources of nondeterminism (including system calls, multithreading, etc.) in a multi-service replicated distributed architecture. Midas combines compile-time dependency, concurrency and nondeterminism analyses with performance-sensitive compensation of nondeterminism at runtime. This approach preserves existing application semantics and allows services to remain nondeterministic while still keeping their replicas consistent. We demonstrate Midas’ scalability through a microbenchmark that shows the underlying tradeoffs under different kinds of dependencies between clients, services and invocations in a distributed system. We also validate our claims by modeling a representative multi-service application using Java Pathfinder.
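The core problem, and the general idea of compensating for nondeterminism rather than removing it, can be illustrated with a toy model. The following is a minimal Python sketch under stated assumptions, not Midas itself. It assumes one nondeterministic upstream service (the hypothetical `TimestampService`, which reads a seeded random source standing in for a system call such as a clock read), one deterministic downstream service (the hypothetical `AuditService`), and a simple record-and-replay log (the hypothetical `NondeterminismLog`) in which the primary replica records each nondeterministic value and the backup replays it. The chapter's actual compile-time analyses and runtime compensation mechanisms are not reproduced here.

```python
import random
from typing import Callable, List


class NondeterminismLog:
    """Hypothetical log: the primary records each nondeterministic value
    in order, and a backup replica replays them in the same order."""

    def __init__(self) -> None:
        self.values: List[float] = []
        self.cursor = 0

    def record(self, value: float) -> float:
        self.values.append(value)
        return value

    def replay(self) -> float:
        value = self.values[self.cursor]
        self.cursor += 1
        return value


class TimestampService:
    """Hypothetical nondeterministic service: every reply embeds a value read
    from a local source (a stand-in for a system call or thread interleaving)."""

    def __init__(self, source: Callable[[], float]) -> None:
        self.source = source

    def handle(self, request: str) -> str:
        return f"{request}@{self.source():.6f}"


class AuditService:
    """Hypothetical deterministic service whose state depends on upstream replies."""

    def __init__(self) -> None:
        self.state: List[str] = []

    def handle(self, reply: str) -> None:
        self.state.append(reply)


def run_replica(source: Callable[[], float], requests: List[str]) -> List[str]:
    """Run one replica of the two-service chain and return the downstream state."""
    upstream = TimestampService(source)
    downstream = AuditService()
    for request in requests:
        downstream.handle(upstream.handle(request))
    return downstream.state


requests = ["a", "b", "c"]

# Without compensation: each replica reads its own independent source
# (different seeds model the primary and backup seeing different values).
uncompensated_primary = run_replica(random.Random(1).random, requests)
uncompensated_backup = run_replica(random.Random(2).random, requests)

# With compensation: the primary records each value, the backup replays it.
log = NondeterminismLog()
rng = random.Random(1)
compensated_primary = run_replica(lambda: log.record(rng.random()), requests)
compensated_backup = run_replica(log.replay, requests)

print(uncompensated_primary == uncompensated_backup)  # False
print(compensated_primary == compensated_backup)      # True
```

Note that `AuditService` is itself deterministic, yet its replicas diverge in the uncompensated run because they consume divergent upstream replies; this is the kind of emergent nondeterminism the abstract describes. Record-and-replay is only one possible compensation strategy: the upstream service stays nondeterministic on the primary, and the backup simply mirrors the primary's values.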
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Slember, J., Narasimhan, P. (2008). Handling Emergent Nondeterminism in Replicated Services. In: de Lemos, R., Di Giandomenico, F., Gacek, C., Muccini, H., Vieira, M. (eds) Architecting Dependable Systems V. Lecture Notes in Computer Science, vol 5135. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85571-2_9
DOI: https://doi.org/10.1007/978-3-540-85571-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85570-5
Online ISBN: 978-3-540-85571-2
eBook Packages: Computer Science; Computer Science (R0)