Abstract
Application-level nondeterminism can lead to inconsistent state that defeats the purpose of replication as a fault-tolerance strategy. We present Midas, a new approach for living with nondeterminism in distributed, replicated middleware applications. Midas exploits (i) static program analysis of the application’s source code prior to replica deployment and (ii) online compensation of replica divergence even as replicas execute. We identify the sources of nondeterminism within the application, discriminate between actual and superficial nondeterminism, and track the propagation of actual nondeterminism. We evaluate our techniques for the active replication of servers using micro-benchmarks that contain various sources of nondeterminism (multi-threading, system calls and propagation).
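The idea of identifying sources of nondeterminism, separating actual from superficial ones, and tracking propagation can be illustrated with a toy data-flow pass. This is only a sketch and not the Midas implementation: the statement encoding, the function names `reaches` and `classify_sources`, the example calls `gettimeofday` and `rand`, and the reading of "superficial" as "never flows into replicated state or a reply" are all assumptions made for illustration.

```python
# Toy illustration only; not the Midas analysis.
# A program is a list of (assigned_variable, names_it_reads) pairs.


def reaches(statements, source):
    """Return every variable whose value may depend on `source`."""
    tainted = set()
    changed = True
    while changed:  # iterate to a fixed point so statement order does not matter
        changed = False
        for target, reads in statements:
            depends = any(r == source or r in tainted for r in reads)
            if depends and target not in tainted:
                tainted.add(target)
                changed = True
    return tainted


def classify_sources(statements, sources, replicated_state):
    """Label each source 'actual' if it can reach replicated state, else 'superficial'.

    Assumption: 'superficial' means the nondeterministic value never flows
    into state that replicas must agree on.
    """
    result = {}
    for source in sorted(sources):
        contaminated = reaches(statements, source) & set(replicated_state)
        result[source] = "actual" if contaminated else "superficial"
    return result


# Hypothetical server fragment: the timestamp is only logged,
# while the random value flows into the balance and the reply.
program = [
    ("now", ["gettimeofday"]),
    ("log_line", ["now", "request"]),
    ("ticket", ["rand"]),
    ("balance", ["balance", "ticket"]),
    ("reply", ["balance"]),
]

classification = classify_sources(
    program, {"gettimeofday", "rand"}, {"balance", "reply"}
)
print(classification)
```

In this sketch only the `rand` call would need compensation across replicas, since its value propagates through `ticket` into `balance` and `reply`; the `gettimeofday` value stays confined to a log line.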
Copyright information
© 2006 IFIP International Federation for Information Processing
About this paper
Cite this paper
Slember, J., Narasimhan, P. (2006). Living with Nondeterminism in Replicated Middleware Applications. In: van Steen, M., Henning, M. (eds) Middleware 2006. Lecture Notes in Computer Science, vol 4290. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925071_5
DOI: https://doi.org/10.1007/11925071_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49023-4
Online ISBN: 978-3-540-68256-1
eBook Packages: Computer Science, Computer Science (R0)