Abstract
In the N-Modular Redundancy (NMR) approach, a computation is made reliable by executing it on several computers and determining its result by a decision algorithm. This paper investigates a formal approach to the use of NMR in replicated distributed systems, for which it introduces a notion of correctness based on consistency with their non-replicated counterparts, and a local correctness criterion. We discuss how a replicated system component may be implemented by N base copies, a majority of which are non-faulty. The formal approach sheds light on the necessity of coordinating the copies and on the requirements they should satisfy; in particular, the difficulty of replicating synchronous communication is pointed out. A practical approach is also briefly examined and shown to be consistent with the formal model.
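As a rough illustration of the NMR idea summarized above, the following minimal sketch runs one computation on N copies and picks the result by strict majority. It is not the formal model of the paper: the names `majority_vote` and `run_replicated` are hypothetical, the copies are modeled as plain Python functions, and the sketch assumes the copies receive the same input, which sidesteps the coordination problem the abstract highlights.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of copies, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # A strict majority means more than half of the N reported results agree.
    return value if count > len(results) // 2 else None


def run_replicated(copies: Sequence[Callable[[int], Hashable]], x: int) -> Optional[Hashable]:
    """Run every copy on the same input (an assumption) and vote on the outputs."""
    return majority_vote([copy(x) for copy in copies])


if __name__ == "__main__":
    # Hypothetical example: two correct copies and one faulty copy (N = 3).
    copies = [lambda x: x * x, lambda x: x * x, lambda x: -1]
    print(run_replicated(copies, 3))
```

With N = 3 the vote masks a single faulty copy; if no value reaches a strict majority, the sketch returns `None` rather than guessing.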
Inside every replicated system there is a non-replicated system trying to get out.
Copyright information
© 1988 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mancini, L.V., Pappalardo, G. (1988). Towards a theory of replicated processing. In: Joseph, M. (eds) Formal Techniques in Real-Time and Fault-Tolerant Systems. FTRTFT 1988. Lecture Notes in Computer Science, vol 331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-50302-1_13
DOI: https://doi.org/10.1007/3-540-50302-1_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-50302-6
Online ISBN: 978-3-540-45965-1
eBook Packages: Springer Book Archive