A Consensus-Based Framework for Responsive Computer System Design
The concept of responsive computer systems is presented. The emerging discipline of responsive systems demands fault-tolerant and real-time performance in both parallel and distributed computing environments. The responsiveness measure is discussed and a new design framework for responsive systems is introduced. The new framework is based on the fundamental concept of consensus and on application-specific responsiveness. It is shown that consensus is crucial in responsive synchronization, communication, diagnosis, and reconfiguration.
It is also illustrated how these tasks form part of the consensus-based operating system and, when combined with application-specific methods, handle the fault-tolerance and real-time issues germane to a given application. This approach seems to be the most appropriate for the design of fault-tolerant, real-time, parallel/distributed systems.
Keywords: Fault Diagnosis, Fault Tolerance, Consensus Problem, Consensus Algorithm, Consensus Protocol