Foundations of Dependable Computing, pp. 3–38

# Adaptive System-Level Diagnosis in Real-Time

## Abstract

Distributed real-time systems are subject to stricter fault-tolerance requirements than non-real-time systems. This work applies system-level diagnosis to a real-time distributed system as a first step in providing fault tolerance. An existing algorithm for distributed system-level diagnosis, Adaptive_DSD (ADSD), is converted to a real-time framework, establishing a deadline for the end-to-end diagnosis latency. Rate monotonic analysis is chosen as the framework for achieving real-time performance. The ADSD algorithm is converted into a set of independent periodic tasks running at each node, and a systematic procedure is used to assign priorities and deadlines so as to minimize the hard deadline of the diagnosis function. The resulting algorithm, Real-Time Adaptive Distributed System-Level Diagnosis (RT-ADSD), is fully compatible with a real-time environment in which both the processors and the network support fixed-priority scheduling. The RT-ADSD algorithm provides a useful first step in adding fault tolerance to distributed real-time systems by quickly and reliably diagnosing node failures. The key results presented here include a framework for specifying real-time distributed algorithms and a scheduling model for analyzing them that accounts for many requirements of distributed systems, including network I/O, task jitter, and critical sections caused by shared resources.
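As a rough illustration of the kind of check that rate monotonic analysis performs on a set of periodic tasks, the sketch below assigns fixed priorities by period, applies the Liu and Layland utilization bound, and then iterates the standard response-time recurrence with a blocking term for critical sections. It is not the chapter's scheduling model: the `Task` fields, the task names, and all numeric values are hypothetical, deadlines are assumed equal to periods, and network I/O and jitter are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str          # hypothetical task label
    period: int        # T_i, also used as the deadline (assumption)
    wcet: int          # C_i, worst-case execution time
    blocking: int = 0  # B_i, worst-case blocking from critical sections


def rate_monotonic_order(tasks):
    """Shorter period means higher priority (rate monotonic assignment)."""
    return sorted(tasks, key=lambda t: t.period)


def utilization(tasks):
    return sum(t.wcet / t.period for t in tasks)


def liu_layland_bound(n):
    """Sufficient (not necessary) utilization bound for n periodic tasks."""
    return n * (2 ** (1.0 / n) - 1)


def response_time(task, higher):
    """Iterate R = C + B + sum(ceil(R / T_j) * C_j) over higher-priority tasks.

    Returns the worst-case response time, or None if it exceeds the period.
    """
    r = task.wcet + task.blocking
    while True:
        # -(-a // b) is integer ceiling division, exact for ints.
        interference = sum(-(-r // h.period) * h.wcet for h in higher)
        nxt = task.wcet + task.blocking + interference
        if nxt > task.period:
            return None
        if nxt == r:
            return r
        r = nxt


def analyze(tasks):
    ordered = rate_monotonic_order(tasks)
    return {t.name: response_time(t, ordered[:i]) for i, t in enumerate(ordered)}


# Hypothetical per-node task set, in arbitrary time units.
tasks = [
    Task("application", period=50, wcet=10),
    Task("node_test", period=10, wcet=2),
    Task("report", period=20, wcet=4, blocking=1),
]

if __name__ == "__main__":
    print("utilization:", utilization(tasks))
    print("bound:", liu_layland_bound(len(tasks)))
    print("response times:", analyze(tasks))
```

The utilization bound is only a sufficient test, which is why the sketch also computes per-task response times: a task set that fails the bound may still meet every deadline under the exact recurrence.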

## Keywords

Idle Time, Critical Section, Schedule Model, Diagnosis Latency, Priority Task
