Adaptive System-Level Diagnosis in Real-Time

  • Mark E. Stahl
  • Ronald P. Bianchini Jr.
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 284)


Distributed real-time systems are subject to stricter fault-tolerance requirements than non-real-time systems. This work presents an application of system-level diagnosis to a real-time distributed system as a first step in providing fault-tolerance. An existing algorithm for distributed system-level diagnosis, Adaptive_DSD, is converted to a real-time framework, establishing a deadline for the end-to-end diagnosis latency. Rate monotonic analysis is chosen as the framework for achieving real-time performance. The ADSD algorithm is converted into a set of independent periodic tasks running at each node, and a systematic procedure is used to assign priorities and deadlines to minimize the hard deadline of the diagnosis function. The resulting algorithm, Real-Time Adaptive Distributed System-Level Diagnosis (RT-ADSD), is fully compatible with a real-time environment in which both the processors and the network support fixed-priority scheduling. The RT-ADSD algorithm provides a useful first step in adding fault-tolerance to distributed real-time systems by quickly and reliably diagnosing node failures. The key results presented here include a framework for specifying real-time distributed algorithms and a scheduling model for analyzing them that accounts for many requirements of distributed systems, including network I/O, task jitter, and critical sections caused by shared resources.
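To make the role of rate monotonic analysis concrete, the following is a minimal sketch of the standard textbook schedulability checks for fixed-priority periodic tasks: the Liu and Layland utilization bound and the iterative worst-case response-time test with a blocking term for critical sections. It is not the chapter's scheduling model, which also covers network I/O and task jitter. The task names and parameters are hypothetical, deadlines are assumed equal to periods, and all times are integers in the same unit.

```python
import math

# Hypothetical periodic task set (not taken from the chapter):
# worst-case execution time "wcet" and "period", same integer time unit.
TASKS = [
    {"name": "task_c", "wcet": 3, "period": 12},
    {"name": "task_a", "wcet": 1, "period": 4},
    {"name": "task_b", "wcet": 2, "period": 6},
]


def rm_priority_order(tasks):
    """Rate-monotonic assignment: shorter period means higher priority."""
    return sorted(tasks, key=lambda t: t["period"])


def utilization(tasks):
    """Total processor utilization of the task set."""
    return sum(t["wcet"] / t["period"] for t in tasks)


def utilization_bound(n):
    """Liu and Layland sufficient bound for n tasks: n * (2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)


def response_time(ordered, i, blocking=0):
    """Worst-case response time of ordered[i], or None if it misses its deadline.

    `ordered` must be sorted highest priority first. `blocking` is the longest
    time the task can be delayed by a lower-priority critical section.
    Assumes deadline == period.
    """
    wcet, period = ordered[i]["wcet"], ordered[i]["period"]
    r = wcet + blocking
    while True:
        # Preemption by every higher-priority task released within r.
        interference = sum(
            math.ceil(r / hp["period"]) * hp["wcet"] for hp in ordered[:i]
        )
        nxt = wcet + blocking + interference
        if nxt > period:
            return None  # deadline missed
        if nxt == r:
            return r  # fixed point reached
        r = nxt


ordered = rm_priority_order(TASKS)
print(utilization(ordered) <= utilization_bound(len(ordered)))  # prints False
print([response_time(ordered, i) for i in range(len(ordered))])  # prints [1, 3, 10]
```

In this example the task set fails the sufficient utilization bound but still meets every deadline under the exact response-time test, which is why an exact analysis is useful when trying to tighten a hard deadline.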


Keywords (added by machine, not by the authors): Idle Time; Critical Section; Schedule Model; Diagnosis Latency; Priority Task





Copyright information

© Kluwer Academic Publishers 1994

Authors and Affiliations

  • Mark E. Stahl (1)
  • Ronald P. Bianchini Jr. (1)
  1. Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh
