Implementable Failure Detectors in Asynchronous Systems

  • Vijay K. Garg
  • J. Roger Mitchell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1530)


The failure detectors discussed in the literature are either impossible to implement in an asynchronous system, or their exact guarantees have not been discussed. We introduce an infinitely often accurate failure detector, which can be implemented in an asynchronous system. We provide one such implementation and show its application to the fault-tolerant server maintenance problem. We also show that some natural timeout-based failure detectors implemented on Unix are not sufficient to guarantee infinitely often accuracy.





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Vijay K. Garg (1)
  • J. Roger Mitchell (1)
  1. Electrical and Computer Engineering Department, The University of Texas, Austin, USA
