Experimental Evaluation of a Failure Detection Service Based on a Gossip Strategy

  • Leandro P. de Sousa
  • Elias P. Duarte Jr.
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7017)

Abstract

Failure detectors were first proposed as an abstraction that makes it possible to solve consensus in asynchronous systems. A failure detector is a distributed oracle that provides information about the state of the processes in a distributed system. This work presents a failure detection service based on a gossip strategy. The service was implemented on the JXTA platform. A simulator was also implemented so that the detector could be evaluated with a larger number of processes. Experimental results show that increasing the frequency at which gossip messages are sent yields better results than increasing the fanout. Results are reported for the detector's fault detection time, recovery detection time, and mistake rate.
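
The abstract does not spell out the gossip protocol itself. The sketch below is a rough illustration only. It is a minimal, round-based Python simulation of the classic heartbeat-counter gossip scheme described by van Renesse, Minsky and Hayden (1998), under assumed parameters. Every `interval` time units, each live process increments its own heartbeat counter and pushes its whole heartbeat table to `fanout` randomly chosen peers. A receiver keeps the maximum counter per process, and a process is suspected once its counter has not increased for more than `t_fail`. The names `Node`, `simulate`, `interval`, `fanout` and `t_fail` are hypothetical. They are not taken from the authors' JXTA service or simulator, and the sketch does not reproduce their measurements.

```python
import random


class Node:
    """One process in a hypothetical gossip-style (heartbeat-counter) failure detector."""

    def __init__(self, node_id: int, n: int):
        self.node_id = node_id
        self.alive = True
        # heartbeat[j]: highest heartbeat counter of process j seen so far
        self.heartbeat = [0] * n
        # last_update[j]: local time at which heartbeat[j] last increased
        self.last_update = [0.0] * n

    def merge(self, remote_heartbeats, now):
        """Keep the maximum counter per process; refresh the timestamp when it grows."""
        for j, hb in enumerate(remote_heartbeats):
            if hb > self.heartbeat[j]:
                self.heartbeat[j] = hb
                self.last_update[j] = now

    def suspects(self, now, t_fail):
        """Processes whose counter has not increased for more than t_fail time units."""
        return {j for j in range(len(self.heartbeat))
                if j != self.node_id and now - self.last_update[j] > t_fail}


def simulate(n=20, fanout=2, interval=1.0, t_fail=10.0, crash_id=0,
             crash_time=20.0, end_time=60.0, seed=42):
    """Map each correct process to the time it first suspected crash_id (None if never)."""
    rng = random.Random(seed)
    nodes = [Node(i, n) for i in range(n)]
    detected = {i: None for i in range(n) if i != crash_id}
    now = 0.0
    while now <= end_time:
        if now >= crash_time:
            nodes[crash_id].alive = False  # crash-stop: the process goes silent
        for node in nodes:
            if not node.alive:
                continue
            # Bump our own heartbeat, then push a copy of the whole table to random peers
            node.heartbeat[node.node_id] += 1
            node.last_update[node.node_id] = now
            peers = rng.sample([j for j in range(n) if j != node.node_id], fanout)
            snapshot = list(node.heartbeat)
            for p in peers:
                if nodes[p].alive:  # messages sent to a crashed process are lost
                    nodes[p].merge(snapshot, now)
        # Record the first round in which each correct process suspects crash_id
        for i in detected:
            if detected[i] is None and crash_id in nodes[i].suspects(now, t_fail):
                detected[i] = now
        now += interval
    return detected


# All-to-all gossip (fanout = n - 1) makes the outcome deterministic
times = simulate(fanout=19)
print(sorted(set(times.values())))  # [30.0]
```

With `fanout=19`, every process pushes to all others in each round. As a result, every correct process last sees process 0's counter rise at t = 19 and suspects it at t = 30, the first round in which more than `t_fail` = 10 time units have passed. In this sketch, lowering `fanout` or shortening `interval` trades message load against detection latency. Which knob pays off more in practice is the question the paper's experiments address.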

Keywords

Failure Detectors, P2P, Probabilistic Dissemination

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Leandro P. de Sousa (1)
  • Elias P. Duarte Jr. (1)
  1. Dept. of Informatics, Federal University of Parana (UFPR), Curitiba, Brazil
