Failure detectors for crash faults in cloud

  • Bharati Sinha
  • Awadhesh Kumar Singh
  • Poonam Saini
Original Research


A failure detector (FD) is an inherent component of atomic broadcast and consensus protocols. Failures are broadly categorized into two types: crash and Byzantine. A crash failure simply stops a process from working, whereas a Byzantine failure reflects malicious behavior during ongoing communication. Detecting failures becomes more challenging in a dynamic, asynchronous environment such as cloud computing. This paper proposes a failure detector that handles crash faults in the cloud while addressing scalability. We employ BCMP queueing networks to compute the performance parameters of the proposed algorithm, thereby enabling accurate failure detection. Although failure detection schemes face a tradeoff between efficiency and latency, the proposed algorithm achieves an optimal balance between the two metrics.
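The abstract does not spell out the detection algorithm or the BCMP model it uses. As a rough illustration only, the Python sketch below shows two generic ingredients that such a design commonly combines: a timeout-based heartbeat detector for crash faults, and the per-station metrics of an open product-form (BCMP) network in its simplest case, where every station is a single-server FCFS queue with exponential service (a Jackson network). The names `HeartbeatFailureDetector`, `open_network_metrics` and `network_response_time` are hypothetical, the arrival rates are assumed to be already solved from the traffic equations, and none of this should be read as the authors' algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatFailureDetector:
    """Hypothetical timeout-based crash-fault detector (illustration only).

    A monitored process is suspected once no heartbeat from it has arrived
    for more than `timeout` seconds. The current time is passed in
    explicitly so the behavior is deterministic.
    """

    timeout: float
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, pid: str, now: float) -> None:
        # Record the arrival time of the most recent heartbeat from `pid`.
        self.last_seen[pid] = now

    def suspects(self, now: float) -> set:
        # Suspect every process whose latest heartbeat is older than the timeout.
        return {pid for pid, t in self.last_seen.items() if now - t > self.timeout}


def open_network_metrics(arrival_rates, service_rates):
    """Per-station metrics of an open product-form network whose stations are
    all single-server FCFS queues with exponential service (a Jackson network,
    the simplest BCMP case). arrival_rates[i] is the total flow into station i,
    assumed to be already solved from the traffic equations."""
    stations = []
    for lam, mu in zip(arrival_rates, service_rates):
        rho = lam / mu  # utilization of the station
        if rho >= 1:
            raise ValueError("unstable station: utilization must be below 1")
        mean_jobs = rho / (1 - rho)  # M/M/1 mean number of jobs in the station
        stations.append(
            {"utilization": rho, "mean_jobs": mean_jobs, "mean_response": mean_jobs / lam}
        )
    return stations


def network_response_time(external_rate, arrival_rates, service_rates):
    # Little's law over the whole network: R = (total mean jobs) / external arrival rate.
    stations = open_network_metrics(arrival_rates, service_rates)
    return sum(s["mean_jobs"] for s in stations) / external_rate


if __name__ == "__main__":
    fd = HeartbeatFailureDetector(timeout=3.0)
    fd.heartbeat("p1", 0.0)
    fd.heartbeat("p2", 0.0)
    fd.heartbeat("p2", 2.5)  # p1 has crashed and sends nothing further
    print(fd.suspects(4.0))  # {'p1'}
    print(network_response_time(2.0, [2.0, 2.0], [4.0, 3.0]))
```

In this sketch the timeout must exceed the heartbeat period plus the worst expected message delay, or correct processes will be falsely suspected. With heartbeats every 1 s and a 3 s timeout, a crashed process is suspected about 3 s after its last heartbeat arrives; a shorter period permits a shorter timeout and faster detection, but increases message load. This is one concrete form of the efficiency/latency tradeoff mentioned above.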


Keywords: Failure detectors · Cloud computing · Crash faults



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Bharati Sinha (1), corresponding author
  • Awadhesh Kumar Singh (1)
  • Poonam Saini (2)
  1. National Institute of Technology, Kurukshetra, India
  2. Punjab Engineering College, Chandigarh, India
