Impact of Correlated Failures on Dependability in a VAXcluster System

  • Dong Tang
  • Ravishankar K. Iyer
Conference paper
Part of the Dependable Computing and Fault-Tolerant Systems book series (DEPENDABLECOMP, volume 6)


This paper addresses correlated failures and their impact on system dependability. Measurements are made on a VAXcluster system, and validated analytical models are proposed for calculating the availability and reliability of simple systems with correlated failures. A correlation analysis of the VAXcluster data shows that errors are highly correlated across machines (average correlation coefficient ρ = 0.62) because the machines share resources. The measured failure correlation coefficient, however, is not high (0.15). Based on the VAXcluster data, it is shown that models that ignore correlated failures can underestimate unavailability by orders of magnitude; even a small correlation significantly affects system unavailability. A validated analytical model for calculating the unavailability of 1-out-of-2 systems with correlated failures is derived. Reliability is likewise significantly influenced by correlated failures. The joint failure rate of the two components, λ_f, is found to be the key parameter for estimating the reliability of 1-out-of-2 systems with correlated failures. A validated relationship between ρ and λ_f is also derived.
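The claim that a small correlation can raise unavailability by orders of magnitude can be illustrated with a minimal sketch. This is not the paper's validated model: it assumes two components with the same steady-state unavailability u and uses the standard identity for two correlated Bernoulli indicators, P(both down) = u² + ρ·u·(1 − u). The function name `joint_unavailability` and the value u = 0.001 are hypothetical; only ρ = 0.15 comes from the abstract.

```python
def joint_unavailability(u: float, rho: float) -> float:
    """Probability that both components of a 1-out-of-2 system are down.

    Assumption (not taken from the paper): each component is down with the
    same probability u, and rho is the correlation coefficient between the
    two "component is down" indicator variables. For Bernoulli indicators
    this gives P(both down) = u^2 + rho * u * (1 - u).
    """
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    p_both = u * u + rho * u * (1.0 - u)
    # A joint probability must respect the Frechet bounds for equal marginals.
    lower = max(0.0, 2.0 * u - 1.0)
    if not lower <= p_both <= u:
        raise ValueError("rho is not feasible for this value of u")
    return p_both


if __name__ == "__main__":
    u = 1e-3    # hypothetical per-component unavailability
    rho = 0.15  # measured failure correlation coefficient from the abstract

    independent = joint_unavailability(u, 0.0)
    correlated = joint_unavailability(u, rho)

    print(f"independent model: {independent:.3e}")
    print(f"correlated model:  {correlated:.3e}")
    print(f"underestimation factor: {correlated / independent:.1f}")
```

Under these assumptions the independent model underestimates the system unavailability by a factor of 1 + ρ(1 − u)/u, which is about 150 for the values above, so the gap grows as the components become more reliable.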


Failure State · System Availability · Independent Model · Average Relative Error · Network Error





Copyright information

© Springer-Verlag/Wien 1992

Authors and Affiliations

  • Dong Tang (1)
  • Ravishankar K. Iyer (1)
  1. Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, USA
