Abstract
This paper addresses correlated failures and their impact on system dependability. Measurements are made on a VAXcluster system, and validated analytical models are proposed to calculate availability and reliability for simple systems with correlated failures. A correlation analysis of the VAXcluster data shows that errors are highly correlated across machines (average correlation coefficient ρ = 0.62) because the machines share resources. The measured failure correlation coefficient, however, is not high (0.15). Based on the VAXcluster data, it is shown that models that ignore correlated failures can underestimate unavailability by orders of magnitude. Even a small correlation significantly affects system unavailability. A validated analytical model for calculating the unavailability of 1-out-of-2 systems with correlated failures is derived. Reliability is likewise significantly influenced by correlated failures. The joint failure rate of the two components, λ_f, is found to be the key parameter for estimating the reliability of 1-out-of-2 systems with correlated failures. A validated relationship between ρ and λ_f is also derived.
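To illustrate why even a small correlation matters, the sketch below treats the "down" state of each of two components as a binary random variable and uses the textbook identity P(both down) = u1·u2 + ρ·sqrt(u1(1−u1)·u2(1−u2)). This is an assumption made for illustration and is not necessarily the analytical model derived in the paper. The per-component unavailability of 0.001 is hypothetical; only ρ = 0.15 comes from the abstract. The function name `joint_unavailability` is likewise invented here.

```python
import math


def joint_unavailability(u1: float, u2: float, rho: float) -> float:
    """Probability that both components are down at the same time.

    Models each component's 'down' state as a Bernoulli variable with
    mean u1 (resp. u2) and Pearson correlation rho between the two.
    This is a generic identity, not the paper's own derivation.
    """
    cov = rho * math.sqrt(u1 * (1 - u1) * u2 * (1 - u2))
    joint = u1 * u2 + cov
    # A valid joint probability must lie within the Frechet bounds.
    if not (max(0.0, u1 + u2 - 1) <= joint <= min(u1, u2)):
        raise ValueError("rho is not feasible for these unavailabilities")
    return joint


u = 1e-3      # hypothetical per-component unavailability (not from the paper)
rho = 0.15    # failure correlation coefficient reported in the abstract

independent = joint_unavailability(u, u, 0.0)   # model that ignores correlation
correlated = joint_unavailability(u, u, rho)    # model that includes correlation

print(f"independent: {independent:.3e}")
print(f"correlated:  {correlated:.3e}")
print(f"ratio:       {correlated / independent:.1f}")
```

Under these assumed numbers, the 1-out-of-2 system unavailability with ρ = 0.15 is roughly 150 times the independent estimate, which is consistent in spirit with the abstract's claim that ignoring correlation can underestimate unavailability by orders of magnitude.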
Copyright information
© 1992 Springer-Verlag/Wien
About this paper
Cite this paper
Tang, D., Iyer, R.K. (1992). Impact of Correlated Failures on Dependability in a VAXcluster System. In: Meyer, J.F., Schlichting, R.D. (eds) Dependable Computing for Critical Applications 2. Dependable Computing and Fault-Tolerant Systems, vol 6. Springer, Vienna. https://doi.org/10.1007/978-3-7091-9198-9_9
DOI: https://doi.org/10.1007/978-3-7091-9198-9_9
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-9200-9
Online ISBN: 978-3-7091-9198-9
eBook Packages: Springer Book Archive