Impact of Correlated Failures on Dependability in a VAXcluster System

  • Conference paper
Dependable Computing for Critical Applications 2

Part of the book series: Dependable Computing and Fault-Tolerant Systems (DEPENDABLECOMP, volume 6)

Abstract

This paper addresses the issue of correlated failures and their impact on system dependability. Measurements are made on a VAXcluster system, and validated analytical models are proposed to calculate availability and reliability for simple systems with correlated failures. A correlation analysis of the VAXcluster data shows that errors are highly correlated across machines (average correlation coefficient ρ = 0.62) due to sharing of resources. The measured failure correlation coefficient, however, is not high (0.15). Based on the VAXcluster data, it is shown that models that ignore correlated failures can underestimate unavailability by orders of magnitude. Even a small correlation significantly affects system unavailability. A validated analytical model to calculate the unavailability of 1-out-of-2 systems with correlated failures is derived. Similarly, reliability is also significantly influenced by correlated failures. The joint failure rate of the two components, λ_f, is found to be the key parameter for estimating the reliability of 1-out-of-2 systems with correlated failures. A validated relationship between ρ and λ_f is also derived.
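
To see why a correlation as small as 0.15 can matter so much, the following minimal sketch may help. It is not the paper's validated model. It assumes each machine's state is a simple Bernoulli "down" indicator and uses the standard identity for two correlated Bernoulli variables, P(both down) = u1·u2 + ρ·sqrt(u1(1−u1)·u2(1−u2)). The function names and the per-machine unavailabilities are hypothetical; only ρ = 0.15 comes from the abstract.

```python
import math


def joint_unavailability(u1: float, u2: float, rho: float) -> float:
    """Probability that both components are down at the same instant.

    Assumption: each component's state is a Bernoulli "down" indicator
    with marginal unavailability u1, u2 and Pearson correlation rho.
    """
    cov = rho * math.sqrt(u1 * (1 - u1) * u2 * (1 - u2))
    p_both = u1 * u2 + cov
    # A valid joint probability must lie within the Frechet bounds.
    lower = max(0.0, u1 + u2 - 1.0)
    upper = min(u1, u2)
    if not (lower - 1e-12 <= p_both <= upper + 1e-12):
        raise ValueError("rho is not feasible for these marginals")
    return p_both


def underestimation_factor(u: float, rho: float) -> float:
    """Ratio of correlated to independent 1-out-of-2 unavailability."""
    return joint_unavailability(u, u, rho) / (u * u)


if __name__ == "__main__":
    # Hypothetical per-machine unavailabilities, not measured values.
    for u in (1e-2, 1e-3, 1e-4):
        correlated = joint_unavailability(u, u, 0.15)
        factor = underestimation_factor(u, 0.15)
        print(f"u={u:g}  independent={u * u:.3e}  "
              f"correlated={correlated:.3e}  factor={factor:.1f}")
```

Under these assumptions the covariance term scales with u while the independent term scales with u², so the gap between the two estimates widens as the individual machines become more available.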

Copyright information

© 1992 Springer-Verlag/Wien

About this paper

Cite this paper

Tang, D., Iyer, R.K. (1992). Impact of Correlated Failures on Dependability in a VAXcluster System. In: Meyer, J.F., Schlichting, R.D. (eds) Dependable Computing for Critical Applications 2. Dependable Computing and Fault-Tolerant Systems, vol 6. Springer, Vienna. https://doi.org/10.1007/978-3-7091-9198-9_9

  • DOI: https://doi.org/10.1007/978-3-7091-9198-9_9

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-7091-9200-9

  • Online ISBN: 978-3-7091-9198-9

  • eBook Packages: Springer Book Archive
