Measurement-Based Dependability Evaluation of Operational Computer Systems

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 283)

Abstract

This paper discusses methodologies and advances in measurement-based dependability evaluation of operational computer systems. Research in this area over the past 15 years is briefly reviewed. The methodologies are illustrated through the authors’ representative studies. Specifically, the paper addresses measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion covers the methods used in the area and several important issues studied previously, including workload/failure dependency, correlated failures, and software fault tolerance.

Copyright information

© 1994 Kluwer Academic Publishers

About this chapter

Cite this chapter

Iyer, R.K., Tang, D. (1994). Measurement-Based Dependability Evaluation of Operational Computer Systems. In: Foundations of Dependable Computing. The Springer International Series in Engineering and Computer Science, vol 283. Springer, Boston, MA. https://doi.org/10.1007/978-0-585-27377-8_7

  • DOI: https://doi.org/10.1007/978-0-585-27377-8_7

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-7923-9484-6

  • Online ISBN: 978-0-585-27377-8

  • eBook Packages: Springer Book Archive
