Abstract
Several recent studies have established that most system outages are due to software faults. Given the ever-increasing complexity of software and the well-developed techniques and analysis for hardware reliability, this trend is unlikely to change in the near future. In this paper, we first classify software faults and discuss techniques for dealing with them in the testing/debugging phase and the operational phase of the software. We then discuss the phenomenon of software aging and software rejuvenation, a preventive maintenance technique that addresses it. We describe stochastic models that evaluate the effectiveness of preventive maintenance in operational software systems and determine optimal times to perform rejuvenation in different scenarios. We also present measurement-based methodologies that detect software aging and estimate its effect on various system resources. These models are intended to help develop software rejuvenation policies. An automated online measurement-based approach has been used in the software rejuvenation agent implemented in a major commercial server.
Copyright information
© 2004 Springer Science + Business Media, Inc.
About this paper
Cite this paper
Trivedi, K.S., Vaidyanathan, K. (2004). Software Rejuvenation - Modeling and Analysis. In: Reis, R. (eds) Information Technology. IFIP International Federation for Information Processing, vol 157. Springer, Boston, MA. https://doi.org/10.1007/1-4020-8159-6_6
DOI: https://doi.org/10.1007/1-4020-8159-6_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4020-8158-3
Online ISBN: 978-1-4020-8159-0
eBook Packages: Springer Book Archive