Abstract
Self-caring systems are systems capable of monitoring and managing their own health and, indirectly, their useful lifetime. Unlike self-healing systems, which react to faults and failures, self-caring systems are aware of their health and can therefore potentially circumvent or adapt to impending faults, or recover from them more quickly and effectively. Towards a methodology for modeling and incorporating health management logic and control mechanisms into an Information Technology (IT) system whose health needs to be managed, we propose the following: (1) the use of Petri nets as a graphical discrete event system (DES) model that can also be used for analysis, simulation and execution control; (2) the use of Remaining-Useful-Life (RUL) management and prognosis as a novel way of looking at health management in IT systems; and (3) the use of a control-theoretic framework for RUL management. As a simple illustration of the concept, a controller was built for useful life management in the application execution stage (containing a potential memory exhaustion fault) of an IT system.
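To make the notion of RUL for a memory exhaustion fault concrete, the following is a minimal sketch and not the controller described in the chapter. It assumes that memory usage grows roughly linearly, estimates the time until a capacity limit is reached by a least-squares line fit, and compares the estimate with the time the application still needs. The names `estimate_rul` and `choose_action`, the units, and the two action labels are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def estimate_rul(times: Sequence[float],
                 used_mb: Sequence[float],
                 capacity_mb: float) -> Optional[float]:
    """Estimate remaining useful life (RUL) before memory is exhausted.

    Assumption (not from the chapter): usage follows a linear trend, so a
    least-squares line is extrapolated to the capacity limit. Returns the
    time remaining after the last sample, or None if usage is not growing.
    """
    n = len(times)
    if n < 2 or n != len(used_mb):
        raise ValueError("need at least two matching samples")
    mean_t = sum(times) / n
    mean_u = sum(used_mb) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")
    slope = sum((t - mean_t) * (u - mean_u)
                for t, u in zip(times, used_mb)) / sxx
    if slope <= 0:
        return None  # no growth trend, so no exhaustion is predicted
    intercept = mean_u - slope * mean_t
    t_exhaust = (capacity_mb - intercept) / slope
    return max(0.0, t_exhaust - times[-1])


def choose_action(rul: Optional[float], time_needed: float) -> str:
    """Hypothetical control decision: act only if the predicted RUL is
    shorter than the time the application still needs to finish."""
    if rul is None or rul >= time_needed:
        return "continue"
    return "mitigate"


if __name__ == "__main__":
    rul = estimate_rul([0, 1, 2, 3], [100, 200, 300, 400], capacity_mb=1000)
    print(rul, choose_action(rul, time_needed=10))
```

A real controller would replace the linear trend with a validated prognosis model and the two-way decision with a proper control law; the sketch only shows how a health measurement can be turned into a lifetime estimate that a controller can act on.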
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kadirvel, S., Fortes, J.A.B. (2011). Towards IT Systems Capable of Managing Their Health. In: Calinescu, R., Jackson, E. (eds) Foundations of Computer Software. Modeling, Development, and Verification of Adaptive Systems. Monterey Workshop 2010. Lecture Notes in Computer Science, vol 6662. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21292-5_5
DOI: https://doi.org/10.1007/978-3-642-21292-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21291-8
Online ISBN: 978-3-642-21292-5
eBook Packages: Computer Science, Computer Science (R0)