Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6662)

Abstract

Self-caring systems are systems capable of monitoring and managing their own health and, indirectly, their useful lifetime. Unlike self-healing systems, which react to faults and failures, self-caring systems are aware of their health and hence can potentially circumvent or adapt to impending faults, or recover from them more quickly and effectively. Towards a methodology for modeling health management logic and control mechanisms and incorporating them into an Information Technology (IT) system whose health needs to be managed, we propose the following: (1) the use of Petri nets as a graphical discrete event system (DES) model that can also be used for analysis, simulation, and execution control; (2) the use of Remaining-Useful-Life (RUL) management and prognosis as a novel way of looking at health management in IT systems; and (3) the use of a control-theoretic framework for RUL management. As a simple illustration of the concept, a controller was built for useful-life management in the application execution stage of an IT system, a stage that contains a potential memory exhaustion fault.
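To make the idea of RUL prognosis for a memory exhaustion fault concrete, the following is a minimal sketch and not the controller described in the paper. It assumes that memory use grows roughly linearly, fits a least-squares line to sampled usage, and extrapolates to the point where a capacity limit is reached. The function names `estimate_rul` and `choose_action`, the units, and the simple threshold rule are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def estimate_rul(
    times: Sequence[float],
    used_mb: Sequence[float],
    capacity_mb: float,
) -> Optional[float]:
    """Estimate remaining useful life under an assumed linear growth trend.

    Fits used_mb = slope * t + intercept by least squares and returns the
    time from the last sample until the fitted line reaches capacity_mb.
    Returns None when usage is not growing, i.e. no exhaustion is predicted.
    """
    n = len(times)
    if n < 2 or n != len(used_mb):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_m = sum(used_mb) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("sample times must not all be equal")
    slope = sum(
        (t - mean_t) * (m - mean_m) for t, m in zip(times, used_mb)
    ) / var_t
    if slope <= 0:
        return None
    intercept = mean_m - slope * mean_t
    t_exhaust = (capacity_mb - intercept) / slope
    # Clamp at zero: the fitted line may already be past capacity.
    return max(0.0, t_exhaust - times[-1])


def choose_action(rul: Optional[float], time_needed: float) -> str:
    """Hypothetical threshold rule: intervene only if the predicted RUL is
    shorter than the time the application still needs to finish."""
    if rul is None or rul >= time_needed:
        return "continue"
    return "intervene"


if __name__ == "__main__":
    rul = estimate_rul([0, 10, 20, 30], [100, 200, 300, 400], capacity_mb=1000)
    print(rul, choose_action(rul, time_needed=120))
```

In this sketch, "intervene" stands in for whatever corrective action a real controller would take; the choice of action and the prediction model would in practice depend on the system being managed.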


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kadirvel, S., Fortes, J.A.B. (2011). Towards IT Systems Capable of Managing Their Health. In: Calinescu, R., Jackson, E. (eds) Foundations of Computer Software. Modeling, Development, and Verification of Adaptive Systems. Monterey Workshop 2010. Lecture Notes in Computer Science, vol 6662. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21292-5_5

  • DOI: https://doi.org/10.1007/978-3-642-21292-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21291-8

  • Online ISBN: 978-3-642-21292-5

  • eBook Packages: Computer Science, Computer Science (R0)
