A Self-repair Architecture for Cluster Systems

  • Chapter
Architecting Dependable Systems VI

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5835)

Abstract

The paper presents the Jade framework for the construction of self-repairable cluster systems. Jade adopts an architecture-based approach to management, and maintains a causally connected view of the software architecture of the managed system, itself configured and manipulated as a component-based structure. Self-repair is achieved through a combination of component-based design, reflection and active replication of the management subsystem. The paper illustrates the benefits of the Jade approach through its application to a JEE Web application server. Specifically, our evaluation shows that the Jade framework adds negligible overhead to the operation of a managed system, and that Jade achieves a short mean time to repair (MTTR) even with a simple repair policy.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Boyer, F., De Palma, N., Gruber, O., Sicard, S., Stefani, JB. (2009). A Self-repair Architecture for Cluster Systems. In: de Lemos, R., Fabre, JC., Gacek, C., Gadducci, F., ter Beek, M. (eds) Architecting Dependable Systems VI. Lecture Notes in Computer Science, vol 5835. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10248-6_6

  • DOI: https://doi.org/10.1007/978-3-642-10248-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10247-9

  • Online ISBN: 978-3-642-10248-6

  • eBook Packages: Computer Science, Computer Science (R0)
