On Self-healing Based on Collaborating End-Systems, Access, Edge and Core Network Components

  • Nikolay Tcholtchev
  • Ranganai Chaparadza
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 63)


Autonomic Networking, realized through control loops, is an enabler for advanced self-manageability of network nodes and, in turn, of the network as a whole. Self-healing is one of the desired autonomic features of a system or network, and it can be facilitated through autonomic behaviors realized by control-loop structures. Autonomicity, implemented over existing protocol stacks as managed resources, requires an architectural framework that integrates the diverse aspects and levels of self-healing capabilities of individual protocols, systems and the network as a whole, so that they all co-operate as required to achieve reliable network services. This integration should include the traditional resilience capabilities intrinsically embedded within some protocols, e.g. some telecommunication protocols, as well as diverse proactive and reactive schemes for incident prevention and resolution, which must be realized by autonomic entities implementing control loops at a higher level, outside of the protocols. In this paper, we present our considerations on how such an architectural framework, integrating the diverse resilience aspects inside an autonomic node, can facilitate collaborative self-healing across end systems, access networks, edge and core network components.


Autonomic Fault-Management; GANA-orientated architecture for Autonomic Fault-Management; Resilience; Self-Healing





Copyright information

© ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering 2011

Authors and Affiliations

  • Nikolay Tcholtchev (1)
  • Ranganai Chaparadza (1)
  1. Fraunhofer-FOKUS Institute for Open Communication Systems, Berlin, Germany
