Fault Management

  • Dinesh Chandra Verma
Chapter

Abstract

A fault in a computer system is the failure of a component that prevents the system from operating normally. As the system operates, it may experience faults for a variety of reasons. Each fault generates some type of alert or error message, which is reported to the monitoring infrastructure. These alert messages are stored in the management database that is responsible for fault management.

Keywords

Fault Diagnosis, Dependency Graph, Error Code, Border Gateway Protocol, Fault Management
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. IBM T.J. Watson Research Center, Yorktown Heights, USA
