Discovering Dynamic Dependencies in Enterprise Environments for Problem Determination

  • Manish Gupta
  • Anindya Neogi
  • Manoj K. Agarwal
  • Gautam Kar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2867)

Abstract

To reduce mean time to recovery (MTTR) in heterogeneous enterprise environments, it should be possible to quickly determine and resolve the root cause of a problem detected at a higher level, e.g. through a response time violation for a transaction category. Many problem determination applications use a component dependency graph to pinpoint the root cause. However, such graphs are often constructed manually. This paper introduces a simple, non-intrusive technique that mines existing runtime monitoring data to construct a dynamic dependency graph between the components of an enterprise environment. The graph is traversed to identify the nodes that cause response-time-related problems.
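
The abstract does not spell out the mining step, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that monitoring data yields activity periods (component, start time, end time) and that a component depends on another when one of the latter's activity periods falls inside one of its own. The function names `infer_dependencies` and `suspect_components`, the containment rule, and the sample data are all hypothetical.

```python
from collections import defaultdict


def infer_dependencies(periods):
    """Build a dependency graph from (component, start, end) activity periods.

    Assumed heuristic (not taken from the paper): A depends on B when some
    activity period of B lies entirely within an activity period of A.
    """
    deps = defaultdict(set)
    for a, a_start, a_end in periods:
        for b, b_start, b_end in periods:
            if a != b and a_start <= b_start and b_end <= a_end:
                deps[a].add(b)
    return deps


def suspect_components(deps, root, violating):
    """Traverse the graph from root and return likely root-cause components.

    A component is a suspect if it violates its response time threshold
    and none of the components it depends on is also violating.
    """
    suspects, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guards against cycles and repeated visits
        seen.add(node)
        children = deps.get(node, set())
        if node in violating and not any(c in violating for c in children):
            suspects.append(node)
        stack.extend(children)
    return sorted(suspects)


# Hypothetical monitoring data: two servlet requests and their nested calls.
periods = [
    ("servlet", 0, 100),
    ("ejb", 10, 90),
    ("db", 20, 80),
    ("servlet", 200, 250),
    ("cache", 210, 220),
]
deps = infer_dependencies(periods)
suspects = suspect_components(deps, "servlet", {"servlet", "ejb", "db"})
print(suspects)
```

In this made-up data the servlet, the EJB, and the database all exceed their thresholds, but only the database has no violating dependency of its own, so it is the component the traversal reports. A practical system would need to handle concurrent transactions, where containment alone produces spurious edges.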

Keywords

Activity Period, Time Stamp, Dependency Graph, Problem Determination, Enterprise Environment

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Manish Gupta (1)
  • Anindya Neogi (1)
  • Manoj K. Agarwal (1)
  • Gautam Kar (2)
  1. IBM India Research Lab, New Delhi
  2. IBM Watson Research Center, New York
