Problem Determination Using Dependency Graphs and Run-Time Behavior Models

  • Manoj K. Agarwal
  • Karen Appleby
  • Manish Gupta
  • Gautam Kar
  • Anindya Neogi
  • Anca Sailer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3278)


Key challenges in managing an I/T environment for e-business lie in the areas of root cause analysis, proactive problem prediction, and automated problem remediation. The approach reported in this paper uses two concepts, dependency graphs and the dynamic run-time performance characteristics of the resources that make up an I/T environment, to design algorithms for rapid root cause identification. When a problem is reported, the approach uses the dependency information and the behavior models to narrow the root cause down to a small set of resources that can be tested individually, which facilitates quick remediation and reduces administrative costs.
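The narrowing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the dependency graph is a plain adjacency mapping and that each resource's behavior model reduces to a single response-time threshold. The function name, the resource names, and all numbers are hypothetical.

```python
from collections import deque


def candidate_root_causes(dependencies, observed_ms, threshold_ms, failing):
    """Return resources that `failing` depends on (directly or transitively,
    including itself) whose observed average response time exceeds the
    threshold from their behavior model. All inputs are illustrative."""
    # Breadth-first walk over the dependency graph from the failing resource.
    seen = {failing}
    queue = deque([failing])
    while queue:
        node = queue.popleft()
        for dep in dependencies.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    # Keep only reachable resources that violate their threshold.
    return sorted(
        r for r in seen
        if r in observed_ms and r in threshold_ms
        and observed_ms[r] > threshold_ms[r]
    )


# Hypothetical environment: "checkout" is the transaction reported as slow.
dependencies = {
    "checkout": ["app_server"],
    "app_server": ["database", "auth"],
    "search": ["index"],
}
observed_ms = {"checkout": 900, "app_server": 400, "database": 350,
               "auth": 20, "index": 500}
threshold_ms = {"checkout": 300, "app_server": 450, "database": 100,
                "auth": 50, "index": 100}

suspects = candidate_root_causes(dependencies, observed_ms, threshold_ms,
                                 "checkout")
print(suspects)
```

In this sketch, "index" is excluded even though it exceeds its threshold, because the failing transaction does not depend on it. That is the sense in which dependency information shrinks the set of resources to test.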


Keywords (machine-generated, not supplied by the authors): Dependency Graph; Average Response Time; Problem Determination; Fault Injection; Dynamic Threshold



Copyright information

© IFIP International Federation for Information Processing 2004

Authors and Affiliations

  • Manoj K. Agarwal (1)
  • Karen Appleby (2)
  • Manish Gupta (1)
  • Gautam Kar (2)
  • Anindya Neogi (1)
  • Anca Sailer (2)

  1. IBM India Research Laboratory, New Delhi, India
  2. IBM T.J. Watson Research Center, Hawthorne, USA
