Active Diagnosis of High-Level Faults in Distributed Internet Services

  • Huihu Long
  • Lu Cheng
  • Yongguo Zeng
  • Li Wu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5297)


In fault diagnosis for Internet services, detecting and localizing high-level failures is both important and challenging. Diagnosis methods that passively collect information have two drawbacks: 1) they require the target system to report its internal messages; 2) they cannot detect and locate faults before users notice them. This paper proposes an active diagnosis method that tests an Internet service with probes and infers faults from the probe results. The probing method is proactive, adaptive, and low in cost. We evaluate it by applying it to a J2EE application, “Pet Store”, compare it with Pinpoint, an existing passive method, and show that our method outperforms Pinpoint.
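The idea of inferring faults from probe results via a dependency matrix can be illustrated with a minimal sketch. This is not the paper's algorithm: the component names, the probe names, the contents of `DEPENDENCY_MATRIX`, and the single-fault assumption are all hypothetical and chosen only for illustration. Each probe is mapped to the set of components it exercises; a component remains a suspect only if every failed probe touches it and no successful probe does.

```python
# Hypothetical dependency matrix: probe -> components the probe exercises.
# Neither the probe names nor the component names come from the paper.
DEPENDENCY_MATRIX = {
    "probe_browse": {"web", "catalog", "db"},
    "probe_cart": {"web", "cart"},
    "probe_checkout": {"web", "cart", "order", "db"},
}


def diagnose(results):
    """Return the components consistent with the probe results.

    `results` maps a probe name to True (probe succeeded) or False
    (probe failed). Assumes at most one faulty component (an assumption
    made for this sketch only).
    """
    failed = [DEPENDENCY_MATRIX[p] for p, ok in results.items() if not ok]
    if not failed:
        return set()  # no probe failed, so nothing to localize
    passed = [DEPENDENCY_MATRIX[p] for p, ok in results.items() if ok]

    # A single faulty component must lie on every failed probe path ...
    suspects = set.intersection(*failed)
    # ... and cannot lie on any path whose probe succeeded.
    for components in passed:
        suspects -= components
    return suspects


if __name__ == "__main__":
    outcome = {"probe_browse": True, "probe_cart": False, "probe_checkout": False}
    print(diagnose(outcome))
```

With the sample outcome above, the two failed probes share `web` and `cart`, and the successful browse probe clears `web`, leaving `cart` as the only suspect. An adaptive scheme would choose the next probe according to which suspects remain; that step is omitted here.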


Fault Diagnosis; High-level Fault; Dependency Matrix; Extended Dependency Matrix; Probe




  1. Chen, M.Y., Kıcıman, E., Fratkin, E., Fox, A., Brewer, E.: Pinpoint: problem determination in large, dynamic Internet services. In: Intl. Conf. on Dependable Systems and Networks (DSN), pp. 595–604 (2002)
  2. Kompella, R.R., Yates, J., Greenberg, A., Snoeren, A.C.: Detection and Localization of Network Black Holes. In: IEEE INFOCOM (2007)
  3. Oppenheimer, D., Ganapathi, A., Patterson, D.A.: Why do internet services fail and what can be done about it. In: Proceedings of USITS 2003: 4th USENIX Symposium on Internet Technologies and Systems, Seattle, WA, USA, March 26–28 (2003)
  4. Khanna, G., Laguna, I., Arshad, F.A., Bagchi, S.: Distributed Diagnosis of Failures in a Three Tier E-Commerce System. In: 26th IEEE International Symposium on Reliable Distributed Systems
  5. Rish, I., Brodie, M., Ma, S., Odintsova, N., Beygelzimer, A., Grabarnik, G., Hernandez, K.: Adaptive diagnosis in distributed systems. IEEE Transactions on Neural Networks 16(5) (September 2005)
  7. Brodie, M., Rish, I., Ma, S.: Intelligent probing: A cost-effective approach to fault diagnosis in computer networks. IBM Systems Journal 41(3) (2002)
  8. Oppenheimer, D., Patterson, D.A.: Architecture operation and dependability of large-scale Internet services. IEEE Internet Computing (2002)
  9. Yemini, A., Kliger, S.: High Speed and Robust Event Correlation. IEEE Communication Magazine 34(5), 82–90 (1996)
  10. Lee, Iyer, R.: Software dependability in the Tandem GUARDIAN system. IEEE Transactions on Software Engineering 21(5) (1995)
  11. Brown, I.A., Patterson, D.A.: Embracing Failure: A Case for Recovery-Oriented Computing (ROC). In: 2001 High Performance Transaction Processing Symp., Asilomar, CA (October 2001)
  12. Aguilera, M.K., Mogul, J.C., Wiener, J.L., Reynolds, P., Muthitacharoen, A.: Performance debugging for distributed systems of black boxes. In: Proc. of the 19th ACM SOSP, pp. 74–89 (2003)
  13. Candea, G., Kıcıman, E., Kawamoto, S., Fox, A.: Autonomous Recovery in Componentized Internet Applications. Cluster Computing Journal 9(1) (February 2006)
  15. Cuppens, F., Miege, A.: Alert correlation in a cooperative intrusion detection framework. In: Proceedings of the 2002 IEEE Symp. on Security and Privacy, May 12-15 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Huihu Long (1)
  • Lu Cheng (1)
  • Yongguo Zeng (1)
  • Li Wu (1)
  1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, China
