A Recovery Technique Using Multi-agent in Distributed Computing Systems

  • Hwa-Min Lee
  • Kwang-Sik Chung
  • Sang-Chul Shin
  • Dae-Won Lee
  • Won-Gyu Lee
  • Heon-Chang Yu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2315)

Abstract

This paper proposes a new approach to rollback-recovery that uses multiple agents in distributed computing systems. Previous rollback-recovery protocols depended on the underlying communication subsystem and operating system, which degraded computing performance in distributed computing systems. By using multiple agents, we propose a rollback-recovery protocol that works independently of the operating system. We define three kinds of agents. The first is a recovery agent that performs the rollback-recovery protocol after a failure. The second is an information agent that, during failure-free operation, constructs domain knowledge in the form of fault-tolerance rules and information. The third is a facilitator agent that controls efficient communication between agents. We also simulate the proposed rollback-recovery protocol using Java and an agent communication language in a CORBA environment.
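
As a reading aid only, the division of labour among the three agents might be sketched in Java roughly as follows. This is not the authors' protocol: it assumes a single process, an integer standing in for process state, and direct method calls in place of the agent communication language and CORBA transport mentioned above. Every class and method name here (`RecoverySketch`, `Checkpoint`, `onCheckpoint`, `onFailure`, and so on) is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class RecoverySketch {

    /** Hypothetical checkpoint record: a process id plus an opaque state value. */
    static final class Checkpoint {
        final String processId;
        final int state;

        Checkpoint(String processId, int state) {
            this.processId = processId;
            this.state = state;
        }
    }

    /** Information agent (assumed role): stores checkpoints during failure-free operation. */
    static final class InformationAgent {
        private final Map<String, Deque<Checkpoint>> history = new HashMap<>();

        void record(String processId, int state) {
            // Newest checkpoint is pushed to the front of the per-process stack.
            history.computeIfAbsent(processId, k -> new ArrayDeque<>())
                   .push(new Checkpoint(processId, state));
        }

        /** Returns the most recent checkpoint, or null if none was recorded. */
        Checkpoint latest(String processId) {
            Deque<Checkpoint> stack = history.get(processId);
            return (stack == null) ? null : stack.peek();
        }
    }

    /** Recovery agent (assumed role): after a failure, rolls back to the last checkpoint. */
    static final class RecoveryAgent {
        int recover(String processId, InformationAgent info) {
            Checkpoint cp = info.latest(processId);
            if (cp == null) {
                throw new IllegalStateException("no checkpoint for " + processId);
            }
            return cp.state;
        }
    }

    /** Facilitator agent (assumed role): the single entry point that routes requests. */
    static final class Facilitator {
        private final InformationAgent info = new InformationAgent();
        private final RecoveryAgent recovery = new RecoveryAgent();

        void onCheckpoint(String processId, int state) {
            info.record(processId, state);
        }

        int onFailure(String processId) {
            return recovery.recover(processId, info);
        }
    }

    public static void main(String[] args) {
        Facilitator facilitator = new Facilitator();
        facilitator.onCheckpoint("p1", 10);
        facilitator.onCheckpoint("p1", 20);
        // Simulated failure of p1: the recovery agent restores the latest state.
        System.out.println(facilitator.onFailure("p1")); // prints 20
    }
}
```

Routing every call through the facilitator mirrors the abstract's statement that it controls communication between agents. A real rollback-recovery protocol would also have to handle message logs and dependencies between processes, which this sketch leaves out.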

Keywords

Domain Knowledge; Distribute Computing System; Recovery Technique; Information Agent; Event List
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Hwa-Min Lee (1)
  • Kwang-Sik Chung (2)
  • Sang-Chul Shin (1)
  • Dae-Won Lee (1)
  • Won-Gyu Lee (1)
  • Heon-Chang Yu (1)

  1. Dept. of Computer Science Education, Korea University, Seoul, Korea
  2. Dept. of Computer Science, University College London, London, UK