
Network-Wide Rollback Scheme for Fast Recovery from Operator Errors Toward Dependable Network

  • Daisuke Arai
  • Kiyohito Yoshihara
  • Akira Idoue
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5297)

Abstract

Network failures can have a major impact on society. Of the many possible causes of network failures, the most significant is operator error. Consequently, it is important to develop new network management schemes that tackle operator errors. We have previously proposed the basic idea of a network-wide rollback scheme for this purpose. In the proposed scheme, a server manages historical versions of the set of device configurations in the network. When an operator detects a network failure, the operator rolls back the set of device configurations via the server. In this paper, we present the network-wide rollback scheme in detail. In addition, we provide three rollback procedures and implement a prototype system to evaluate their rollback times. The proposed scheme can serve for fast recovery from operator errors, as the minimum rollback time is about 41 seconds when 50 routers are rolled back.
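The abstract describes a server that stores historical versions of the whole network's device configurations and lets an operator roll them back after a failure. The paper's own data model and its three rollback procedures are not given here, so the following is only a minimal Python sketch under stated assumptions: each version is a hypothetical snapshot mapping router names to full configuration text, and the class `ConfigHistoryServer` with its methods `commit` and `rollback_plan` are illustrative names, not the authors' API. The sketch only computes which routers need their configuration restored; actually pushing configurations to the routers is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical data model: one network-wide version is a snapshot that maps
# each device name to its complete configuration text.
Snapshot = Dict[str, str]


@dataclass
class ConfigHistoryServer:
    """Illustrative server that keeps historical network-wide configuration versions."""

    versions: List[Snapshot] = field(default_factory=list)

    def commit(self, snapshot: Snapshot) -> int:
        """Store a copy of all device configurations and return its version number."""
        self.versions.append(dict(snapshot))
        return len(self.versions) - 1

    def rollback_plan(self, current: Snapshot, target_version: int) -> Snapshot:
        """Return the configurations that must be pushed to restore target_version.

        Only devices whose current configuration differs from (or is missing
        relative to) the target version are included. Devices that exist only
        in `current` are ignored in this sketch.
        """
        target = self.versions[target_version]
        return {dev: cfg for dev, cfg in target.items() if current.get(dev) != cfg}


# Usage: an operator mistakenly adds a default route on r2, then rolls back.
server = ConfigHistoryServer()
v0 = server.commit({"r1": "hostname r1\n", "r2": "hostname r2\n"})
live = {"r1": "hostname r1\n", "r2": "hostname r2\nip route 0.0.0.0/0 10.0.0.9\n"}
server.commit(live)
plan = server.rollback_plan(live, v0)
print(sorted(plan))  # ['r2']
```

Restricting the plan to routers whose configuration actually changed is one plausible way to keep rollback time low as the network grows; whether the authors' procedures do this is not stated in the abstract.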

Keywords

Configuration Management · Operator Error · Dependable Network

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Daisuke Arai (1)
  • Kiyohito Yoshihara (1)
  • Akira Idoue (1)

  1. KDDI R&D Laboratories Inc., Japan
