Network-Wide Rollback Scheme for Fast Recovery from Operator Errors Toward Dependable Network
Network failures can have a major impact on society. Among their many possible causes, the most significant is operator error. Developing new network management schemes that address operator errors is therefore important. We previously proposed the basic idea of a network-wide rollback scheme for this purpose. In this scheme, a server manages historical versions of the set of device configurations in the network, and when an operator detects a network failure, the operator rolls back that set of device configurations via the server. In this paper, we present the network-wide rollback scheme in detail. In addition, we provide three rollback procedures and implement a prototype system to evaluate their rollback time. The results indicate that the proposed scheme enables fast recovery from operator errors: the minimum rollback time is about 41 seconds when 50 routers are rolled back.
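The abstract describes the configuration server only at a high level. The following is a minimal in-memory sketch, under stated assumptions, of the core idea: the server stores each network-wide set of device configurations as a numbered version, and a rollback computes which configurations must be pushed to restore an earlier version. The class and method names are hypothetical, and the choice to push only devices whose configuration differs from the current version is an assumption for illustration. The paper's three rollback procedures and the actual transfer of configurations to routers are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical type: one network-wide snapshot maps device name -> config text.
Snapshot = Dict[str, str]


@dataclass
class ConfigHistoryServer:
    """Illustrative sketch of a server keeping historical versions of
    network-wide configuration sets (not the paper's implementation)."""
    versions: List[Snapshot] = field(default_factory=list)

    def commit(self, snapshot: Snapshot) -> int:
        """Record a new network-wide snapshot and return its version number."""
        self.versions.append(dict(snapshot))  # copy so later caller edits don't leak in
        return len(self.versions) - 1

    def current(self) -> Snapshot:
        """Return the most recently recorded snapshot."""
        return self.versions[-1]

    def rollback_plan(self, target: int) -> Dict[str, str]:
        """Return the configs to push so the network matches version `target`.
        Assumption: only devices whose config differs from the current
        version are included; devices absent from `target` are ignored."""
        if not 0 <= target < len(self.versions):
            raise IndexError(f"no such version: {target}")
        old, cur = self.versions[target], self.current()
        return {dev: cfg for dev, cfg in old.items() if cur.get(dev) != cfg}

    def rollback(self, target: int) -> Dict[str, str]:
        """Compute the plan, then record the restored state as a new version
        so that the rollback itself can later be undone."""
        plan = self.rollback_plan(target)
        self.commit(self.versions[target])
        return plan


if __name__ == "__main__":
    server = ConfigHistoryServer()
    v0 = server.commit({"r1": "hostname r1\nrouter ospf 1",
                        "r2": "hostname r2\nrouter ospf 1"})
    # Hypothetical operator error: OSPF configuration removed from r2.
    server.commit({"r1": "hostname r1\nrouter ospf 1",
                   "r2": "hostname r2\n! ospf removed by mistake"})
    plan = server.rollback(v0)
    print(sorted(plan))  # ['r2']
```

Recording the rollback as a new version rather than truncating the history keeps every state reachable, which is one plausible design choice; the paper may manage versions differently.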
Keywords: Configuration Management, Operator Error, Dependable Network