Abstract
We present a new approach for developing robust software applications that breaks dependences on the failed parts of an application’s execution to allow the rest of the application to continue executing. When a failure occurs, the recovery algorithm uses information from a static analysis to characterize the intended behavior of the application had it not failed. It then uses this characterization to recover as much of the application’s execution as possible.
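The general idea of isolating a failure so that only work depending on it is abandoned can be illustrated with a minimal Python sketch. It assumes a hypothetical, precomputed dependence map that stands in for the information a static analysis might supply. `run_with_recovery`, the task names, and the map are illustrative only. This is not the Bristlecone recovery algorithm, which works with recovery tasks and a richer characterization of intended behavior.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set, Tuple

Task = Callable[[Dict[str, Any]], Any]


def run_with_recovery(
    tasks: Dict[str, Task], deps: Dict[str, Set[str]]
) -> Tuple[Dict[str, Any], Set[str], Set[str]]:
    """Run tasks in dependence order, isolating failures instead of aborting.

    Hypothetical: `deps` maps each task to the tasks whose results it reads,
    standing in for dependence information from a static analysis.
    """
    # Include every task, even one with no dependences, in the graph.
    graph = {name: deps.get(name, set()) for name in tasks}
    results: Dict[str, Any] = {}
    failed: Set[str] = set()
    skipped: Set[str] = set()
    for name in TopologicalSorter(graph).static_order():
        needs = graph.get(name, set())
        # Drop only the work that transitively depends on a failure.
        if needs & (failed | skipped):
            skipped.add(name)
            continue
        task = tasks[name]
        try:
            results[name] = task({d: results[d] for d in needs})
        except Exception:
            # Record the failure; independent tasks keep running.
            failed.add(name)
    return results, failed, skipped


# Usage: a toy "game server" with one injected failure (hypothetical tasks).
def parse(_): return [3, 1, 2]
def sort_scores(inp): return sorted(inp["parse"])
def log_scores(inp): return f"top={inp['sort_scores'][-1]}"
def render_chat(_): raise RuntimeError("injected failure")
def send_chat(inp): return inp["render_chat"].upper()


tasks = {
    "parse": parse,
    "sort_scores": sort_scores,
    "log_scores": log_scores,
    "render_chat": render_chat,
    "send_chat": send_chat,
}
deps = {
    "sort_scores": {"parse"},
    "log_scores": {"sort_scores"},
    "send_chat": {"render_chat"},
}
results, failed, skipped = run_with_recovery(tasks, deps)
print(failed, skipped, results)
```

Here the failure in `render_chat` causes only `send_chat` to be skipped, while the score-processing chain completes. A whole-program crash would instead lose all five results.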
We have implemented this approach in the Bristlecone compiler. We have evaluated our implementation on a multiplayer game, a web portal, and a MapReduce framework. We found that in the presence of injected failures, the recovery task version provided substantially better service than the control versions. Moreover, the recovery task version of the game benchmark successfully recovered from a real fault that we accidentally introduced during development, while the same fault caused the two control versions to crash.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Demsky, B., Zhou, J., Montaz, W. (2010). Recovery Tasks: An Automated Approach to Failure Recovery. In: Barringer, H., et al. Runtime Verification. RV 2010. Lecture Notes in Computer Science, vol 6418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16612-9_18
DOI: https://doi.org/10.1007/978-3-642-16612-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16611-2
Online ISBN: 978-3-642-16612-9
eBook Packages: Computer Science, Computer Science (R0)