Recovery Tasks: An Automated Approach to Failure Recovery

  • Conference paper
Runtime Verification (RV 2010)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6418)

Abstract

We present a new approach for developing robust software applications that breaks dependences on the failed parts of an application’s execution to allow the rest of the application to continue executing. When a failure occurs, the recovery algorithm uses information from a static analysis to characterize the intended behavior of the application had it not failed. It then uses this characterization to recover as much of the application’s execution as possible.

We have implemented this approach in the Bristlecone compiler. We have evaluated our implementation on a multiplayer game, a web portal, and a MapReduce framework. We found that in the presence of injected failures, the recovery task version provided substantially better service than the control versions. Moreover, the recovery task version of the game benchmark successfully recovered from a real fault that we accidentally introduced during development, while the same fault caused the two control versions to crash.
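
To make the idea of continuing past a failed part of an execution concrete, the following is a minimal Python sketch, not the Bristlecone implementation or its recovery algorithm. The task model, the `run_with_recovery` function, and the hand-written dependency table (a stand-in for the information a static analysis would supply) are all hypothetical. The sketch only contains a failure and skips the work that depends on it; it does not attempt to characterize or reconstruct the intended behavior of the failed task.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical task model: a task maps the results of its dependencies to a result.
Task = Callable[[Dict[str, object]], object]


def run_with_recovery(
    tasks: Dict[str, Task],
    deps: Dict[str, List[str]],
    order: List[str],
) -> Tuple[Dict[str, object], Set[str]]:
    """Run tasks in the given (topological) order.

    A task that raises is recorded as lost. Any task that depends on a lost
    task is skipped and also recorded as lost. Every other task still runs.
    """
    results: Dict[str, object] = {}
    lost: Set[str] = set()
    for name in order:
        needed = deps.get(name, [])
        if any(d in lost for d in needed):
            lost.add(name)  # depends on a failed part, so do not run it
            continue
        try:
            results[name] = tasks[name]({d: results[d] for d in needed})
        except Exception:
            lost.add(name)  # contain the failure instead of crashing
    return results, lost


# Illustrative tasks (assumptions, not taken from the paper's benchmarks).
def parse(_inputs):
    return [1, 2, 3]


def render(_inputs):
    raise RuntimeError("injected failure")


def total(inputs):
    return sum(inputs["parse"])


def publish(inputs):
    return inputs["render"]


tasks = {"parse": parse, "render": render, "total": total, "publish": publish}
deps = {"parse": [], "render": ["parse"], "total": ["parse"], "publish": ["render"]}
order = ["parse", "render", "total", "publish"]

results, lost = run_with_recovery(tasks, deps, order)
```

In this example the injected failure in `render` also costs `publish`, which depends on it, while `parse` and `total` complete normally. A plain sequential version without the `try` block would stop at `render` and never compute `total`.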

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Demsky, B., Zhou, J., Montaz, W. (2010). Recovery Tasks: An Automated Approach to Failure Recovery. In: Barringer, H., et al. Runtime Verification. RV 2010. Lecture Notes in Computer Science, vol 6418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16612-9_18

  • DOI: https://doi.org/10.1007/978-3-642-16612-9_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16611-2

  • Online ISBN: 978-3-642-16612-9

  • eBook Packages: Computer Science, Computer Science (R0)
