Safe Stochastic Planning: Planning to Avoid Fatal States

  • Hao Ren
  • Ali Akhavan Bitaghsir
  • Mike Barley
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4324)


Markov decision processes (MDPs) are a standard model in Artificial Intelligence planning, used to construct optimal or near-optimal policies or plans. One issue often missing from discussions of planning in stochastic environments is how MDPs can handle safety constraints expressed as a bound on the probability of reaching threat states. We introduce a method for finding a value-optimal policy that satisfies such a safety constraint, and report on the validity and effectiveness of our method through a set of experiments.
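To make the setting concrete, the sketch below (not the authors' algorithm; all states, rewards, and transition probabilities are hypothetical) runs value iteration on a tiny MDP containing an absorbing fatal state, then computes the probability of ever reaching that state under the resulting value-optimal policy. Comparing that probability against a safety threshold is the kind of check a safety constraint imposes:

```python
# Illustrative sketch only: a toy MDP with a fatal state. Value iteration finds
# the value-optimal policy; a fixed-point computation then gives the probability
# of ever reaching the fatal state under that policy.
GAMMA = 0.95
STATES = [0, 1, 2, 3]          # 0: start, 1: midway, 2: goal (absorbing), 3: fatal (absorbing)
ACTIONS = ["safe", "fast"]
# T[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
T = {
    0: {"safe": [(1, 1.0)], "fast": [(1, 0.9), (3, 0.1)]},
    1: {"safe": [(2, 1.0)], "fast": [(2, 0.85), (3, 0.15)]},
    2: {"safe": [(2, 1.0)], "fast": [(2, 1.0)]},
    3: {"safe": [(3, 1.0)], "fast": [(3, 1.0)]},
}
R = {
    0: {"safe": 0.0, "fast": 1.0},
    1: {"safe": 1.0, "fast": 2.0},
    2: {"safe": 0.0, "fast": 0.0},
    3: {"safe": 0.0, "fast": 0.0},
}

def q(V, s, a):
    return R[s][a] + GAMMA * sum(p * V[s2] for s2, p in T[s][a])

def value_iteration(eps=1e-8):
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(q(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

def greedy_policy(V):
    return {s: max(ACTIONS, key=lambda a: q(V, s, a)) for s in STATES}

def fatality_prob(pi, fatal=3, goal=2, iters=1000):
    # P(s) = probability of ever reaching the fatal state from s under pi,
    # via the fixed point P(s) = sum_{s'} T(s, pi(s), s') * P(s').
    P = {s: 1.0 if s == fatal else 0.0 for s in STATES}
    for _ in range(iters):
        for s in STATES:
            if s not in (fatal, goal):
                P[s] = sum(p * P[s2] for s2, p in T[s][pi[s]])
    return P

V = value_iteration()
pi = greedy_policy(V)
p_fatal = fatality_prob(pi)
print(pi[0], pi[1], round(p_fatal[0], 3))  # the value-optimal policy picks "fast" here
# If p_fatal[0] exceeds the allowed threshold, the value-optimal policy is
# unsafe, and a constrained search over safer policies is required.
```

Here the reward-maximizing policy chooses the risky "fast" action and so carries a nonzero fatality probability; a safety constraint such as "reach a fatal state with probability at most 0.05" would rule it out, which is exactly the tension the paper's method addresses.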


Keywords: Optimal Policy · Goal State · Markov Decision Process · Reward Function · Stochastic Environment




References

  1. Asimov, I.: Runaround. Astounding Science Fiction (March 1942)
  2. Weld, D., Etzioni, O.: The first law of robotics (a call to arms). In: Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-1994), Seattle, Washington. AAAI Press, Menlo Park (1994)
  3. Etzioni, O.: Intelligence without robots (a reply to Brooks). AI Magazine (1993)
  4. Boutilier, C., Dean, T., Hanks, S.: Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research 11, 1–94 (1999)
  5. Kaelbling, L., Littman, M., Moore, A.: Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4, 237–285 (1996)
  6. Ghallab, M., Nau, D., Traverso, P.: Automated Planning: Theory and Practice, ch. 16, draft edn., pp. 411–433. Morgan Kaufmann Publishers, San Francisco (2003)
  7. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
  8. Watkins, C.: Learning from Delayed Rewards. PhD thesis, University of Cambridge, England (1989)
  9. Neuneier, R., Mihatsch, O.: Risk sensitive reinforcement learning. In: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, pp. 1031–1037. MIT Press, Cambridge (1999)
  10. Draper, D., Hanks, S., Weld, D.: Probabilistic planning with information gathering and contingent execution. Technical report, Department of Computer Science and Engineering, Seattle, WA (December 1993)
  11. Draper, D., Hanks, S., Weld, D.: Probabilistic planning with information gathering and contingent execution. In: Hammond, K. (ed.) Proceedings of the Second International Conference on AI Planning Systems, Menlo Park, California, pp. 31–36. American Association for Artificial Intelligence (1994)
  12. Fulkerson, M.S., Littman, M.L., Keim, G.A.: Speeding safely: Multi-criteria optimization in probabilistic planning. In: AAAI/IAAI, p. 831 (1997)
  13. Geibel, P.: Reinforcement learning with bounded risk. In: Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pp. 162–169 (2001)
  14. MathPath: The regula falsi method for square-roots (January 2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Hao Ren (1)
  • Ali Akhavan Bitaghsir (2)
  • Mike Barley (1)
  1. University of Auckland, New Zealand
  2. University of Toronto, Canada
