Safe Stochastic Planning: Planning to Avoid Fatal States
Markov decision processes (MDPs) are a standard model in Artificial Intelligence planning, used to construct optimal or near-optimal policies or plans. One issue often missing from discussions of planning in stochastic environments is how MDPs can handle safety constraints expressed as a bound on the probability of reaching threat states. We introduce a method for finding a value-optimal policy that satisfies such a safety constraint, and report on the validity and effectiveness of our method through a set of experiments.
Keywords: Optimal Policy, Goal State, Markov Decision Process, Reward Function, Stochastic Environment
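To make the problem setting concrete, the following is a minimal sketch of planning under a safety constraint on a toy MDP. The MDP, its states, rewards, and probabilities are all invented for illustration, and the brute-force policy enumeration is not the paper's method; it only demonstrates the constrained objective: among policies whose probability of ever reaching a fatal state is at most a threshold delta, pick the one with the highest expected value.

```python
import itertools

# Tiny illustrative MDP (all names and numbers are assumptions, not from the paper).
# States: 0 = start, 1 = midway, 2 = goal (absorbing), 3 = fatal (absorbing).
GOAL, FATAL = 2, 3
ACTIONS = ["safe", "risky"]

# T[(s, a)] = list of (next_state, probability); R[(s, a)] = immediate reward.
T = {
    (0, "safe"):  [(1, 1.0)],
    (0, "risky"): [(1, 0.9), (FATAL, 0.1)],
    (1, "safe"):  [(GOAL, 1.0)],
    (1, "risky"): [(GOAL, 0.9), (FATAL, 0.1)],
}
R = {(0, "safe"): 1.0, (0, "risky"): 3.0,
     (1, "safe"): 1.0, (1, "risky"): 3.0}

def evaluate(policy, n_iter=100):
    """Expected total reward from state 0, and the probability of ever
    reaching the fatal state, for a fixed deterministic policy."""
    V = {s: 0.0 for s in range(4)}   # expected total reward to go
    P = {s: 0.0 for s in range(4)}   # probability of reaching FATAL
    P[FATAL] = 1.0
    for _ in range(n_iter):          # fixed-point iteration over non-absorbing states
        for s in (0, 1):
            a = policy[s]
            V[s] = R[(s, a)] + sum(p * V[s2] for s2, p in T[(s, a)])
            P[s] = sum(p * P[s2] for s2, p in T[(s, a)])
    return V[0], P[0]

def safest_optimal_policy(delta):
    """Brute force: highest-value policy whose fatal-state probability <= delta."""
    best = None
    for choice in itertools.product(ACTIONS, repeat=2):
        policy = {0: choice[0], 1: choice[1]}
        value, risk = evaluate(policy)
        if risk <= delta and (best is None or value > best[1]):
            best = (policy, value, risk)
    return best

policy, value, risk = safest_optimal_policy(delta=0.15)
print(policy, round(value, 3), round(risk, 3))
```

With delta = 0.15 the unconstrained optimum (always "risky", value 5.7, risk 0.19) is rejected, and the search returns a policy that takes the risky action only once, trading value for safety. Real instances replace the enumeration with a scalable solver, since the number of deterministic policies grows exponentially with the state space.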