Abstract
Markov Decision Processes (MDPs) provide a mathematical framework for modelling the decision-making of agents acting in stochastic environments, in which transition probabilities model the environment dynamics and a reward function evaluates the agent's behaviour. Recently, however, special attention has been given to the difficulty of specifying the reward function precisely, which has motivated research on MDPs with imprecisely specified rewards. Some of this work exploits nondominated policies: policies that are optimal for at least one instantiation of the imprecise reward function. πWitness is an algorithm that computes nondominated policies, and nondominated policies are used to make decisions under the minimax-regret criterion. It would be valuable to identify a small subset of nondominated policies so that the minimax regret can be computed faster yet still accurately; we modified πWitness to do so. We also present the πHull algorithm, which computes nondominated policies by adopting a geometric approach. Under the assumption that reward functions are linear in a set of features, we show empirically that πHull can be faster than our modified version of πWitness.
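The minimax-regret criterion mentioned above can be illustrated with a toy sketch (not the paper's algorithms; all names and numbers here are illustrative assumptions). Each policy is summarized by its expected feature vector μ, so under a linear reward with weights w its value is w·μ; when each weight lies in an interval, the regret of one policy against another is linear in w, so its maximum over the weight box is attained at a box vertex.

```python
import itertools

# Hypothetical toy example: four policies, each summarized by its
# expected feature vector mu, so value(pi, w) = dot(w, mu_pi).
policies = {
    "pi1": (1.0, 0.0),
    "pi2": (0.0, 1.0),
    "pi3": (0.7, 0.7),
    "pi4": (0.3, 0.2),  # dominated: beaten by pi3 for every admissible w
}

# Imprecise reward: each weight lies in an interval, so w ranges over a
# box; for a fixed pair of policies regret is linear in w, hence its
# maximum over the box is attained at one of the box's vertices.
bounds = [(0.0, 1.0), (0.0, 1.0)]
vertices = list(itertools.product(*bounds))

def dot(w, mu):
    return sum(wi * mi for wi, mi in zip(w, mu))

def max_regret(name):
    """Worst-case regret of `name`: the largest advantage any
    alternative policy attains under any admissible weight vector."""
    mu = policies[name]
    return max(dot(w, nu) - dot(w, mu)
               for w in vertices
               for nu in policies.values())

# Minimax-regret decision: the policy with the smallest worst-case regret.
best = min(policies, key=max_regret)
```

In this toy instance the compromise policy `pi3` minimizes the worst-case regret, which matches the intuition that minimax regret favours policies that are never far from optimal under any admissible reward.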
This work was conducted under project LogProb (FAPESP proc. 2008/03995-5). Valdinei F. Silva thanks FAPESP (proc. 09/14650-1) and Anna H. R. Costa thanks CNPq (proc. 305512/2008-0).
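The geometric intuition behind hull-based computation of nondominated policies can also be sketched in two dimensions (a sketch under my own assumptions, not the πHull algorithm itself): with nonnegative reward weights, a policy is optimal for some weight vector exactly when its expected feature vector lies on the upper-right portion of the convex hull of all such vectors.

```python
def cross(o, a, b):
    # z-component of the cross product (a - o) x (b - o):
    # positive for a counter-clockwise turn, negative for clockwise.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def nondominated_2d(points):
    """Keep the points that maximize dot(w, p) for some w >= 0.

    Builds the upper convex hull with Andrew's monotone chain, then
    trims it to the 'northeast' portion, from the highest point (the
    maximizer for w = (0, 1)) to the rightmost (for w = (1, 0)).
    """
    pts = sorted(set(points))
    upper = []
    for p in pts:
        # pop while the last turn is not strictly clockwise
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) >= 0:
            upper.pop()
        upper.append(p)
    top = max(upper, key=lambda q: (q[1], q[0]))
    return upper[upper.index(top):]
```

On the feature vectors from the previous sketch, the interior point (0.3, 0.2) is pruned while the three hull vertices survive; this mirrors how a geometric approach can discard dominated policies without enumerating reward instantiations.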
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Freire da Silva, V., Reali Costa, A.H. (2011). A Geometric Approach to Find Nondominated Policies to Imprecise Reward MDPs. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science(), vol 6911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23780-5_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23779-9
Online ISBN: 978-3-642-23780-5