A Geometric Approach to Find Nondominated Policies to Imprecise Reward MDPs

  • Valdinei Freire da Silva
  • Anna Helena Reali Costa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6911)

Abstract

Markov Decision Processes (MDPs) provide a mathematical framework for modelling the decision-making of agents acting in stochastic environments, in which transition probabilities model the environment dynamics and a reward function evaluates the agent’s behaviour. Lately, however, special attention has been paid to the difficulty of modelling the reward function precisely, which has motivated research on MDPs with imprecisely specified rewards. Some of these works exploit nondominated policies, i.e., policies that are optimal for some instantiation of the imprecise reward function. πWitness is an algorithm that calculates nondominated policies, which are then used to make decisions under the minimax regret criterion. An interesting problem is to define a small subset of nondominated policies so that the minimax regret can be calculated faster yet still accurately; we modify πWitness to do so. We also present the πHull algorithm, which calculates nondominated policies by adopting a geometric approach. Under the assumption that reward functions are linearly defined on a set of features, we show empirically that πHull can be faster than our modified version of πWitness.
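To make the geometric intuition concrete, the sketch below (an illustration under assumptions, not the paper’s πWitness or πHull implementation) represents each policy π by its expected discounted feature vector μ_π. With a reward linear in features, r = w·φ, the value of π is w·μ_π, so any policy that is uniquely optimal for some weight vector w must have μ_π among the vertices of the convex hull of all such vectors; those vertices are therefore candidates for the nondominated set. The helper names, the use of SciPy’s quickhull-based ConvexHull, and the sampling-based regret estimate are assumptions made only for illustration.

    import numpy as np
    from scipy.spatial import ConvexHull

    def candidate_nondominated(mu):
        # mu: (n_policies, n_features) array; row i is the expected discounted
        # feature vector of policy i. A policy that is uniquely optimal for some
        # weight vector w must correspond to a vertex of the convex hull of mu,
        # so the hull vertices form a superset of such policies.
        hull = ConvexHull(mu)
        return sorted(set(hull.vertices.tolist()))

    def approx_minimax_regret(mu_all, candidate_idx, w_samples):
        # Approximate minimax regret by enumerating sampled weight vectors
        # (a hypothetical stand-in for exact optimisation over the reward set).
        # Regret of policy i under w: max_j w.mu_j - w.mu_i.
        values = w_samples @ mu_all.T              # (n_w, n_policies)
        best = values.max(axis=1, keepdims=True)   # best achievable value per w
        regrets = best - values[:, candidate_idx]  # (n_w, n_candidates)
        return regrets.max(axis=0).min()           # min over candidates of max regret

    # Illustrative usage with random data.
    rng = np.random.default_rng(0)
    mu = rng.random((50, 3))                       # 50 policies, 3 features
    cands = candidate_nondominated(mu)
    w = rng.random((200, 3))                       # sampled reward weights
    print(cands, approx_minimax_regret(mu, cands, w))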

Keywords

Imprecise Reward MDP · Minimax Regret · Preference Elicitation

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Valdinei Freire da Silva (1)
  • Anna Helena Reali Costa (1)
  1. Universidade de São Paulo, São Paulo, Brazil
