A Geometric Approach to Find Nondominated Policies to Imprecise Reward MDPs
Markov Decision Processes (MDPs) provide a mathematical framework for modelling the decision-making of agents acting in stochastic environments, in which transition probabilities model the environment's dynamics and a reward function evaluates the agent's behaviour. Recently, special attention has been given to the difficulty of specifying the reward function precisely, which has motivated research on MDPs with imprecisely specified rewards. Some of this work exploits nondominated policies, i.e., policies that are optimal for some instantiation of the imprecise reward function. The πWitness algorithm computes nondominated policies, which are then used to make decisions under the minimax regret criterion. An interesting question is how to select a small subset of nondominated policies so that the minimax regret can be computed faster, yet still accurately; we modified πWitness to do so. We also present the πHull algorithm, which computes nondominated policies using a geometric approach. Under the assumption that reward functions are linear in a set of features, we show empirically that πHull can be faster than our modified version of πWitness.
Keywords: Imprecise Reward MDP · Minimax Regret · Preference Elicitation
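To make the minimax regret criterion concrete, here is a minimal sketch, not the paper's algorithm. It assumes each policy is summarised by its feature-expectation vector φ (so its value under reward weights w is w·φ), and it approximates the imprecise reward set by a finite collection of candidate weight vectors; the actual formulation optimises over a reward polytope via linear programming. All names and data below are hypothetical.

```python
import numpy as np

def max_regret(phi, W):
    """Worst-case regret of each policy over candidate reward weights.

    phi : (n_policies, n_features) feature expectations, one row per policy
    W   : (n_weights, n_features) candidate reward weight vectors
    Returns an array with the max regret of each policy over all w in W.
    """
    V = W @ phi.T                         # V[i, j] = value of policy j under w_i
    best = V.max(axis=1, keepdims=True)   # best achievable value for each w_i
    return (best - V).max(axis=0)         # worst-case regret per policy

# Hypothetical example: three policies, two reward features.
phi = np.array([[1.0, 0.0],    # policy A: good only on feature 0
                [0.0, 1.0],    # policy B: good only on feature 1
                [0.6, 0.6]])   # policy C: balanced
W = np.array([[1.0, 0.0],      # reward hypothesis favouring feature 0
              [0.0, 1.0]])     # reward hypothesis favouring feature 1

mr = max_regret(phi, W)        # → [1.0, 1.0, 0.4]
minimax_choice = int(mr.argmin())  # policy C minimises worst-case regret
```

The balanced policy C is never optimal for either reward hypothesis, yet it is the minimax-regret choice; this is why restricting attention to a small but well-chosen subset of nondominated policies, as the paper proposes, can still yield an accurate minimax regret.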