Abstract
Markov decision processes (MDPs) have become one of the standard models for decision-theoretic planning problems under uncertainty. In their standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model that allows rewards to be functional. The value of a history is computed recursively by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. To demonstrate the potential of our framework, we conclude the paper with several illustrative examples.
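The recursion described in the abstract can be sketched concretely. The following is a minimal illustration, not the authors' implementation: a finite-horizon backward induction in which each state-action pair carries a reward *function* applied to the value of the tail history, so that classical additive rewards become the special case `F = lambda v: r + v`. The two-state MDP and all names (`backward_induction`, `P`, `F`) are hypothetical, chosen only for the example.

```python
def backward_induction(states, actions, P, F, horizon, terminal_value=0.0):
    """Compute V_t(s) = max_a sum_{s'} P[(s,a)][s'] * F[(s,a)](V_{t+1}(s')).

    P[(s, a)] maps successor states to transition probabilities.
    F[(s, a)] is a reward function composed with the tail value;
    additive scalar rewards correspond to F[(s, a)] = lambda v: r + v.
    """
    V = {s: terminal_value for s in states}
    for _ in range(horizon):
        V = {
            s: max(
                sum(p * F[(s, a)](V[s2]) for s2, p in P[(s, a)].items())
                for a in actions
            )
            for s in states
        }
    return V

# Hypothetical two-state example with additive reward functions
# (the classical MDP case, recovered as an instance of the model).
states, actions = ["s0", "s1"], ["a", "b"]
P = {
    ("s0", "a"): {"s0": 0.5, "s1": 0.5},
    ("s0", "b"): {"s1": 1.0},
    ("s1", "a"): {"s1": 1.0},
    ("s1", "b"): {"s0": 1.0},
}
F = {
    ("s0", "a"): lambda v: 1.0 + v,  # immediate reward 1, then tail value
    ("s0", "b"): lambda v: 0.0 + v,
    ("s1", "a"): lambda v: 2.0 + v,
    ("s1", "b"): lambda v: 0.0 + v,
}
V = backward_induction(states, actions, P, F, horizon=3)
```

As the abstract notes, such a recursion is only valid under sufficient conditions on the reward functions (e.g., the additive functions above are nondecreasing in the tail value, which is what makes the max/expectation exchange of dynamic programming sound).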
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Spanjaard, O., Weng, P. (2013). Markov Decision Processes with Functional Rewards. In: Ramanna, S., Lingras, P., Sombattheera, C., Krishna, A. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2013. Lecture Notes in Computer Science(), vol 8271. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44949-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-44948-2
Online ISBN: 978-3-642-44949-9
eBook Packages: Computer Science (R0)