Abstract
Markov decision processes (MDPs) have become one of the standard models for decision-theoretic planning problems under uncertainty. In their standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model that allows rewards to be functional. The value of a history is computed recursively by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. To demonstrate the potential of our framework, we conclude the paper with several illustrative examples.
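The recursion described in the abstract can be sketched concretely. The following is a minimal illustration, not the authors' implementation: a finite-horizon backward induction in which each state-action pair carries a reward *function* applied to the value of the tail history, so that classical additive rewards become the special case `F = lambda v: r + v`. The two-state MDP and all names (`backward_induction`, `P`, `F`) are hypothetical, chosen only for the example.

```python
def backward_induction(states, actions, P, F, horizon, terminal_value=0.0):
    """Compute V_t(s) = max_a sum_{s'} P[(s,a)][s'] * F[(s,a)](V_{t+1}(s')).

    P[(s, a)] maps successor states to transition probabilities.
    F[(s, a)] is a reward function composed with the tail value;
    additive scalar rewards correspond to F[(s, a)] = lambda v: r + v.
    """
    V = {s: terminal_value for s in states}
    for _ in range(horizon):
        V = {
            s: max(
                sum(p * F[(s, a)](V[s2]) for s2, p in P[(s, a)].items())
                for a in actions
            )
            for s in states
        }
    return V

# Hypothetical two-state example with additive reward functions
# (the classical MDP case, recovered as an instance of the model).
states, actions = ["s0", "s1"], ["a", "b"]
P = {
    ("s0", "a"): {"s0": 0.5, "s1": 0.5},
    ("s0", "b"): {"s1": 1.0},
    ("s1", "a"): {"s1": 1.0},
    ("s1", "b"): {"s0": 1.0},
}
F = {
    ("s0", "a"): lambda v: 1.0 + v,  # immediate reward 1, then tail value
    ("s0", "b"): lambda v: 0.0 + v,
    ("s1", "a"): lambda v: 2.0 + v,
    ("s1", "b"): lambda v: 0.0 + v,
}
V = backward_induction(states, actions, P, F, horizon=3)
```

As the abstract notes, such a recursion is only valid under sufficient conditions on the reward functions (e.g., the additive functions above are nondecreasing in the tail value, which is what makes the max/expectation exchange of dynamic programming sound).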
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Spanjaard, O., Weng, P. (2013). Markov Decision Processes with Functional Rewards. In: Ramanna, S., Lingras, P., Sombattheera, C., Krishna, A. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2013. Lecture Notes in Computer Science(), vol 8271. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44949-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-44948-2
Online ISBN: 978-3-642-44949-9
eBook Packages: Computer Science (R0)