
Markov Decision Processes with Functional Rewards

  • Conference paper
Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8271)

Abstract

Markov decision processes (MDPs) have become one of the standard models for decision-theoretic planning problems under uncertainty. In their standard form, rewards are assumed to be additive numerical scalars. In this paper, we propose a generalization of this model that allows rewards to be functional. The value of a history is computed recursively by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. To show the potential of our framework, we conclude the paper with several illustrative examples.
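To make the idea concrete, here is a minimal sketch (in illustrative notation, not the paper's) of finite-horizon backward induction where each state-action pair carries a reward *function* that is applied to the value of the remaining history, rather than an additive scalar. The helper name `backward_induction` and the toy two-state MDP are assumptions for illustration; the standard discounted MDP is recovered by choosing the affine function f(v) = r + γ·v.

```python
def backward_induction(states, actions, trans, f, horizon):
    """Finite-horizon dynamic programming with functional rewards.

    trans[(s, a)] -> list of (next_state, probability)
    f[(s, a)]     -> reward function, applied to the tail value
    """
    V = {s: 0.0 for s in states}  # value of the empty tail history
    policy = []
    for _ in range(horizon):
        newV, pi = {}, {}
        for s in states:
            # Q-value: expectation of the composed value E[ f_{s,a}(V(s')) ]
            q = {a: sum(p * f[(s, a)](V[s2]) for s2, p in trans[(s, a)])
                 for a in actions}
            pi[s] = max(q, key=q.get)
            newV[s] = q[pi[s]]
        V = newV
        policy.append(pi)  # policy[-1] is the first-stage decision rule
    return V, policy

# Instantiating a standard discounted MDP: f(v) = r + gamma * v
gamma = 0.9
states, actions = ["s0", "s1"], ["stay", "go"]
trans = {
    ("s0", "stay"): [("s0", 1.0)], ("s0", "go"): [("s1", 1.0)],
    ("s1", "stay"): [("s1", 1.0)], ("s1", "go"): [("s0", 1.0)],
}
rewards = {("s0", "stay"): 0.0, ("s0", "go"): 1.0,
           ("s1", "stay"): 2.0, ("s1", "go"): 0.0}
f = {sa: (lambda v, r=r: r + gamma * v) for sa, r in rewards.items()}

V, policy = backward_induction(states, actions, trans, f, horizon=3)
```

Other variants from the literature fit by swapping the function family, e.g. exponential utilities for risk-sensitive planning; dynamic programming stays valid as long as the reward functions satisfy suitable conditions (such as monotonicity) so that the Bellman-style recursion above remains optimal.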




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Spanjaard, O., Weng, P. (2013). Markov Decision Processes with Functional Rewards. In: Ramanna, S., Lingras, P., Sombattheera, C., Krishna, A. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2013. Lecture Notes in Computer Science, vol 8271. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44949-9_25


  • DOI: https://doi.org/10.1007/978-3-642-44949-9_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-44948-2

  • Online ISBN: 978-3-642-44949-9

  • eBook Packages: Computer Science (R0)
