On Modeling Risk in Markov Decision Processes

A chapter in Optimization and Related Topics, part of the book series Applied Optimization (APOP, volume 47).

Abstract

Markov decision processes are solved recursively, using the Bellman optimality principle,

$$V(s,t): = \mathop {\max }\limits_{a \in A(s)} \left\{ {r(s,a) + \alpha \sum\limits_{j \in S} {{p_{s,j}}(a)} V(j,t + 1)} \right\}$$
(A)

where V(s, t) is the optimal value of state s at stage t, r(s, a) is the instantaneous profit from action a in state s, S is the state space, A(s) is the set of feasible actions at state s, α is the discount factor, and p_{i,j}(a) is the transition probability from state i to state j under action a. This solution maximizes the expected value of the discounted sum of future profits (the right-hand side of (A)) and assumes risk neutrality, i.e. the decision maker is indifferent between a random variable and its expected value.
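Recursion (A) can be computed by backward induction over the horizon. The sketch below illustrates this on a hypothetical 2-state, 2-action MDP (the numbers r, p, α, T are invented for illustration only and do not come from the chapter):

```python
import numpy as np

# Backward induction for recursion (A) on a toy MDP (hypothetical data).
alpha = 0.9                # discount factor
r = np.array([[1.0, 0.5],  # r[s, a]: instantaneous profit
              [0.0, 2.0]])
p = np.array([[[0.8, 0.2], [0.1, 0.9]],   # p[s, a, j]: P(next = j | s, a)
              [[0.5, 0.5], [0.9, 0.1]]])
S, A = r.shape
T = 10                     # horizon
V = np.zeros((S, T + 1))   # terminal condition V(s, T) = 0

for t in range(T - 1, -1, -1):           # stages T-1 down to 0
    for s in range(S):
        # value of each action: r(s,a) + alpha * E[V(next state, t+1)]
        q = r[s] + alpha * p[s] @ V[:, t + 1]
        V[s, t] = q.max()                # Bellman optimality (A)

print(V[:, 0])                           # optimal values at stage 0
```

Since the rewards here are nonnegative and the terminal values are zero, each added stage can only help, so V(s, t) is nonincreasing in t.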

We propose an alternative solution, with explicit modeling of risk, using the recursion

$$V(s,t): = \mathop {\max }\limits_{a \in A(s)} \left\{ {r(s,a) + \alpha {S_\beta }(V(Z(s,a),t + 1))} \right\} $$
(B)

where Z(s, a) is the (random) next state under action a, S_β is the quadratic certainty equivalent

$${S_\beta }(X): = EX - \frac{\beta }{2}VarX$$

and β is a parameter modeling the decision maker's attitude towards risk: β > 0 if risk-averse, β < 0 if risk-seeking, and β = 0 if risk-neutral (in which case (B) reduces to (A)).
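In recursion (B) the next-stage value V(Z(s, a), t+1) is a random variable whose law is given by the transition probabilities, so S_β can be evaluated from its mean and variance. A minimal sketch on the same hypothetical toy MDP (data invented for illustration only):

```python
import numpy as np

# Backward induction for recursion (B): the expectation in (A) is replaced
# by the quadratic certainty equivalent S_beta(X) = E[X] - (beta/2) Var[X]
# of the random next-stage value. Toy MDP data is hypothetical.
beta = 0.5                 # beta > 0: risk-averse; beta = 0 recovers (A)
alpha = 0.9
r = np.array([[1.0, 0.5],
              [0.0, 2.0]])
p = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.9, 0.1]]])
S, A = r.shape
T = 10
V = np.zeros((S, T + 1))   # terminal condition V(s, T) = 0

def s_beta(probs, values, beta):
    """Quadratic certainty equivalent of a discrete r.v. with law `probs`."""
    ex = probs @ values                   # E[X]
    var = probs @ (values - ex) ** 2      # Var[X]
    return ex - 0.5 * beta * var

for t in range(T - 1, -1, -1):
    for s in range(S):
        q = [r[s, a] + alpha * s_beta(p[s, a], V[:, t + 1], beta)
             for a in range(A)]
        V[s, t] = max(q)                  # risk-sensitive recursion (B)

print(V[:, 0])
```

At the last stage the next-stage value is identically zero, so its variance vanishes and (B) agrees with (A) there; for β > 0 the variance penalty then discounts risky actions at earlier stages.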

We apply our model to two problems, one of maintenance and one of inventory, and compare the results with the classical solution.


References

  • Baykal-Gürsoy, M. and Ross, K.W. (1992), Variability sensitive Markov decision processes, Math. Oper. Res., Vol. 17, 558–571.

  • Ben-Israel, A. and Ben-Tal, A. (1997), Duality and equilibrium prices in economics of uncertainty, Math. Meth. of Oper. Res., Vol. 46, 51–85.

  • Ben-Tal, A. (1985), The entropic penalty approach to stochastic programming, Math. Oper. Res., Vol. 10, 263–279.

  • Ben-Tal, A. and Ben-Israel, A. (1991), A recourse certainty equivalent for decisions under uncertainty, Annals of Oper. Res., Vol. 30, 3–44.

  • Bouakiz, M. and Sobel, M.J. (1992), Inventory control with an exponential utility criterion, Oper. Res., Vol. 40, 603–608.

  • Denardo, E.V. (1982), Dynamic Programming: Models and Applications, Prentice-Hall, Englewood Cliffs, New Jersey.

  • Filar, J., Kallenberg, L.C.M. and Lee, H.M. (1989), Variance-penalized Markov decision processes, Math. Oper. Res., Vol. 14, 147–161.

  • Filar, J. and Vrieze, K. (1997), Competitive Markov Decision Processes, Springer Verlag, New York.

  • Huang, Y. and Kallenberg, L.C.M. (1994), On finding optimal solutions for Markov decision chains: a unifying framework for mean-variance tradeoffs, Math. Oper. Res., Vol. 19, 434–448.

  • Köchel, P. (1985), A note on “Myopic solutions of Markov decision processes and stochastic games” by M. J. Sobel, Oper. Res., Vol. 33, 1394–1398.

  • Monahan, G.E. and Sobel, M.J. (1997), Risk-sensitive dynamic market share attraction games, Games Econ. Behav., Vol. 20, 149–160.

  • Sobel, M.J. (1981), Myopic solutions of Markov decision processes and stochastic games, Oper. Res., Vol. 29, 996–1009. See also Köchel (1985).

  • Sobel, M.J. (1990), Higher-order and average reward myopic-affine dynamic models, Math. Oper. Res., Vol. 15, 299–310.

  • White, D.J. (1988), Mean, variance, and probabilistic criteria in finite Markov decision processes: a review, J. Optimiz. Theory Appl., Vol. 56, 1–29.

Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Levitt, S., Ben-Israel, A. (2001). On Modeling Risk in Markov Decision Processes. In: Rubinov, A., Glover, B. (eds) Optimization and Related Topics. Applied Optimization, vol 47. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-6099-6_3

  • DOI: https://doi.org/10.1007/978-1-4757-6099-6_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-4844-1

  • Online ISBN: 978-1-4757-6099-6

  • eBook Packages: Springer Book Archive