Abstract
Markov decision processes are solved recursively, using the Bellman optimality principle,

$$V(s, t) \;=\; \max_{a \in A(s)} \Big\{ r(s, a) + \alpha \sum_{j \in S} p_{s,j}(a)\, V(j, t+1) \Big\}, \qquad (A)$$

where V(s, t) is the optimal value of state s at stage t, r(s, a) is the instantaneous profit from action a at state s, S is the state space, A(s) the set of feasible actions at state s, p_{i,j}(a) the transition probabilities from i to j, and α the discount factor. This solution maximizes the expected value of the discounted sum of future profits (the right side of (A)) and assumes risk neutrality, i.e. the decision maker is indifferent between a random variable and its expected value.
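For concreteness, here is a minimal NumPy sketch of the backward recursion (A). The array layout (r[s, a] for profits, p[s, a, j] for transition probabilities), the discount factor alpha and the zero terminal values are our own illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def value_iteration(r, p, horizon, alpha=0.95):
    """Risk-neutral finite-horizon backward recursion (A).

    r[s, a]    : instantaneous profit of action a in state s
    p[s, a, j] : probability of moving from state s to state j under action a
    alpha      : discount factor
    Returns V[s, t] and a greedy policy (both assumed zero beyond the horizon).
    """
    n_states, n_actions = r.shape
    V = np.zeros((n_states, horizon + 1))             # terminal values V(., T) = 0
    policy = np.zeros((n_states, horizon), dtype=int)
    for t in range(horizon - 1, -1, -1):              # backwards over stages
        for s in range(n_states):
            q = r[s] + alpha * p[s] @ V[:, t + 1]     # expected continuation value per action
            policy[s, t] = int(np.argmax(q))
            V[s, t] = q[policy[s, t]]
    return V, policy
```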
We propose an alternative solution, with explicit modeling of risk, using the recursion

$$V(s, t) \;=\; \max_{a \in A(s)} \Big\{ r(s, a) + \alpha\, S_\beta\big[ V(Z(s, a), t+1) \big] \Big\}, \qquad (B)$$

where Z(s, a) is the next state, S_β is the quadratic certainty equivalent, and β is a parameter modeling the attitude of the decision maker towards risk: β > 0 if risk-averse, β < 0 if risk-seeking and β = 0 if risk-neutral (in which case (B) reduces to (A)).
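A matching sketch of recursion (B) follows. The explicit form of the quadratic certainty equivalent is not reproduced above; the code assumes the usual mean–variance form S_β[X] = E[X] − (β/2)·Var[X], which reduces to the plain expectation when β = 0, so this is an illustration under that assumption rather than the chapter's exact formulation.

```python
import numpy as np

def risk_sensitive_value_iteration(r, p, horizon, beta=0.5, alpha=0.95):
    """Backward recursion (B) with an assumed quadratic certainty equivalent
    S_beta[X] = E[X] - (beta/2) * Var[X]
    (beta > 0 risk-averse, beta < 0 risk-seeking, beta = 0 recovers recursion (A))."""
    n_states, n_actions = r.shape
    V = np.zeros((n_states, horizon + 1))             # terminal values V(., T) = 0
    policy = np.zeros((n_states, horizon), dtype=int)
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            nxt = V[:, t + 1]
            mean = p[s] @ nxt                         # E[V(Z(s, a), t + 1)] per action
            var = p[s] @ nxt**2 - mean**2             # Var[V(Z(s, a), t + 1)] per action
            q = r[s] + alpha * (mean - 0.5 * beta * var)
            policy[s, t] = int(np.argmax(q))
            V[s, t] = q[policy[s, t]]
    return V, policy
```

With beta = 0 the variance term vanishes and the function returns the same values as the risk-neutral sketch above, matching the remark that (B) reduces to (A).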
We apply our model to solve two problems of maintenance and inventory and compare with the classical solution.
Copyright information
© 2001 Springer Science+Business Media Dordrecht
Cite this chapter
Levitt, S., Ben-Israel, A. (2001). On Modeling Risk in Markov Decision Processes. In: Rubinov, A., Glover, B. (eds) Optimization and Related Topics. Applied Optimization, vol 47. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-6099-6_3
DOI: https://doi.org/10.1007/978-1-4757-6099-6_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4844-1
Online ISBN: 978-1-4757-6099-6