Abstract
Markov decision processes are solved recursively, using the Bellman optimality principle,

$$V(s, t) \;=\; \max_{a \in A(s)} \Big\{ r(s, a) + \alpha \sum_{j \in S} p_{s,j}(a)\, V(j, t+1) \Big\}, \qquad (A)$$

where V(s, t) is the optimal value of state s at stage t, r(s, a) is the instantaneous profit from action a at state s, S is the state space, A(s) the set of feasible actions at state s, p_{i,j}(a) the transition probabilities from i to j, and α the discount factor. This solution maximizes the expected value of the discounted sum of future profits (the right side of (A)) and assumes risk neutrality, i.e. the decision maker is indifferent between a random variable and its expected value.
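For concreteness, here is a minimal NumPy sketch of the backward recursion (A). The array layout (r[s, a] for profits, p[s, a, j] for transition probabilities), the discount factor alpha and the zero terminal values are our own illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def value_iteration(r, p, horizon, alpha=0.95):
    """Risk-neutral finite-horizon backward recursion (A).

    r[s, a]    : instantaneous profit of action a in state s
    p[s, a, j] : probability of moving from state s to state j under action a
    alpha      : discount factor
    Returns V[s, t] and a greedy policy (both assumed zero beyond the horizon).
    """
    n_states, n_actions = r.shape
    V = np.zeros((n_states, horizon + 1))             # terminal values V(., T) = 0
    policy = np.zeros((n_states, horizon), dtype=int)
    for t in range(horizon - 1, -1, -1):              # backwards over stages
        for s in range(n_states):
            q = r[s] + alpha * p[s] @ V[:, t + 1]     # expected continuation value per action
            policy[s, t] = int(np.argmax(q))
            V[s, t] = q[policy[s, t]]
    return V, policy
```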
We propose an alternative solution, with explicit modeling of risk, using the recursion

$$V(s, t) \;=\; \max_{a \in A(s)} \Big\{ r(s, a) + \alpha\, S_\beta\big[ V(Z(s, a), t+1) \big] \Big\}, \qquad (B)$$

where Z(s, a) is the next state, S_β is the quadratic certainty equivalent, and β is a parameter modeling the attitude of the decision maker towards risk: β > 0 if risk-averse, β < 0 if risk-seeking and β = 0 if risk-neutral (in which case (B) reduces to (A)).
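A matching sketch of recursion (B) follows. The explicit form of the quadratic certainty equivalent is not reproduced above; the code assumes the usual mean–variance form S_β[X] = E[X] − (β/2)·Var[X], which reduces to the plain expectation when β = 0, so this is an illustration under that assumption rather than the chapter's exact formulation.

```python
import numpy as np

def risk_sensitive_value_iteration(r, p, horizon, beta=0.5, alpha=0.95):
    """Backward recursion (B) with an assumed quadratic certainty equivalent
    S_beta[X] = E[X] - (beta/2) * Var[X]
    (beta > 0 risk-averse, beta < 0 risk-seeking, beta = 0 recovers recursion (A))."""
    n_states, n_actions = r.shape
    V = np.zeros((n_states, horizon + 1))             # terminal values V(., T) = 0
    policy = np.zeros((n_states, horizon), dtype=int)
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            nxt = V[:, t + 1]
            mean = p[s] @ nxt                         # E[V(Z(s, a), t + 1)] per action
            var = p[s] @ nxt**2 - mean**2             # Var[V(Z(s, a), t + 1)] per action
            q = r[s] + alpha * (mean - 0.5 * beta * var)
            policy[s, t] = int(np.argmax(q))
            V[s, t] = q[policy[s, t]]
    return V, policy
```

With beta = 0 the variance term vanishes and the function returns the same values as the risk-neutral sketch above, matching the remark that (B) reduces to (A).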
We apply our model to solve two problems of maintenance and inventory and compare with the classical solution.
Copyright information
© 2001 Springer Science+Business Media Dordrecht
Cite this chapter
Levitt, S., Ben-Israel, A. (2001). On Modeling Risk in Markov Decision Processes. In: Rubinov, A., Glover, B. (eds) Optimization and Related Topics. Applied Optimization, vol 47. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-6099-6_3
DOI: https://doi.org/10.1007/978-1-4757-6099-6_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4844-1
Online ISBN: 978-1-4757-6099-6