Abstract
In this introductory section we consider Blackwell optimality in Controlled Markov Processes (CMPs) with finite state and action spaces; for brevity, we call them finite models. We introduce the basic definitions, the Laurent-expansion technique, the lexicographical policy improvement, and the Blackwell optimality equation, which were developed at the early stage of the study of sensitive criteria in CMPs. We also mention some extensions and generalizations obtained afterwards for the case of a finite state space. Chapter 2 presents the algorithmic approach to Blackwell optimality for finite models; we refer to that chapter for computational methods, especially the linear programming method, which we do not introduce here.
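To make the notion concrete, the following minimal numerical sketch (not taken from the chapter; the model and all names are illustrative) shows why discounted values near β = 1 are more sensitive than the average-reward criterion. Two stationary policies in a three-state model both earn average reward 0, yet one dominates the other for every discount factor β < 1 and is therefore the Blackwell optimal choice.

```python
import numpy as np

# Illustrative 3-state example (states 0, 1, 2; state 2 is absorbing, reward 0).
# In state 0 there are two actions:
#   action a: reward 2, jump straight to the absorbing state 2
#   action b: reward 1, pass through state 1 (reward 1), then absorb
# Both stationary policies have average reward 0, so the average criterion
# cannot distinguish them; the discounted values as beta -> 1 can.

def discounted_value(P, r, beta):
    """Solve v = r + beta * P v, i.e. (I - beta * P) v = r."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - beta * P, r)

P_a = np.array([[0., 0., 1.],   # policy a: state 0 -> state 2
                [0., 0., 1.],
                [0., 0., 1.]])
r_a = np.array([2., 1., 0.])

P_b = np.array([[0., 1., 0.],   # policy b: state 0 -> state 1 -> state 2
                [0., 0., 1.],
                [0., 0., 1.]])
r_b = np.array([1., 1., 0.])

for beta in [0.5, 0.9, 0.99, 0.999]:
    va = discounted_value(P_a, r_a, beta)[0]   # equals 2 for every beta
    vb = discounted_value(P_b, r_b, beta)[0]   # equals 1 + beta
    print(f"beta={beta}: v_a={va:.4f}, v_b={vb:.4f}")
```

Here v_a(0) = 2 and v_b(0) = 1 + β, so policy a is strictly better for every β < 1: it is Blackwell optimal, even though the average-reward criterion rates the two policies as equal.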
© 2002 Springer Science+Business Media New York
Cite this chapter
Hordijk, A., Yushkevich, A.A. (2002). Blackwell Optimality. In: Feinberg, E.A., Shwartz, A. (eds) Handbook of Markov Decision Processes. International Series in Operations Research & Management Science, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0805-2_8
Print ISBN: 978-1-4613-5248-8
Online ISBN: 978-1-4615-0805-2