Handbook of Markov Decision Processes, pp. 231–267

# Blackwell Optimality

## Abstract

In this introductory section we consider Blackwell optimality in Controlled Markov Processes (CMPs) with finite state and action spaces; for brevity, we call them finite models. We introduce the basic definitions, the Laurent-expansion technique, the lexicographical policy improvement, and the Blackwell optimality equation, which were developed at an early stage of the study of sensitive criteria in CMPs. We also mention some extensions and generalizations obtained afterwards for the case of a finite state space. Chapter 2 presents the algorithmic approach to Blackwell optimality for finite models; we refer to that chapter for computational methods, especially the linear programming method, which we do not introduce here.
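The notion underlying the abstract can be illustrated numerically: a stationary policy is Blackwell optimal if it maximizes the expected discounted reward for all discount factors sufficiently close to 1. The following is a minimal sketch (assuming NumPy; the two-state model and the policy names `pi` and `sigma` are hypothetical, chosen so that both policies have the same long-run average reward but different discounted values near 1):

```python
import numpy as np

# Hypothetical 2-state finite model. Policy "pi" alternates between the
# states, earning reward 2 in state 0 and 0 in state 1; policy "sigma"
# stays put, earning reward 1 everywhere. Both have average reward 1,
# so only a more sensitive criterion can distinguish them.
P = {  # transition matrices, one per stationary policy
    "pi": np.array([[0.0, 1.0], [1.0, 0.0]]),
    "sigma": np.array([[1.0, 0.0], [0.0, 1.0]]),
}
r = {  # one-step reward vectors
    "pi": np.array([2.0, 0.0]),
    "sigma": np.array([1.0, 1.0]),
}

def discounted_value(policy, beta):
    """Solve v = r + beta * P v, i.e. v = (I - beta P)^{-1} r."""
    n = len(r[policy])
    return np.linalg.solve(np.eye(n) - beta * P[policy], r[policy])

# As beta -> 1 the gap in state 0 tends to 1/2 > 0, so "pi" dominates
# "sigma" for all beta close enough to 1 (the bias term of the Laurent
# expansion of the discounted value decides the comparison).
for beta in (0.9, 0.99, 0.999):
    gap = discounted_value("pi", beta)[0] - discounted_value("sigma", beta)[0]
    print(f"beta={beta}: value gap in state 0 = {gap:.6f}")
```

In this toy model the gap equals 1/(1+β) exactly, which is the difference of the constant (bias) terms in the Laurent expansions of the two discounted values; this is the mechanism the Laurent-expansion technique formalizes.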

