Blackwell Optimality

  • Arie Hordijk
  • Alexander A. Yushkevich
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 40)

Abstract

In this introductory section we consider Blackwell optimality in Controlled Markov Processes (CMPs) with finite state and action spaces; for brevity, we call them finite models. We introduce the basic definitions, the Laurent-expansion technique, the lexicographical policy improvement, and the Blackwell optimality equation, which were developed at the early stage of the study of sensitive criteria in CMPs. We also mention some extensions and generalizations obtained afterwards for the case of a finite state space. Chapter 2 presents the algorithmic approach to Blackwell optimality for finite models; we refer to that chapter for computational methods, in particular the linear programming method, which we do not treat here.
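
For orientation, the sketch below restates the central definition and the Laurent expansion behind the lexicographical policy improvement. The notation (β, ρ, y_n) is ours, not necessarily the chapter's, and normalization conventions for the expansion coefficients vary across texts.

% A sketch in standard MDP notation (symbols are our choice; coefficient
% normalizations differ between texts).
%
% Blackwell optimality: a policy \pi^* is Blackwell optimal if it is
% discount optimal simultaneously for all discount factors close to 1:
\[
  \exists\, \beta_0 \in (0,1):\qquad
  v_\beta^{\pi^*}(x) \;\ge\; v_\beta^{\pi}(x)
  \quad \text{for all } \beta \in [\beta_0, 1),\ \text{all policies } \pi,\ \text{all states } x.
\]
% Laurent expansion: writing \beta = 1/(1+\rho) with interest rate
% \rho > 0, the discounted reward of a stationary policy \pi in a finite
% model expands, for all sufficiently small \rho, as
\[
  v_\beta^{\pi} \;=\; \frac{1}{\rho}\, y_{-1}^{\pi} \;+\; y_0^{\pi}
  \;+\; \rho\, y_1^{\pi} \;+\; \rho^2 y_2^{\pi} \;+\; \cdots,
\]
% with leading coefficient y_{-1}^{\pi} = P_\pi^{*} r_\pi, the average
% reward (gain), and y_0^{\pi} the bias term (up to convention). Since the
% 1/\rho term dominates as \rho \downarrow 0, comparing policies for
% discount factors near 1 reduces to comparing the coefficient sequences
% (y_{-1}^{\pi}, y_0^{\pi}, y_1^{\pi}, \ldots) lexicographically, which is
% the idea behind the lexicographical policy improvement and the nested
% system of equations known as the Blackwell optimality equation.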


Copyright information

© Springer Science+Business Media New York 2002

Authors and Affiliations

  • Arie Hordijk, Mathematical Institute, University of Leiden, Leiden, The Netherlands
  • Alexander A. Yushkevich, Department of Mathematics, University of North Carolina at Charlotte, Charlotte, USA
