
Abstract

In this introductory section we consider Blackwell optimality in Controlled Markov Processes (CMPs) with finite state and action spaces; for brevity, we call them finite models. We introduce the basic definitions, the Laurent-expansion technique, the lexicographical policy improvement, and the Blackwell optimality equation, which were developed at an early stage of the study of sensitive criteria in CMPs. We also mention some extensions and generalizations obtained afterwards for the case of a finite state space. Chapter 2 presents the algorithmic approach to Blackwell optimality for finite models; we refer the reader to that chapter for computational methods, especially the linear programming method, which we do not introduce here.
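The criterion the abstract refers to can be stated concretely: a stationary policy is Blackwell optimal if its discounted value vector dominates that of every other policy simultaneously for all discount factors close enough to 1. A minimal numerical sketch of this check, on a made-up two-state model (the transition and reward data below are illustrative assumptions, not taken from the chapter):

```python
import numpy as np
from itertools import product

# A toy finite model (2 states, 2 actions) -- illustrative data only.
# P[s][a] is the transition row from state s under action a,
# and r[s][a] is the one-step reward.
P = {0: {0: [1.0, 0.0], 1: [0.0, 1.0]},
     1: {0: [0.0, 1.0], 1: [1.0, 0.0]}}
r = {0: {0: 1.0, 1: 0.0},
     1: {0: 2.0, 1: 1.0}}
states, actions = [0, 1], [0, 1]

def discounted_value(policy, beta):
    """Solve v = r_f + beta * P_f v for the stationary policy f."""
    Pf = np.array([P[s][policy[s]] for s in states])
    rf = np.array([r[s][policy[s]] for s in states])
    return np.linalg.solve(np.eye(len(states)) - beta * Pf, rf)

# Enumerate all stationary deterministic policies and, for discount
# factors increasingly close to 1, pick the policy whose value vector
# dominates every other policy's componentwise.
policies = list(product(actions, repeat=len(states)))
for beta in (0.9, 0.99, 0.999):
    vals = {f: discounted_value(f, beta) for f in policies}
    best = next(f for f in policies
                if all(np.all(vals[f] >= vals[g] - 1e-9) for g in policies))
    print(beta, best, np.round(vals[best], 3))
# The same policy, (1, 0), wins for every beta near 1 -- the defining
# property of a Blackwell optimal policy in this toy model.
```

Brute-force enumeration is only feasible for tiny models, of course; the Laurent-expansion and lexicographical policy-improvement techniques introduced in this chapter are precisely what replace it in general finite models.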




Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Hordijk, A., Yushkevich, A.A. (2002). Blackwell Optimality. In: Feinberg, E.A., Shwartz, A. (eds) Handbook of Markov Decision Processes. International Series in Operations Research & Management Science, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0805-2_8


  • DOI: https://doi.org/10.1007/978-1-4615-0805-2_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5248-8

  • Online ISBN: 978-1-4615-0805-2

