Average Reward Optimization Theory for Denumerable State Spaces

Chapter

Part of the book series: International Series in Operations Research & Management Science (ISOR, volume 40)

Abstract

In this chapter we deal with certain aspects of average reward optimality. It is assumed that the state space X is denumerably infinite, and that for each x ∈ X, the set A(x) of available actions is finite. It is possible to extend the theory to compact action sets, but at the expense of increased mathematical complexity. Finite action sets are sufficient for digitally implemented controls, and so we restrict our attention to this case.
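
A central object in this theory is the average reward optimality equation. As an illustrative sketch only (the symbols r(x, a) for the one-step reward and p(y | x, a) for the transition law are generic notation, not necessarily the chapter's own), one seeks a constant \rho^* and a real-valued function h on X satisfying

\[
\rho^* + h(x) \;=\; \max_{a \in A(x)} \Big[\, r(x,a) + \sum_{y \in X} p(y \mid x, a)\, h(y) \Big], \qquad x \in X.
\]

Under suitable conditions, \rho^* is the optimal average reward, and any stationary policy that selects a maximizing action in each state is average reward optimal; because each A(x) is finite, the maximum in each state is always attained.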


Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Sennott, L.I. (2002). Average Reward Optimization Theory for Denumerable State Spaces. In: Feinberg, E.A., Shwartz, A. (eds) Handbook of Markov Decision Processes. International Series in Operations Research & Management Science, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0805-2_5

  • DOI: https://doi.org/10.1007/978-1-4615-0805-2_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5248-8

  • Online ISBN: 978-1-4615-0805-2

