Abstract
In this chapter we deal with certain aspects of average reward optimality. It is assumed that the state space X is denumerably infinite, and that for each x ∈ X, the set A(x) of available actions is finite. It is possible to extend the theory to compact action sets, but at the expense of increased mathematical complexity. Finite action sets are sufficient for digitally implemented controls, and so we restrict our attention to this case.
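The framework described above can be made concrete with a small simulation. The sketch below is an illustration of my own construction, not taken from the chapter: it models a controlled single-server queue on the denumerable state space X = {0, 1, 2, …}, with a finite action set A(x) = {0, 1} (slow or fast service) at every state, and estimates the long-run average reward of a stationary policy by simulating the chain. All rates, costs, and names are hypothetical.

```python
import random

# Hypothetical controlled queue on X = {0, 1, 2, ...}.
# Each slot: one arrival with probability P_ARRIVAL; the action a in
# A(x) = {0, 1} selects a slow or fast service rate. The reward is the
# negative of holding cost plus service charge.

P_ARRIVAL = 0.4
SERVICE_RATE = {0: 0.3, 1: 0.7}   # per-slot completion probability
SERVICE_COST = {0: 0.0, 1: 2.0}   # per-slot charge for the chosen rate
HOLDING_COST = 1.0                # per customer per slot

def step(x, a, rng):
    """One transition of the controlled chain; returns (next_state, reward)."""
    reward = -(HOLDING_COST * x + SERVICE_COST[a])
    departure = 1 if (x > 0 and rng.random() < SERVICE_RATE[a]) else 0
    arrival = 1 if rng.random() < P_ARRIVAL else 0
    return x - departure + arrival, reward

def average_reward(policy, horizon=200_000, seed=0):
    """Estimate the long-run average reward of a stationary policy
    by averaging rewards along one long sample path from state 0."""
    rng = random.Random(seed)
    x, total = 0, 0.0
    for _ in range(horizon):
        a = policy(x)
        x, r = step(x, a, rng)
        total += r
    return total / horizon

# Two stationary policies: a threshold rule, and "always serve slowly".
fast_above_2 = lambda x: 1 if x > 2 else 0
always_slow = lambda x: 0
```

Under these (assumed) parameters the slow rate 0.3 is below the arrival rate 0.4, so the always-slow policy is unstable and its sample average reward diverges to minus infinity, while the threshold policy keeps the queue stable; comparing `average_reward(fast_above_2)` with `average_reward(always_slow)` illustrates why average reward optimality questions on denumerable state spaces hinge on stability of the controlled chain.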
© 2002 Springer Science+Business Media New York
Cite this chapter
Sennott, L.I. (2002). Average Reward Optimization Theory for Denumerable State Spaces. In: Feinberg, E.A., Shwartz, A. (eds) Handbook of Markov Decision Processes. International Series in Operations Research & Management Science, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0805-2_5
DOI: https://doi.org/10.1007/978-1-4615-0805-2_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5248-8
Online ISBN: 978-1-4615-0805-2
eBook Packages: Springer Book Archive