Abstract
This paper is concerned with finite state multichain MDPs with a compact action set. The optimality criterion is the long-run average cost. Simple examples illustrate that optimal stationary Markov policies do not always exist. We establish the existence of ε-optimal policies which are stationary Markovian, and develop an algorithm which computes these approximately optimal policies. We also establish a necessary and sufficient condition for the existence of an optimal policy which is stationary Markovian, and in case such an optimal policy exists the algorithm computes it.
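The abstract does not reproduce the algorithm, but the multichain phenomenon it rests on is easy to illustrate: under a fixed stationary policy the induced Markov chain may have several recurrent classes, so the long-run average cost depends on the initial state. The following sketch (the states, costs, and transition probabilities are toy assumptions of ours, not taken from the paper) approximates the average-cost vector g = P*c via the Cesàro limit of the transition matrix powers.

```python
import numpy as np

# Toy 3-state chain induced by a fixed stationary policy: two absorbing
# recurrent classes, {0} and {1}, plus a transient state 2 that splits
# between them. All numbers here are illustrative assumptions.
P = np.array([
    [1.0, 0.0, 0.0],   # state 0: absorbing
    [0.0, 1.0, 0.0],   # state 1: absorbing
    [0.5, 0.5, 0.0],   # state 2: transient
])
c = np.array([1.0, 3.0, 0.0])  # per-stage costs

def average_cost(P, c, N=2000):
    """Approximate g = P* c, where P* is the Cesaro limit
    (1/N) * sum_{k=0}^{N-1} P^k of the matrix powers of P."""
    n = P.shape[0]
    Pk = np.eye(n)      # running power P^k, starting at P^0 = I
    S = np.zeros((n, n))
    for _ in range(N):
        S += Pk
        Pk = Pk @ P
    return (S / N) @ c

g = average_cost(P, c)
# g is state-dependent: starting in state 0 gives average cost 1,
# in state 1 gives 3, and the transient state 2 gives 0.5*1 + 0.5*3 = 2.
```

Because g is not constant across states, a single scalar optimality equation does not suffice in the multichain case, which is what makes the existence and computation of (ε-)optimal stationary policies nontrivial here.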
© 2002 Springer Science+Business Media New York
Leizarowitz, A. (2002). On Optimal Policies of Multichain Finite State Compact Action Markov Decision Processes. In: Zaccour, G. (eds) Decision & Control in Management Science. Advances in Computational Management Science, vol 4. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3561-1_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4995-0
Online ISBN: 978-1-4757-3561-1