Abstract
This paper is concerned with finite state multichain MDPs with a compact action set. The optimality criterion is the long-run average cost. Simple examples illustrate that optimal stationary Markov policies do not always exist. We establish the existence of ε-optimal policies which are stationary Markovian, and develop an algorithm which computes these approximately optimal policies. We also establish a necessary and sufficient condition for the existence of an optimal policy which is stationary Markovian, and in case such an optimal policy exists the algorithm computes it.
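The abstract does not reproduce the algorithm, but the multichain phenomenon it rests on is easy to illustrate: under a fixed stationary policy the induced Markov chain may have several recurrent classes, so the long-run average cost depends on the initial state. The following sketch (the states, costs, and transition probabilities are toy assumptions of ours, not taken from the paper) approximates the average-cost vector g = P*c via the Cesàro limit of the transition matrix powers.

```python
import numpy as np

# Toy 3-state chain induced by a fixed stationary policy: two absorbing
# recurrent classes, {0} and {1}, plus a transient state 2 that splits
# between them. All numbers here are illustrative assumptions.
P = np.array([
    [1.0, 0.0, 0.0],   # state 0: absorbing
    [0.0, 1.0, 0.0],   # state 1: absorbing
    [0.5, 0.5, 0.0],   # state 2: transient
])
c = np.array([1.0, 3.0, 0.0])  # per-stage costs

def average_cost(P, c, N=2000):
    """Approximate g = P* c, where P* is the Cesaro limit
    (1/N) * sum_{k=0}^{N-1} P^k of the matrix powers of P."""
    n = P.shape[0]
    Pk = np.eye(n)      # running power P^k, starting at P^0 = I
    S = np.zeros((n, n))
    for _ in range(N):
        S += Pk
        Pk = Pk @ P
    return (S / N) @ c

g = average_cost(P, c)
# g is state-dependent: starting in state 0 gives average cost 1,
# in state 1 gives 3, and the transient state 2 gives 0.5*1 + 0.5*3 = 2.
```

Because g is not constant across states, a single scalar optimality equation does not suffice in the multichain case, which is what makes the existence and computation of (ε-)optimal stationary policies nontrivial here.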
© 2002 Springer Science+Business Media New York
Leizarowitz, A. (2002). On Optimal Policies of Multichain Finite State Compact Action Markov Decision Processes. In: Zaccour, G. (eds) Decision & Control in Management Science. Advances in Computational Management Science, vol 4. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3561-1_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4995-0
Online ISBN: 978-1-4757-3561-1