On Optimal Policies of Multichain Finite State Compact Action Markov Decision Processes

  • Chapter
Decision & Control in Management Science

Part of the book series: Advances in Computational Management Science ((AICM,volume 4))

Abstract

This paper is concerned with finite state multichain MDPs with a compact action set. The optimality criterion is the long-run average cost. Simple examples illustrate that optimal stationary Markov policies do not always exist. We establish the existence of ε-optimal policies that are stationary Markovian, and develop an algorithm which computes these approximately optimal policies. We also establish a necessary and sufficient condition for the existence of an optimal policy that is stationary Markovian; when such an optimal policy exists, the algorithm computes it.
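As a minimal illustration of the long-run average cost criterion (this is not the paper's algorithm), consider a fixed stationary policy on a multichain MDP. The policy induces a Markov chain with transition matrix P and one-stage cost vector c, and the long-run average cost from each initial state is g = P*c, where P* is the Cesàro limit of the powers of P. The sketch below approximates g by finite Cesàro averaging on a hypothetical three-state chain with two recurrent classes, so the average cost depends on the initial state:

```python
import numpy as np

# Hypothetical policy-induced transition matrix for a 3-state multichain MDP:
# states {0, 1} form one recurrent class; state {2} is absorbing.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
])
c = np.array([1.0, 3.0, 0.0])  # one-stage costs under the fixed policy

def average_cost(P, c, n_steps=10_000):
    """Approximate g = P* c via the Cesaro average (1/N) sum_{t=0}^{N-1} P^t c.

    The resulting vector gives the long-run average cost incurred when the
    chain starts in each state; in a multichain model its entries may differ
    across recurrent classes.
    """
    g = np.zeros_like(c)
    v = c.copy()
    for _ in range(n_steps):
        g += v
        v = P @ v  # advance one step: v becomes P^{t+1} c
    return g / n_steps

g = average_cost(P, c)  # g is approximately [2., 2., 0.]
```

Starting in states 0 or 1, the chain mixes over the class {0, 1} with stationary distribution (1/2, 1/2), giving average cost 2; starting in the absorbing state 2, the average cost is 0. This state dependence is exactly what distinguishes the multichain case from the unichain one.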



Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Leizarowitz, A. (2002). On Optimal Policies of Multichain Finite State Compact Action Markov Decision Processes. In: Zaccour, G. (eds) Decision & Control in Management Science. Advances in Computational Management Science, vol 4. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3561-1_5

  • Print ISBN: 978-1-4419-4995-0

  • Online ISBN: 978-1-4757-3561-1

  • eBook Packages: Springer Book Archive
