On Optimal Policies of Multichain Finite State Compact Action Markov Decision Processes

  • Arie Leizarowitz
Part of the Advances in Computational Management Science book series (AICM, volume 4)


This paper is concerned with finite-state multichain Markov decision processes (MDPs) with compact action sets, under the long-run average cost optimality criterion. Simple examples illustrate that optimal stationary Markov policies do not always exist. We establish the existence of ε-optimal policies that are stationary Markovian, and develop an algorithm that computes these approximately optimal policies. We also establish a necessary and sufficient condition for the existence of an optimal policy that is stationary Markovian, and when such an optimal policy exists the algorithm computes it.
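To make the optimality criterion concrete (this is an illustrative sketch, not the algorithm developed in the paper): once a stationary Markov policy is fixed, it induces a transition matrix P and a one-step cost vector c on the finite state space, and for a unichain policy the long-run average cost equals π·c, where π is the stationary distribution of P. The helper names below (`stationary_distribution`, `average_cost`) are assumptions introduced for this example.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an irreducible transition matrix P."""
    n = P.shape[0]
    # Stack the balance equations P^T pi = pi with the normalization row.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def average_cost(P, c):
    """Long-run average cost of the fixed stationary policy inducing (P, c)."""
    return float(stationary_distribution(P) @ c)

# Two-state example: the chain alternates deterministically between the
# states, so it spends half its time in each; the average cost is the
# mean of the two one-step costs, regardless of the initial state.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
c = np.array([1.0, 3.0])
print(average_cost(P, c))  # 2.0
```

In the multichain setting treated by the paper this value may depend on the initial state, since the induced chain can have several recurrent classes; that dependence is precisely what makes the existence of optimal stationary policies delicate.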


Keywords: Optimal Policy; Markov Decision Process; Average Cost; Optimality Equation; Deterministic Policy





Copyright information

© Springer Science+Business Media New York 2002

