Optimal Control of Markov Chains

  • Shaler Stidham Jr.
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 24)


Add costs and decisions to a Markov chain and you have a Markov decision chain (MDC), the subject of this chapter. Roughly speaking, the problem is how to control a Markov chain to achieve an economic objective. Control is exercised by taking a sequence of actions, each of which may depend on the currently observed state and may influence both the immediate cost and the next state transition.
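As an illustrative sketch only (not taken from the chapter), the snippet below runs value iteration on a hypothetical two-state, two-action Markov decision chain with discounted costs. All state/action labels, costs, transition probabilities, and the discount factor are invented for the example; the point is simply that each action in each state carries an immediate cost and determines the next-state distribution, and an optimal stationary policy is computed from the resulting value function.

```python
import numpy as np

# Hypothetical two-state, two-action Markov decision chain.
# All costs and transition probabilities below are made up for illustration.
cost = np.array([[1.0, 4.0],   # cost[s, a]: immediate cost of taking action a in state s
                 [3.0, 0.5]])
P = np.array([[[0.9, 0.1],     # P[a, s, s']: transition probabilities under action 0
               [0.2, 0.8]],
              [[0.5, 0.5],     # ... and under action 1
               [0.6, 0.4]]])
beta = 0.9                     # discount factor (illustrative)

# Value iteration: V(s) <- min_a { cost(s, a) + beta * sum_{s'} P(s' | s, a) V(s') }
V = np.zeros(2)
for _ in range(500):
    Q = cost + beta * (P @ V).T       # Q[s, a]; (P @ V) has shape (actions, states)
    V_new = Q.min(axis=1)             # act greedily with respect to the current values
    if np.abs(V_new - V).max() < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmin(axis=1)  # optimal stationary policy: the cost-minimizing action in each state
print("optimal values:", V)
print("optimal policy:", policy)
```

Because the expected total discounted cost operator is a contraction with modulus `beta`, the iteration converges to the unique fixed point, and the greedy policy extracted from it is optimal for this finite discounted problem.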


Keywords: Decision rule; Optimal policy; Markov decision process; Computational probability; Infinite horizon





Copyright information

© Springer Science+Business Media New York 2000

Authors and Affiliations

  • Shaler Stidham Jr.
    1. Department of Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, USA
