Policy improvement algorithm for continuous time Markov decision processes with switching costs

  • Bharat Doshi
Part II: Research Reports
Part of the Lecture Notes in Control and Information Sciences book series (LNCIS, volume 16)


This paper deals with the computation of an optimal policy for Markov decision processes involving continuous movement as well as switching costs. The author recently derived conditions for the optimality of a policy in such decision processes. These conditions can be used to verify the optimality of a given stationary policy, but they cannot be used to obtain one directly, so some computational procedure is needed to arrive at a stationary optimal policy. In this paper we develop an algorithm that generates a sequence of successively improving policies converging (at least along a subsequence) to an optimal stationary policy. Two special cases are considered. The first is a general continuous-time Markov decision process with a countable state space. In this case the sufficient conditions for optimality suggest an algorithmic procedure, and it is shown that this algorithm either terminates at a stationary optimal policy or converges to one (at least along a subsequence). The second special case is a controlled one-dimensional diffusion process. Here the simple algorithm suggested by the sufficient conditions does give a sequence of successively improving policies, but it may terminate at, or converge to, a suboptimal policy. An additional step is therefore proposed, and it is shown that the modified algorithm works: it either terminates at a stationary optimal policy or converges to one along a subsequence. Similarly modified algorithms can be developed for Markov decision processes in which the underlying process is compound Poisson with drift. Such processes occur frequently in controlled queues and inventory systems.





Copyright information

© Springer-Verlag 1979

Authors and Affiliations

  • Bharat Doshi, Department of Statistics, Rutgers University, New Brunswick, U.S.A.
