Discrete Event Dynamic Systems, Volume 21, Issue 1, pp 11–38

Stochastic control via direct comparison

  • Xi-Ren Cao
  • De-Xin Wang
  • Tao Lu
  • Yifan Xu


The standard approach to stochastic control is dynamic programming. In this paper, we introduce an alternative approach based on direct comparison of the performance of any two policies. This is achieved by modeling the state process as a continuous-time, continuous-state Markov process and applying the same ideas as in the discrete-time, discrete-state case. The approach is simple and intuitively clear; it applies in the same way to different problems: finite and infinite horizons, discounted and long-run average performance, and continuous and jump diffusions. Discounting is not needed when dealing with long-run average performance. The approach provides a unified framework for stochastic control and other optimization theories and methodologies, including Markov decision processes, perturbation analysis, and reinforcement learning.
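The abstract states no formulas, so the following is only a minimal sketch of the direct-comparison idea in the discrete-time, discrete-state setting it refers to. It assumes an ergodic finite chain with long-run average reward and uses the standard performance difference formula, eta' - eta = pi' [ (P' - P) g + (f' - f) ], where g is the performance potential solving the Poisson equation (I - P) g + eta e = f. All symbols, function names, and the two example policies are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an ergodic transition matrix P."""
    n = P.shape[0]
    # Stack the balance equations with the normalization constraint.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def potentials(P, f):
    """Solve the Poisson equation (I - P) g + eta e = f, normalized so pi g = eta."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    # (I - P + e pi) is invertible for an ergodic chain.
    return np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)

# Two hypothetical policies on a 2-state chain: (P, f) and (P_new, f_new).
P = np.array([[0.5, 0.5], [0.2, 0.8]])
f = np.array([1.0, 3.0])
P_new = np.array([[0.9, 0.1], [0.4, 0.6]])
f_new = np.array([2.0, 2.5])

pi = stationary_distribution(P)
pi_new = stationary_distribution(P_new)
eta = pi @ f              # average reward under the first policy
eta_new = pi_new @ f_new  # average reward under the second policy
g = potentials(P, f)      # potentials of the first policy only

# Direct comparison: both sides should agree up to rounding.
lhs = eta_new - eta
rhs = pi_new @ ((P_new - P) @ g + (f_new - f))
print(lhs, rhs)
```

Because pi' has strictly positive entries for an ergodic chain, the sign of the difference can be read off from the vector (P' - P) g + (f' - f) whenever all its components share a sign, which is the observation behind policy-improvement steps in this setting.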


Dynamic programming; Markov decision processes; HJB equation; Performance potentials; Poisson equation; Perturbation analysis; Sensitivity-based optimization



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Shanghai Jiaotong University, Shanghai, China
  2. Hong Kong University of Science and Technology, Hong Kong
  3. Fudan University, Shanghai, China
