The standard approach to stochastic control is dynamic programming. In this paper, we introduce an alternative approach based on the direct comparison of the performance of any two policies. This is achieved by modeling the state process as a continuous-time, continuous-state Markov process and applying the same ideas as in the discrete-time, discrete-state case. The approach is simple and intuitively clear; it applies in the same way to problems with finite and infinite horizons, discounted and long-run-average performance criteria, and continuous and jump diffusions. No discounting is needed when dealing with long-run average performance. The approach provides a unified framework for stochastic control and other optimization theories and methodologies, including Markov decision processes, perturbation analysis, and reinforcement learning.
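To make the direct-comparison idea concrete, the following is a minimal sketch in the discrete-time, discrete-state setting the abstract refers back to. It computes the long-run average reward and the potential (by solving the Poisson equation) for two policies, then verifies the performance difference formula η' − η = π'[(f' − f) + (P' − P)g], which compares the two policies directly without dynamic programming. The transition matrices and reward vectors are hypothetical, chosen only for illustration.

```python
import numpy as np

def average_reward_and_potential(P, f):
    """Long-run average reward eta, potential g, and stationary
    distribution pi for an ergodic Markov chain (P, f)."""
    n = P.shape[0]
    # Stationary distribution: solve pi (P - I) = 0 with sum(pi) = 1.
    A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    eta = pi @ f
    # Poisson equation (I - P) g = f - eta, normalized by pi g = 0:
    # adding the rank-one term e pi^T makes the system nonsingular.
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)
    return eta, g, pi

# Two hypothetical policies on a 3-state chain.
P1 = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]])
f1 = np.array([1.0, 2.0, 3.0])
P2 = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6]])
f2 = np.array([1.5, 1.8, 2.5])

eta1, g1, _ = average_reward_and_potential(P1, f1)
eta2, _, pi2 = average_reward_and_potential(P2, f2)

# Direct comparison via the performance difference formula:
# eta2 - eta1 = pi2 @ [(f2 - f1) + (P2 - P1) @ g1]
diff = pi2 @ ((f2 - f1) + (P2 - P1) @ g1)
assert np.isclose(diff, eta2 - eta1)
```

The formula's sign immediately tells which policy performs better, using only the potential of one policy and the stationary distribution of the other; this is the comparison that the paper extends to continuous-time and continuous-state processes.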