Abstract
In this chapter, we study the relative optimization of the long-run average reward of time-nonhomogeneous, continuous-time and continuous-state stochastic processes under a general Markov model. We resolve the under-selectivity issue (the long-run average does not depend on the actions taken in any finite period), derive necessary and sufficient optimality conditions, and address bias optimization. State classification is carried out with the notions of state comparability and weak ergodicity: all states can be classified into weakly ergodic states and branching states, which differ slightly from the ergodic and transient states of time-homogeneous systems, and we show that this classification is more natural for optimization. Optimality conditions for multi-class stochastic processes are derived by relative optimization, and optimality conditions for discounted performance are also derived.
To raise new questions, new possibilities, to regard old problems from a new angle, requires creative imagination and marks real advance in science [1].
Albert Einstein
Notes
- 1.
Roughly speaking, the second bias is the bias of the bias, and the \(n\)th bias is the bias of the \((n-1)\)th bias, \(n=2,3,\ldots\); see Sect. 2.7.
- 2.
This type of interchangeability has been widely studied in calculus; Ref. [12] shows that it is one of the main issues in perturbation analysis and provides some intuitive explanations. Note that in (2.12) the first expectation is taken over X(t). Because of the "smoothing" nature of the conditional mean \(E[h[t', X(t')]|X(t)]\), the interchangeability in (2.12) can be explained intuitively.
- 3.
For any matrix R and vector g, Rg is a vector, and (Rg)(i) denotes its ith component.
- 4.
- 5.
This is termed "ergodic control" in [17], in which \(\eta (t,x)\equiv \eta \) is essentially assumed; dynamic programming is first applied to an infinite-horizon discounted-reward problem, and the discount factor is then let vanish.
- 6.
Note the nonsymmetric form of x and y with \({\breve{\mathbb {A}}}_{t, \cdot x}\) and \(\mathbb {A}_{t, \cdot y}\).
- 7.
Historically, the performance potential was named after potential energy, because both satisfy a conservation law. It was later found that this name is also consistent with the solution to a Poisson equation [6].
- 8.
For sets outside \({\mathscr {S}}_t\), the probabilities are zero.
- 9.
In the theorem, we prove that policy \({\widetilde{u}}\) is better; whether it is admissible, however, depends on the specific features of each system and on the definition of admissible policies. For diffusion processes, see Chap. 3 for further discussion.
- 10.
- 11.
For the bias-difference formula (2.75) to take a form similar to the performance-difference formula (2.47), we need a modification in the sign of \(\chi (t,x,y)\); in fact, \(\chi (t,x,y)\) in (2.69) corresponds to the negative of \(\gamma (t,x,y)\) in (2.21), cf. (4.14) of [6] for discrete-time Markov chains.
- 12.
Note the difference in the signs in the definitions of \(\gamma (t,x,y)\) and \(\chi (t,x,y)\); see footnote 11.
- 13.
- 14.
Roughly speaking, a period \(\mathscr {T}\) satisfying \(\lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T I (t \in {\mathscr {T}}) dt =0\), i.e., a period whose long-run time density is zero.
- 15.
“uniformly in \(t \in [0, \infty )\)” can be replaced with “uniformly in \(t \in [t_0, \infty )\)”, for any \(t_0>0\).
- 16.
- 17.
See footnote 9.
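As a schematic companion to footnotes 1 and 7, the Poisson equation satisfied by the performance potential, and the Poisson-type recursion behind the \(n\)th bias, can be sketched as follows. This is a hedged sketch in time-homogeneous notation, with generator \(\mathbb{A}\), reward \(f\), gain \(\eta\), and all-ones vector \(e\) assumed; the chapter's time-nonhomogeneous forms, exact sign conventions, and normalization conditions are those of Sect. 2.7 and [65, 66].

```latex
% Schematic bias hierarchy (notation and sign conventions assumed from the text).
% The bias g_1 (the performance potential, up to a constant) solves a Poisson
% equation driven by the reward f and the gain \eta; each higher bias solves a
% Poisson-type equation driven by the previous bias, with c_n a normalizing
% constant determined by the precise definitions in the text.
\begin{align*}
  \mathbb{A}\, g_1 + f   &= \eta\, e, \\
  \mathbb{A}\, g_{n+1} + g_n &= c_n\, e, \qquad n = 1, 2, \ldots
\end{align*}
```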
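The interchange of expectation and differentiation discussed in footnote 2 can be checked numerically on a toy example. The function \(f(\theta, X)=(\theta+X)^2\) below is a hypothetical smooth example, not from the chapter: analytically \(E[f]=\theta^2+1\), so \(\frac{d}{d\theta}E[f]=2\theta\), and \(E[\frac{\partial f}{\partial\theta}]=E[2(\theta+X)]=2\theta\) as well, so the two operations commute.

```python
import random

# Hypothetical smooth example (not from the chapter): f(theta, X) = (theta + X)^2
# with X ~ N(0, 1).  Both orders of "differentiate" and "take expectation"
# should give 2*theta.

def mc_derivative_of_mean(theta, n=200_000, h=1e-3, seed=0):
    """Finite-difference derivative of the Monte Carlo mean: d/dtheta E[f]."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = lambda t: sum((t + x) ** 2 for x in xs) / n
    return (mean(theta + h) - mean(theta - h)) / (2 * h)

def mc_mean_of_derivative(theta, n=200_000, seed=0):
    """Monte Carlo mean of the pathwise derivative: E[2*(theta + X)]."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(2 * (theta + x) for x in xs) / n

theta = 1.5
d_of_mean = mc_derivative_of_mean(theta)
mean_of_d = mc_mean_of_derivative(theta)
print(d_of_mean, mean_of_d)  # both close to 2*theta = 3.0
```

With common random numbers (the same seed), the finite-difference derivative of the Monte Carlo mean agrees with the Monte Carlo mean of the pathwise derivative essentially exactly for this quadratic example.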
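The negligible period of footnote 14 can also be illustrated numerically on a hypothetical set, not from the chapter: \(\mathscr{T} = \bigcup_{n \ge 1} [n, n+n^{-2}]\) has finite total length \(\sum_n n^{-2} = \pi^2/6\), so its long-run time density \(\frac{1}{T}\int_0^T I(t\in\mathscr{T})\,dt\) tends to 0; actions taken only during such a period cannot affect the long-run average.

```python
# Hypothetical negligible period: T_set = union of [n, n + 1/n^2], n = 1, 2, ...
# Its measure on [0, T] stays bounded (by pi^2/6), so the time density -> 0.

def density(T):
    """(1/T) * Lebesgue measure of T_set intersected with [0, T]."""
    length = sum(min(1.0 / n**2, max(0.0, T - n)) for n in range(1, int(T) + 1))
    return length / T

print([density(T) for T in (10, 1000, 100000)])  # decreasing toward 0
```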
References
Einstein A, Infeld L (1938) The evolution of physics. Cambridge University Press, Cambridge
Jasso-Fuentes H, Hernández-Lerma O (2009) Blackwell optimality for controlled diffusion processes. J Appl Prob 46:372–391
Veinott AF (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann Math Stat 37:1284–1294
Veinott AF (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann Math Stat 40:1635–1660
Cao XR (2016) State classification of time nonhomogeneous Markov chains and average reward optimization of multi-chains. IEEE Trans Autom Control 61:3001–3015
Cao XR (2007) Stochastic learning and optimization - a sensitivity-based approach. Springer, Berlin
Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Cao XR (2015) Optimization of average rewards of time nonhomogeneous Markov chains. IEEE Trans Autom Control 60:1841–1856
Cao XR (2017) Optimality conditions for long-run average rewards with under selectivity and non-smooth features. IEEE Trans Autom Control 62:4318–4332
Cao XR (2019) State classification and multi-class optimization of continuous-time and continuous-state Markov processes. IEEE Trans Autom Control 64:3632–3646
Çinlar E (1975) Introduction to stochastic processes. Prentice Hall, Englewood Cliffs, New Jersey
Cao XR (1985) Convergence of parameter sensitivity estimates in a stochastic experiment. IEEE Trans Autom Control 30:845–853
DiBenedetto E (2002) Real analysis. Birkhäuser Advanced Texts, Birkhäuser
Folland GB (1984) Real analysis: modern techniques and their applications. Wiley, New York
Guo XP, Hernández-Lerma O (2009) Continuous-time Markov decision processes. Springer, Berlin
Klebaner FC (2005) Introduction to stochastic calculus with applications, 2nd edn. Imperial College Press, London
Taksar MI (2008) Diffusion optimization models in insurance and finance. University of Texas, Lecture Notes
Billingsley P (1979) Probability and measure. Wiley, New York
Hajnal J (1958) Weak ergodicity in non-homogeneous Markov chains. Proc Cambridge Philos Soc 54:233–246
Park Y, Bean JC, Smith RL (1993) Optimal average value convergence in nonhomogeneous Markov decision processes. J Math Anal Appl 179:525–536
Fleming WH, Soner HM (2006) Controlled Markov processes and viscosity solutions, 2nd edn. Springer, Berlin
Øksendal B, Sulem A (2007) Applied stochastic control of jump diffusions. Springer, Berlin
Guo XP, Song XY, Zhang JY (2009) Bias optimality for multichain continuous time Markov decision processes. Oper Res Lett 37
Lewis ME, Puterman ML (2001) A probabilistic analysis of bias optimality in unichain Markov decision processes. IEEE Trans Autom Control 46:96–100
Cao XR. The Nth bias and Blackwell optimality of time nonhomogeneous Markov chains. IEEE Trans Autom Control. Submitted
Zhang JY, Cao XR (2009) Continuous-time Markov decision processes with \(n\)th-bias optimality criteria. Automatica 45:1628–1638
Cao XR, Zhang JY (2008) The nth-order bias optimality for multi-chain Markov decision processes. IEEE Trans Autom Control 53:496–508
Taylor HM (1976) A Laurent series for the resolvent of a strongly continuous stochastic semi-group. Math Program Stud 6:258–263
Miller BL (1968) Finite state continuous time Markov decision processes with an infinite planning horizon. J Math Anal Appl 22:552–569
Prieto-Rumeau T, Hernández-Lerma O (2005) The Laurent series, sensitive discount and Blackwell optimality for continuous-time controlled Markov chains. Math Methods Oper Res 61:123–145
Prieto-Rumeau T, Hernández-Lerma O (2006) Bias optimality for continuous time controlled Markov chains. SIAM J Control Optim 45:51–73
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this chapter
Cao, XR. (2020). Optimal Control of Markov Processes: Infinite-Horizon. In: Relative Optimization of Continuous-Time and Continuous-State Stochastic Systems. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-41846-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41845-8
Online ISBN: 978-3-030-41846-5
eBook Packages: Intelligent Technologies and Robotics (R0)