Abstract
In this chapter, we study the relative optimization of the long-run average reward of time-nonhomogeneous, continuous-time and continuous-state stochastic processes under a general Markov model. We resolve the under-selectivity issue (the long-run average does not depend on the actions taken in any finite period), derive necessary and sufficient optimality conditions, and address bias optimization. State classification is carried out with the notions of state comparability and weak ergodicity: all states can be classified into weakly ergodic states and branching states, which differ slightly from the ergodic and transient states of time-homogeneous systems, and we show that this classification is more natural for optimization. Optimality conditions for multi-class stochastic processes are derived by relative optimization, and optimality conditions for discounted performance are also derived.
To raise new questions, new possibilities, to regard old problems from a new angle, requires creative imagination and marks real advance in science [1].
Albert Einstein
Notes
- 1.
Roughly speaking, the second bias is the bias of the bias, and the \(n\)th bias is the bias of the \((n-1)\)th bias, \(n=2,3,\ldots\); see Sect. 2.7.
- 2.
This type of interchangeability has been widely studied in calculus; Ref. [12] shows that it is one of the main issues in perturbation analysis and provides some intuitive explanations. Note that in (2.12) the first expectation is taken over X(t). Because of the "smoothing" nature of the conditional mean \(E[h[t', X(t')]|X(t)]\), the interchangeability in (2.12) can be explained intuitively.
- 3.
For any matrix R and vector g, Rg is a vector, and (Rg)(i) denotes its ith component.
- 4.
- 5.
This is termed "ergodic control" in [17], in which \(\eta (t,x)\equiv \eta \) is essentially assumed; dynamic programming is first applied to an infinite-horizon discounted-reward problem, and the discount factor is then let vanish.
- 6.
Note the nonsymmetric form of x and y with \({\breve{\mathbb {A}}}_{t, \cdot x}\) and \(\mathbb {A}_{t, \cdot y}\).
- 7.
Historically, the performance potential was named after potential energy, because both satisfy a conservation law. It was later found that this name is also consistent with the solution to a Poisson equation [6].
- 8.
For sets outside \({\mathscr {S}}_t\), the probabilities are zero.
- 9.
In the theorem, we prove that policy \({\widetilde{u}}\) is better; whether it is admissible, however, depends on the specific features of each system and on the definition of admissible policies. For diffusion processes, see Chap. 3 for further discussion.
- 10.
- 11.
For the bias-difference formula (2.75) to take a form similar to the performance-difference formula (2.47), we need a modification in the sign of \(\chi (t,x,y)\); in fact, \(\chi (t,x,y)\) in (2.69) corresponds to the negative of \(\gamma (t,x,y)\) in (2.21), cf. (4.14) of [6] for discrete-time Markov chains.
- 12.
Note the difference in the signs in the definitions of \(\gamma (t,x,y)\) and \(\chi (t,x,y)\); see footnote 11.
- 13.
- 14.
Roughly speaking, a period \(\mathscr {T}\) satisfying \(\lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T I (t \in {\mathscr {T}}) dt =0\), i.e., a period whose long-run time density is zero.
- 15.
“uniformly in \(t \in [0, \infty )\)” can be replaced with “uniformly in \(t \in [t_0, \infty )\)”, for any \(t_0>0\).
- 16.
- 17.
See footnote 9.
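As a schematic companion to footnotes 1 and 7, the Poisson equation satisfied by the performance potential, and the Poisson-type recursion behind the \(n\)th bias, can be sketched as follows. This is a hedged sketch in time-homogeneous notation, with generator \(\mathbb{A}\), reward \(f\), gain \(\eta\), and all-ones vector \(e\) assumed; the chapter's time-nonhomogeneous forms, exact sign conventions, and normalization conditions are those of Sect. 2.7 and [65, 66].

```latex
% Schematic bias hierarchy (notation and sign conventions assumed from the text).
% The bias g_1 (the performance potential, up to a constant) solves a Poisson
% equation driven by the reward f and the gain \eta; each higher bias solves a
% Poisson-type equation driven by the previous bias, with c_n a normalizing
% constant determined by the precise definitions in the text.
\begin{align*}
  \mathbb{A}\, g_1 + f   &= \eta\, e, \\
  \mathbb{A}\, g_{n+1} + g_n &= c_n\, e, \qquad n = 1, 2, \ldots
\end{align*}
```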
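The interchange of expectation and differentiation discussed in footnote 2 can be checked numerically on a toy example. The function \(f(\theta, X)=(\theta+X)^2\) below is a hypothetical smooth example, not from the chapter: analytically \(E[f]=\theta^2+1\), so \(\frac{d}{d\theta}E[f]=2\theta\), and \(E[\frac{\partial f}{\partial\theta}]=E[2(\theta+X)]=2\theta\) as well, so the two operations commute.

```python
import random

# Hypothetical smooth example (not from the chapter): f(theta, X) = (theta + X)^2
# with X ~ N(0, 1).  Both orders of "differentiate" and "take expectation"
# should give 2*theta.

def mc_derivative_of_mean(theta, n=200_000, h=1e-3, seed=0):
    """Finite-difference derivative of the Monte Carlo mean: d/dtheta E[f]."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = lambda t: sum((t + x) ** 2 for x in xs) / n
    return (mean(theta + h) - mean(theta - h)) / (2 * h)

def mc_mean_of_derivative(theta, n=200_000, seed=0):
    """Monte Carlo mean of the pathwise derivative: E[2*(theta + X)]."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(2 * (theta + x) for x in xs) / n

theta = 1.5
d_of_mean = mc_derivative_of_mean(theta)
mean_of_d = mc_mean_of_derivative(theta)
print(d_of_mean, mean_of_d)  # both close to 2*theta = 3.0
```

With common random numbers (the same seed), the finite-difference derivative of the Monte Carlo mean agrees with the Monte Carlo mean of the pathwise derivative essentially exactly for this quadratic example.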
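The negligible period of footnote 14 can also be illustrated numerically on a hypothetical set, not from the chapter: \(\mathscr{T} = \bigcup_{n \ge 1} [n, n+n^{-2}]\) has finite total length \(\sum_n n^{-2} = \pi^2/6\), so its long-run time density \(\frac{1}{T}\int_0^T I(t\in\mathscr{T})\,dt\) tends to 0; actions taken only during such a period cannot affect the long-run average.

```python
# Hypothetical negligible period: T_set = union of [n, n + 1/n^2], n = 1, 2, ...
# Its measure on [0, T] stays bounded (by pi^2/6), so the time density -> 0.

def density(T):
    """(1/T) * Lebesgue measure of T_set intersected with [0, T]."""
    length = sum(min(1.0 / n**2, max(0.0, T - n)) for n in range(1, int(T) + 1))
    return length / T

print([density(T) for T in (10, 1000, 100000)])  # decreasing toward 0
```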
References
Einstein A, Infeld L (1938) The evolution of physics. Cambridge University Press, Cambridge
Jasso-Fuentes H, Hernández-Lerma O (2009) Blackwell optimality for controlled diffusion processes. J Appl Prob 46:372–391
Veinott AF (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann Math Stat 37:1284–1294
Veinott AF (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann Math Stat 40:1635–1660
Cao XR (2016) State classification of time nonhomogeneous Markov chains and average reward optimization of multi-chains. IEEE Trans Autom Control 61:3001–3015
Cao XR (2007) Stochastic learning and optimization - a sensitivity-based approach. Springer, Berlin
Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Cao XR (2015) Optimization of average rewards of time nonhomogeneous Markov chains. IEEE Trans Autom Control 60:1841–1856
Cao XR (2017) Optimality conditions for long-run average rewards with under selectivity and non-smooth features. IEEE Trans Autom Control 62:4318–4332
Cao XR (2019) State classification and multi-class optimization of continuous-time and continuous-state Markov processes. IEEE Trans Autom Control 64:3632–3646
Çinlar E (1975) Introduction to stochastic processes. Prentice Hall, Englewood Cliffs, New Jersey
Cao XR (1985) Convergence of parameter sensitivity estimates in a stochastic experiment. IEEE Trans Autom Control 30:845–853
DiBenedetto E (2002) Real analysis. Birkhäuser Advanced Texts, Birkhäuser
Folland GB (1984) Real analysis: modern techniques and their applications. Wiley, New York
Guo XP, Hernández-Lerma O (2009) Continuous-time Markov decision processes. Springer, Berlin
Klebaner FC (2005) Introduction to stochastic calculus with applications, 2nd edn. Imperial College Press, London
Taksar MI (2008) Diffusion optimization models in insurance and finance. University of Texas, Lecture Notes
Billingsley P (1979) Probability and measure. Wiley, New York
Hajnal J (1958) Weak ergodicity in non-homogeneous Markov chains. Proc Cambridge Philos Soc 54:233–246
Park Y, Bean JC, Smith RL (1993) Optimal average value convergence in nonhomogeneous Markov decision processes. J Math Anal Appl 179:525–536
Fleming WH, Soner HM (2006) Controlled Markov processes and viscosity solutions, 2nd edn. Springer, Berlin
Øksendal B, Sulem A (2007) Applied stochastic control of jump diffusions. Springer, Berlin
Guo XP, Song XY, Zhang JY (2009) Bias optimality for multichain continuous time Markov decision processes. Oper Res Lett 37
Lewis ME, Puterman ML (2001) A probabilistic analysis of bias optimality in unichain Markov decision processes. IEEE Trans Autom Control 46:96–100
Cao XR. The Nth bias and Blackwell optimality of time nonhomogeneous Markov chains. IEEE Trans Autom Control. Submitted
Zhang JY, Cao XR (2009) Continuous-time Markov decision processes with \(n\)th-bias optimality criteria. Automatica 45:1628–1638
Cao XR, Zhang JY (2008) The nth-order bias optimality for multi-chain Markov decision processes. IEEE Trans Autom Control 53:496–508
Taylor HM (1976) A Laurent series for the resolvent of a strongly continuous stochastic semi-group. Math Program Stud 6:258–263
Miller BL (1968) Finite state continuous time Markov decision processes with an infinite planning horizon. J Math Anal Appl 22:552–569
Prieto-Rumeau T, Hernández-Lerma O (2005) The Laurent series, sensitive discount and Blackwell optimality for continuous-time controlled Markov chains. Math Methods Oper Res 61:123–145
Prieto-Rumeau T, Hernández-Lerma O (2006) Bias optimality for continuous time controlled Markov chains. SIAM J Control Optim 45:51–73
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this chapter
Cao, XR. (2020). Optimal Control of Markov Processes: Infinite-Horizon. In: Relative Optimization of Continuous-Time and Continuous-State Stochastic Systems. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-41846-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41845-8
Online ISBN: 978-3-030-41846-5
eBook Packages: Intelligent Technologies and Robotics (R0)