Abstract

In this chapter, we focus on the relative optimization of the long-run average performance of time-nonhomogeneous, continuous-time and continuous-state stochastic processes under a general Markov model. The under-selectivity issue (the long-run average does not depend on the actions taken in any finite period) is resolved, necessary and sufficient optimality conditions are derived, and bias optimization is addressed. State classification is carried out with the notions of state comparability and weak ergodicity: all states can be classified into weakly ergodic states and branching states, which differ slightly from the ergodic and transient states of time-homogeneous systems, and we show that the former classification is more natural for optimization. Optimality conditions for multi-class stochastic processes are derived by relative optimization, and optimality conditions for discounted performance are derived as well.
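For orientation, a minimal sketch of the criterion in generic notation (ours, not necessarily the chapter's): for a process \(X = \{X(t), t \ge 0\}\) with reward function \(f\), the long-run average performance is
\[ \eta = \limsup _{T \rightarrow \infty } \frac{1}{T} E \Big [\int _0^T f(t, X(t))\, dt \Big ], \]
and under-selectivity refers to the fact that changing the actions on any finite interval leaves \(\eta \) unchanged, so \(\eta \) alone cannot distinguish transient behavior.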

To raise new questions, new possibilities, to regard old problems from a new angle, requires creative imagination and marks real advance in science [1].

Albert Einstein


Notes

  1. Roughly speaking, the 2nd bias is the bias of the bias, and the \(n\)th bias is the bias of the \((n-1)\)th bias, \(n=2,3,\ldots \); see Sect. 2.7.
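     As a rough sketch in the discrete-time setting of [6, 27] (our notation; the continuous-time definitions of Sect. 2.7 differ in detail): with transition matrix \(P\), steady-state distribution \(\pi \), reward \(f\), average reward \(\eta \), and all-ones vector \(e\), the biases can be generated recursively by the Poisson-type equations
     \[ (I-P)\,g_1 = f - \eta e, \qquad (I-P)\,g_n = -g_{n-1}, \quad n \ge 2, \]
     each normalized by \(\pi g_n = 0\).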

  2. This type of interchangeability has been widely studied in calculus; Ref. [12] identifies it as one of the main issues in perturbation analysis and provides some intuitive explanations. Note that in (2.12) the first expectation is on \(X(t)\). Because of the “smoothing” nature of the mean value \(E[h[t', X(t')]|X(t)]\), the interchangeability (2.12) can be explained intuitively.

  3. For any matrix \(R\) and vector \(g\), \(Rg\) is a vector, and \((Rg)(i)\) denotes its \(i\)th entry.
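     Explicitly, in standard matrix-vector notation,
     \[ (Rg)(i) = \sum _{j} R(i,j)\, g(j). \]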

  4. This concept originally comes from perturbation analysis (PA) [6, 8]; in PA it is called the “realization factor”.

  5. This is termed “ergodic control” in [17], where it is essentially assumed that \(\eta (t,x)\equiv \eta \); dynamic programming is first applied to an infinite-horizon discounted-reward problem, and the discount factor is then made to vanish.
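     A minimal sketch of the vanishing-discount argument, in generic notation rather than the chapter's: with discounted value \(V_\beta (x)= E\big [\int _0^\infty e^{-\beta t} f(X(t))\,dt \,\big |\, X(0)=x\big ]\), one has, under suitable ergodicity conditions,
     \[ \lim _{\beta \downarrow 0} \beta V_\beta (x) = \eta , \]
     so average-optimal policies can be obtained as limits of \(\beta \)-discounted optimal policies as \(\beta \downarrow 0\).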

  6. Note the nonsymmetric form of \(x\) and \(y\) in \({\breve{\mathbb {A}}}_{t, \cdot x}\) and \(\mathbb {A}_{t, \cdot y}\).

  7. Historically, the performance potential was named after potential energy, because both satisfy a conservation law. It was later found that the name is also consistent with the potential being a solution to a Poisson equation [6].

  8. For sets outside \({\mathscr {S}}_t\), the probabilities are zero.

  9. In the theorem, we prove that policy \({\widetilde{u}}\) is better; however, whether it is admissible is a separate issue, which depends on the particular features of each system and on the definition of admissible policies. For diffusion processes, see Chap. 3 for more discussion.

  10. For stationary processes, there is no under-selectivity. However, there may be many optimal policies whose initial behaviors differ, so bias optimality still makes sense; see, e.g., [6, 23, 24].

  11. For the bias-difference formula (2.75) to take a form similar to the performance-difference formula (2.47), we need to modify the sign of \(\chi (t,x,y)\); in fact, \(\chi (t,x,y)\) in (2.69) corresponds to the negative of \(\gamma (t,x,y)\) in (2.21); cf. (4.14) of [6] for discrete-time Markov chains.

  12. Note the difference in sign between the definitions of \(\gamma (t,x,y)\) and \(\chi (t,x,y)\); see footnote 11.

  13. More precisely, in (2.75), \({\breve{\mathbb {A}}}_t w (t,x) = g(t,x)\), and \(g(t,x)\) can be in any form as long as it is a solution to the Poisson equation (2.30) (different solutions exist). For bias comparison, we need to choose \(g(t,x)\) as the bias, i.e., in the form of (2.67).

  14. Roughly speaking, this means a period \(\mathscr {T}\) with \(\lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T I (t \in {\mathscr {T}}) \, dt =0\); that is, \(\mathscr {T}\) occupies a vanishing fraction of the time horizon.

  15. “Uniformly in \(t \in [0, \infty )\)” can be replaced with “uniformly in \(t \in [t_0, \infty )\)” for any \(t_0>0\).

  16. Condition (2.108) is quite technical but is satisfied by most systems; this condition and the proof of Lemma 2.13 may be skipped on a first reading. Example 2.15 illustrates when it might be violated.

  17. See footnote 9.

References

  1. Einstein A, Infeld L (1938) The evolution of physics. Cambridge University Press, Cambridge
  2. Jasso-Fuentes H, Hernández-Lerma O (2009) Blackwell optimality for controlled diffusion processes. J Appl Prob 46:372–391
  3. Veinott AF (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann Math Stat 37:1284–1294
  4. Veinott AF (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann Math Stat 40:1635–1660
  5. Cao XR (2016) State classification of time nonhomogeneous Markov chains and average reward optimization of multi-chains. IEEE Trans Autom Control 61:3001–3015
  6. Cao XR (2007) Stochastic learning and optimization - a sensitivity-based approach. Springer, Berlin
  7. Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
  8. Cao XR (2015) Optimization of average rewards of time nonhomogeneous Markov chains. IEEE Trans Autom Control 60:1841–1856
  9. Cao XR (2017) Optimality conditions for long-run average rewards with under selectivity and non-smooth features. IEEE Trans Autom Control 62:4318–4332
  10. Cao XR (2019) State classification and multi-class optimization of continuous-time and continuous-state Markov processes. IEEE Trans Autom Control 64:3632–3646
  11. Çinlar E (1975) Introduction to stochastic processes. Prentice Hall, Englewood Cliffs, NJ
  12. Cao XR (1985) Convergence of parameter sensitivity estimates in a stochastic experiment. IEEE Trans Autom Control 30:845–853
  13. DiBenedetto E (2002) Real analysis. Birkhäuser Advanced Texts, Birkhäuser
  14. Folland GB (1984) Real analysis: modern techniques and their applications. Wiley, New York
  15. Guo XP, Hernández-Lerma O (2009) Continuous-time Markov decision processes. Springer, Berlin
  16. Klebaner FC (2005) Introduction to stochastic calculus with applications, 2nd edn. Imperial College Press, London
  17. Taksar MI (2008) Diffusion optimization models in insurance and finance. Lecture notes, University of Texas
  18. Billingsley P (1979) Probability and measure. Wiley, New York
  19. Hajnal J (1958) Weak ergodicity in non-homogeneous Markov chains. Proc Cambridge Philos Soc 54:233–246
  20. Park Y, Bean JC, Smith RL (1993) Optimal average value convergence in nonhomogeneous Markov decision processes. J Math Anal Appl 179:525–536
  21. Fleming WH, Soner HM (2006) Controlled Markov processes and viscosity solutions, 2nd edn. Springer, Berlin
  22. Øksendal B, Sulem A (2007) Applied stochastic control of jump diffusions. Springer, Berlin
  23. Guo XP, Song XY, Zhang JY (2009) Bias optimality for multichain continuous time Markov decision processes. Oper Res Lett 37
  24. Lewis ME, Puterman ML (2001) A probabilistic analysis of bias optimality in unichain Markov decision processes. IEEE Trans Autom Control 46:96–100
  25. Cao XR. The \(n\)th bias and Blackwell optimality of time nonhomogeneous Markov chains. IEEE Trans Autom Control, submitted
  26. Zhang JY, Cao XR (2009) Continuous-time Markov decision processes with \(n\)th-bias optimality criteria. Automatica 45:1628–1638
  27. Cao XR, Zhang JY (2008) The \(n\)th-order bias optimality for multi-chain Markov decision processes. IEEE Trans Autom Control 53:496–508
  28. Taylor HM (1976) A Laurent series for the resolvent of a strongly continuous stochastic semi-group. Math Program Stud 6:258–263
  29. Miller BL (1968) Finite state continuous time Markov decision processes with an infinite planning horizon. J Math Anal Appl 22:552–569
  30. Prieto-Rumeau T, Hernández-Lerma O (2005) The Laurent series, sensitive discount and Blackwell optimality for continuous-time controlled Markov chains. Math Methods Oper Res 61:123–145
  31. Prieto-Rumeau T, Hernández-Lerma O (2006) Bias optimality for continuous time controlled Markov chains. SIAM J Control Optim 45:51–73


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Cao, XR. (2020). Optimal Control of Markov Processes: Infinite-Horizon. In: Relative Optimization of Continuous-Time and Continuous-State Stochastic Systems. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-41846-5_2
