Structures of Optimal Policies in MDPs with Unbounded Jumps: The State of Our Art
The derivation of structural properties of countable-state Markov decision processes (MDPs) is generally based on sample-path methods or on value-iteration arguments. In the latter case, the method is to prove inductively that the n-horizon value function has the structural properties of interest. A limit argument should then allow one to deduce the same structural properties for the infinite-horizon value function.
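As a minimal sketch of this inductive argument (illustrative only, not a model from the chapter), consider value iteration for a truncated single-server queue with linear holding cost, minimising total expected α-discounted cost: each application of the Bellman operator preserves convexity of the n-horizon value function. The linear extension at the truncation boundary is one convexity-preserving truncation; all parameter values are assumptions of this sketch.

```python
ALPHA = 0.9                  # discount factor (illustrative)
N = 50                       # truncation level: states 0, ..., N
P_ARR, P_SRV = 0.4, 0.6      # uniformised arrival/service probabilities

def bellman(V):
    """One step V_{n+1}(x) = c(x) + alpha * E[V_n(next state)]."""
    def up(x):    # arrival; linear extension beyond the truncation boundary
        return V[x + 1] if x < N else 2 * V[N] - V[N - 1]
    def down(x):  # service completion; dummy transition in state 0
        return V[max(x - 1, 0)]
    # holding cost c(x) = x, then the discounted one-step expectation
    return [x + ALPHA * (P_ARR * up(x) + P_SRV * down(x)) for x in range(N + 1)]

def is_convex(V, tol=1e-9):
    # non-decreasing first differences, up to floating-point slack
    return all(V[i + 2] - V[i + 1] >= V[i + 1] - V[i] - tol
               for i in range(len(V) - 2))

V = [0.0] * (N + 1)          # V_0 = 0 is trivially convex
for _ in range(200):
    V = bellman(V)
    assert is_convex(V)      # the structural property survives each step
```

The inductive step is the assertion inside the loop; the limit argument discussed above is what carries convexity over to the infinite-horizon value function.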
For discrete-time MDPs with the objective of minimising the total expected α-discounted cost, this procedure is justified under mild conditions. When the objective is to minimise the long-run expected average cost, value iteration need not converge. Allowing time to be continuous generates no further complications when the jump rates are bounded as a function of the state, since uniformisation then applies. However, when the jump rates are unbounded as a function of the state, uniformisation is applicable only after a suitable perturbation of the jump rates that does not destroy the desired structural properties. Thus a second limit argument is also required.
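The perturbation-then-uniformisation step can be sketched as follows (a hedged illustration; the particular smoothing used here is one simple variant and need not coincide with the chapter's perturbation). A birth–death process with abandonment has unbounded departure rates μ(x) = xβ; after smoothly forcing the rates to zero at a truncation level they are bounded, so uniformisation yields a proper discrete-time kernel P = I + Q/Λ.

```python
LAM, BETA, N = 1.0, 0.5, 40   # arrival rate, per-customer abandonment rate,
                              # truncation level (all values illustrative)

def birth(x):
    return LAM if x < N else 0.0

def death(x):
    # unbounded rate x * BETA, smoothly perturbed to 0 at the boundary
    return x * BETA * max(0.0, 1.0 - x / N)

# uniformisation constant: a bound on the total perturbed jump rate
UNIF = max(birth(x) + death(x) for x in range(N + 1))

def kernel_row(x):
    """Row x of P = I + Q/UNIF: up, down, and a self-loop filling the slack."""
    row = [0.0] * (N + 1)
    up, down = birth(x) / UNIF, death(x) / UNIF
    if x < N:
        row[x + 1] = up
    if x > 0:
        row[x - 1] = down
    row[x] += 1.0 - up - down
    return row
```

Without the perturbation no finite Λ dominates the rates xβ, which is exactly why the second limit argument (letting the perturbation vanish) is needed.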
The importance of unbounded-rate countable-state MDPs has increased lately, owing to applications that model customer or patient impatience and abandonment. However, the theory validating the required limit arguments does not seem to be complete, and the relevant results are scattered over the literature.
In this chapter our objective has been to provide a systematic way to tackle this problem under relatively mild conditions, and to provide the theory validating the presented approach. The base model is a parametrised Markov process (MP); both perturbed MPs and MDPs are special cases of a parametrised MP. The advantage of this setup is that the parameter can simultaneously model a policy and a perturbation.
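The parametrisation idea can be illustrated with a small sketch (hypothetical dynamics and names; a uniformised abandonment queue as above): a single family of transition kernels is indexed by a parameter bundling a policy and a truncation level. Fixing the truncation level recovers a perturbed MDP as the policy varies; fixing the policy recovers a perturbed MP as the truncation varies.

```python
LAM, BETA = 1.0, 0.5   # arrival and per-server service rate (illustrative)

def kernel(policy, N):
    """Uniformised kernel of the parametrised process; the pair (policy, N)
    plays the role of the single parameter of the parametrised MP."""
    unif = LAM + N * BETA          # bounds the total rate after truncation
    def row(x):
        up = LAM / unif if x < N else 0.0
        down = min(policy(x), x) * BETA / unif   # servers used under the policy
        r = [0.0] * (N + 1)
        if x < N:
            r[x + 1] = up
        if x > 0:
            r[x - 1] = down
        r[x] += 1.0 - up - down    # self-loop absorbs the uniformisation slack
        return r
    return [row(x) for x in range(N + 1)]

# one point of the parameter space: greedy policy, truncation level 30
P = kernel(lambda x: x, 30)
```

Limit arguments in the policy coordinate and in the perturbation coordinate can then be treated within the same framework.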