Constrained Continuous-Time Markov Decision Processes on the Finite Horizon
This paper studies constrained (nonhomogeneous) continuous-time Markov decision processes on a finite horizon. The performance criterion to be optimized is the expected total reward over the finite horizon, subject to \(N\) constraints on expected costs of the same type. By introducing an appropriate notion of occupation measure for this optimal control problem, we establish the following under suitable conditions: (a) the class of Markov policies is sufficient; (b) every extreme point of the space of performance vectors is generated by a deterministic Markov policy; and (c) there exists an optimal Markov policy that is a mixture of no more than \(N+1\) deterministic Markov policies.
Keywords: Continuous-time Markov decision process; Constrained optimality; Finite horizon; Mixture of \(N+1\) deterministic Markov policies; Occupation measure
Mathematics Subject Classification: 90C40; 60J27