In this chapter, we study Markov decision processes (MDPs) in semi-Markov environments under the discounted criterion. The model describes a system that can itself be modeled as an MDP but is influenced by an environment that is modeled as a semi-Markov process. The environment affects the system whenever the environment state changes, and it does so in three ways: (1) the system undergoes an instantaneous state transition, (2) an instantaneous reward is received, and (3) the parameters of the MDP change. We first study continuous-time MDPs (CTMDPs) and then semi-Markov decision processes (SMDPs) in semi-Markov environments. Building on these, we study mixed MDPs in a semi-Markov environment, in which the underlying MDP is either a CTMDP or an SMDP, depending on which environment state has been entered. The standard results are obtained for all of these models under the discounted criterion.
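To make the three environment effects concrete, the following is a minimal Monte Carlo sketch of the CTMDP case with a fixed stationary policy already applied. Under that policy the system behaves as a continuous-time Markov chain whose rates and reward rates depend on the current environment state. It is not the chapter's method, which establishes analytic results. Every name and number here (`Q`, `REWARD_RATE`, `JUMP`, `LUMP_REWARD`, the uniform holding times, the discount rate `ALPHA`) is a hypothetical assumption chosen only for illustration. Rewards are discounted continuously as `exp(-ALPHA * t)`.

```python
import math
import random

# Hypothetical parameters: two environment states, two system states, fixed policy.
ALPHA = 0.1  # continuous discount rate (assumed)

# Environment: embedded jump chain plus non-exponential holding times (semi-Markov).
ENV_NEXT = {0: [(1, 1.0)], 1: [(0, 1.0)]}

def env_holding_time(e, rng):
    # Uniform (not exponential) sojourns make the environment semi-Markov.
    return rng.uniform(1.0, 3.0) if e == 0 else rng.uniform(0.5, 1.5)

# System under the fixed policy: rates Q[e][i][j] and reward rates r[e][i].
Q = {0: {0: {1: 2.0}, 1: {0: 1.0}},
     1: {0: {1: 0.5}, 1: {0: 3.0}}}
REWARD_RATE = {0: {0: 1.0, 1: 4.0}, 1: {0: 2.0, 1: 0.0}}

# Effects of an environment change e -> e': (1) instantaneous system jump,
# (2) lump reward; (3) the parameter change happens by switching Q/REWARD_RATE to e'.
JUMP = {(0, 1): {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
        (1, 0): {0: [(0, 1.0)], 1: [(0, 0.3), (1, 0.7)]}}
LUMP_REWARD = {(0, 1): {0: 1.0, 1: -1.0}, (1, 0): {0: 0.0, 1: 2.0}}

def sample(dist, rng):
    """Draw a state from a list of (state, probability) pairs."""
    u, acc = rng.random(), 0.0
    for state, p in dist:
        acc += p
        if u < acc:
            return state
    return dist[-1][0]

def discounted_integral(rate, t0, t1):
    """Integral of rate * exp(-ALPHA * t) over [t0, t1]."""
    return rate * (math.exp(-ALPHA * t0) - math.exp(-ALPHA * t1)) / ALPHA

def simulate_path(e, i, horizon, rng):
    """Discounted reward of one sample path, truncated at `horizon`."""
    t, total = 0.0, 0.0
    env_end = t + env_holding_time(e, rng)
    while t < horizon:
        rates = Q[e][i]
        out = sum(rates.values())
        next_sys = t + rng.expovariate(out) if out > 0 else math.inf
        t_next = min(next_sys, env_end, horizon)
        total += discounted_integral(REWARD_RATE[e][i], t, t_next)
        t = t_next
        if t >= horizon:
            break
        if env_end <= next_sys:  # environment changes first
            e_new = sample(ENV_NEXT[e], rng)
            total += math.exp(-ALPHA * t) * LUMP_REWARD[(e, e_new)][i]
            i = sample(JUMP[(e, e_new)][i], rng)
            e = e_new
            env_end = t + env_holding_time(e, rng)
        else:  # ordinary system transition under the current parameters
            i = sample([(j, q / out) for j, q in rates.items()], rng)
    return total

def estimate_value(e0, i0, horizon=150.0, n_paths=1000, seed=0):
    """Monte Carlo estimate of the discounted value from (e0, i0)."""
    rng = random.Random(seed)
    return sum(simulate_path(e0, i0, horizon, rng) for _ in range(n_paths)) / n_paths

print(estimate_value(0, 0))
```

The horizon is truncated at 150 because `exp(-0.1 * 150)` is about `3e-7`, so the neglected tail is negligible for these bounded rewards. Because the system holding times are exponential, resampling the system clock after each environment change does not alter the dynamics.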