Discrete Time Markov Decision Processes: Total Reward

Part of the Advances in Mechanics and Mathematics book series (AMMA, volume 14)

This chapter studies a discrete time Markov decision process (MDP) with the total reward criterion, where the state space is countable, the action sets are measurable, the reward function is extended real-valued, and the discount factor β ∈ (−∞, +∞) may be any real number, although β ∈ [0, 1] used to be required in the literature. Two conditions are presented that are necessary for studying MDPs and are weaker than those presented in the literature. By eliminating some worst actions, the state space S can be partitioned into subsets S∞, S−∞, and S0, on which the optimal value function equals +∞, equals −∞, or is finite, respectively. Furthermore, the validity of the optimality equation is shown when its right-hand side is well defined, in particular when it is restricted to the subset S0. The reward function r(i, a) becomes finite and bounded above in a for each i ∈ S0. Then, the optimal value function is characterized as a solution of the optimality equation in S0, and the structure of optimal policies is studied. Moreover, successive approximation is studied. Finally, some sufficient conditions for these necessary conditions are presented. The method used here is elementary: only some basic concepts from MDPs and discrete time Markov chains are needed.
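For orientation, the optimality equation and the successive approximation scheme mentioned above can be sketched in their standard textbook form. The abstract does not state them, so the notation here is an assumption: A(i) for the action set at state i, p_{ij}(a) for the transition probabilities, V^* for the optimal value function, and V_n for the n-th iterate. The chapter's own formulation may differ in detail, and whether V_n converges to V^* depends on the conditions the chapter establishes.

```latex
% Optimality equation restricted to S_0 (standard form; notation assumed, not taken from the abstract)
V^*(i) \;=\; \sup_{a \in A(i)} \Big\{ r(i,a) + \beta \sum_{j \in S} p_{ij}(a)\, V^*(j) \Big\},
\qquad i \in S_0 .

% Successive approximation started from the zero function (again an assumed, standard form)
V_0(i) = 0, \qquad
V_{n+1}(i) \;=\; \sup_{a \in A(i)} \Big\{ r(i,a) + \beta \sum_{j \in S} p_{ij}(a)\, V_n(j) \Big\},
\qquad n \ge 0 .
```

The restriction to S0 matters because r(i, a) is finite and bounded above in a there, which keeps the supremum from being +∞ solely on account of the reward term.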


Optimal Policy, Discount Factor, Markov Decision Process, Reward Function, Optimality Equation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer Science+Business Media, LLC 2008
