Discrete Time Markov Decision Processes: Average Criterion
In this chapter, we study average optimality in discrete time Markov decision processes with countable state space and measurable action sets. The average criterion differs fundamentally from the discounted criterion. Under the discounted criterion, the reward received at period n is discounted back to period 0 by multiplying it by βⁿ, so earlier periods carry more weight: the smaller n is, the more the reward of period n contributes to the criterion, and the larger n is, the less it contributes. In contrast, under the average criterion the reward of any single period contributes nothing to the criterion; only the long-run trend of the rewards matters.
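The contrast above can be made precise. The following display is a standard formulation, not taken verbatim from the chapter; the notation (state process $x_n$, actions $a_n$, one-step reward $r$, policy $\pi$, initial state $s$) is assumed for illustration:

```latex
% Discounted criterion: rewards weighted by \beta^n, so early periods dominate.
V_\beta(\pi, s) \;=\; \mathbb{E}^{\pi}_{s}\!\left[\sum_{n=0}^{\infty} \beta^{n}\, r(x_n, a_n)\right],
\qquad 0 < \beta < 1.

% Average criterion: the Cesaro limit of rewards. Changing the reward in any
% finite set of periods leaves the value unchanged; only the long-run trend counts.
\bar{V}(\pi, s) \;=\; \liminf_{N \to \infty}\; \frac{1}{N}\,
\mathbb{E}^{\pi}_{s}\!\left[\sum_{n=0}^{N-1} r(x_n, a_n)\right].
```

In the second expression, altering $r(x_n, a_n)$ for finitely many $n$ changes the partial sum by a bounded amount, which vanishes after division by $N$; this is the formal sense in which no single period "accounts for" anything in the average criterion.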
Keywords: Optimal Policy · Discount Factor · Markov Decision Process · Reward Function · Optimality Equation