Discrete Time Markov Decision Processes: Average Criterion

Part of the Advances in Mechanics and Mathematics book series (AMMA, volume 14)

In this chapter, we study average optimality in discrete time Markov decision processes with a countable state space and measurable action sets. The average criterion differs fundamentally from the discounted criterion. Under the discounted criterion, the reward at period n is discounted to period 0 by the factor βⁿ; hence the earlier the period n, the more the reward of period n weighs in the criterion, and the later the period n, the less it weighs. In contrast, under the average criterion the reward of any single period contributes nothing to the criterion; only the long-run trend of the rewards is considered.
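The contrast between the two criteria can be illustrated numerically. The following sketch (not from the chapter; the reward stream and discount factor are illustrative assumptions) compares the discounted value, which weights the reward at period n by βⁿ, against the finite-horizon average, which washes out any finite prefix of rewards as the horizon grows:

```python
def discounted_value(rewards, beta):
    """Discounted criterion: sum of beta**n * r_n.

    Early periods carry weight close to 1; later periods are
    geometrically suppressed, so the early rewards dominate.
    """
    return sum(beta ** n * r for n, r in enumerate(rewards))


def average_value(rewards):
    """Average criterion (finite-horizon version): (1/N) * sum of r_n.

    Any fixed finite prefix of rewards becomes negligible as the
    horizon N grows; only the long-run trend matters.
    """
    return sum(rewards) / len(rewards)


# Illustrative stream: large rewards in the first 10 periods,
# reward 1 in every period thereafter.
rewards = [100.0] * 10 + [1.0] * 10000
beta = 0.9  # illustrative discount factor in (0, 1)

v_disc = discounted_value(rewards, beta)  # dominated by the early 100s
v_avg = average_value(rewards)            # close to the long-run reward 1
```

Here `v_disc` is large because the first ten rewards are barely discounted, while `v_avg` is close to 1: the ten exceptional periods contribute almost nothing to the average over 10010 periods, which is exactly the sense in which only the future trend of the reward enters the average criterion.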

Keywords

Optimal Policy Discount Factor Markov Decision Process Reward Function Optimality Equation 

Copyright information

© Springer Science+Business Media, LLC 2008