Discrete Time Markov Decision Processes: Total Reward

doi:10.1007/978-0-387-36951-8_2

Part of the book series: Advances in Mechanics and Mathematics (AMMA, volume 14)

This chapter studies a discrete time Markov decision process with the total reward criterion, where the state space is countable, the action sets are measurable, the reward function is extended real-valued, and the discount factor β ∈ (−∞, +∞) may be any real number, although β ∈ [0, 1] has usually been required in the literature. Two conditions are presented that are necessary for studying MDPs and are weaker than those presented in the literature. By eliminating some worst actions, the state space S can be partitioned into subsets S∞, S−∞, and S0, on which the optimal value function equals +∞, equals −∞, or is finite, respectively. Furthermore, the validity of the optimality equation is shown whenever its right-hand side is well defined, in particular when it is restricted to the subset S0. The reward function r(i, a) becomes finite and bounded above in a for each i ∈ S0. Then, the optimal value function is characterized as a solution of the optimality equation in S0, and the structure of optimal policies is studied. Moreover, successive approximation is studied. Finally, some sufficient conditions for the necessary conditions are presented. The method used here is elementary: only some basic concepts from MDPs and discrete time Markov chains are needed.
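To make the optimality equation and successive approximation concrete, here is a minimal sketch on a toy finite MDP. It assumes the standard form V(i) = max over a of { r(i, a) + β Σ_j p(j | i, a) V(j) } with finite rewards and 0 ≤ β < 1, which is far narrower than the chapter's setting (countable states, extended real-valued rewards, arbitrary real β). The function name, the data layout, and the two-state example are all hypothetical and are not taken from the chapter.

```python
def successive_approximation(states, actions, reward, trans, beta, n_iter=200):
    """Iterate V_{n+1}(i) = max_a [ r(i,a) + beta * sum_j p(j|i,a) * V_n(j) ], with V_0 = 0.

    Illustrative assumptions: finitely many states and actions, finite rewards,
    and 0 <= beta < 1 so that the iteration converges.
    """
    v = {i: 0.0 for i in states}
    for _ in range(n_iter):
        # Build the next iterate from the previous one (synchronous update).
        v = {
            i: max(
                reward[i, a] + beta * sum(p * v[j] for j, p in trans[i, a].items())
                for a in actions[i]
            )
            for i in states
        }
    return v


# Hypothetical two-state example: state 1 is absorbing and pays 2 per step.
states = [0, 1]
actions = {0: ["stay", "go"], 1: ["stay"]}
reward = {(0, "stay"): 0.5, (0, "go"): 0.0, (1, "stay"): 2.0}
trans = {
    (0, "stay"): {0: 1.0},  # remain in state 0
    (0, "go"): {1: 1.0},    # move to state 1
    (1, "stay"): {1: 1.0},  # remain in state 1
}
beta = 0.5

v = successive_approximation(states, actions, reward, trans, beta)
print(v)
```

In this toy example, staying in state 1 forever is worth 2 / (1 − 0.5) = 4, and from state 0 the action "go" (worth 0.5 · 4 = 2) beats staying forever (worth 0.5 / (1 − 0.5) = 1), so the iterates approach V(0) = 2 and V(1) = 4.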

About this chapter

Cite this chapter

(2008). Discrete Time Markov Decision Processes: Total Reward. In: Markov Decision Processes With Their Applications. Advances in Mechanics and Mathematics, vol 14. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-36951-8_2

DOI: https://doi.org/10.1007/978-0-387-36951-8_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-36950-1
Online ISBN: 978-0-387-36951-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
