# Markov and semi-Markov decision models and optimal stopping

• Manfred Schäl
Chapter

## Abstract

We consider a system with a finite number of states i ∈ S. Periodically we observe the current state of the system and then choose an action a from a set A of possible actions. As a result of the current state i and the chosen action a, the system moves to a new state j with probability pij(a). As a further consequence, an immediate reward r(i, a) is earned. If the control process is stopped in a state i, then we obtain a terminal reward u(i). Thus, the underlying model is given by a tuple M = (S, A, p, r, u).
1. (i) S stands for the state space and is assumed to be finite.
2. (ii) A is the action space.
3. (iii) pij(a) are the transition probabilities, satisfying Σj∈S pij(a) = 1 for all i ∈ S, a ∈ A.
4. (iv) r(i, a) is the real-valued one-step reward.
5. (v) u(i) is the real-valued terminal reward, or utility function.
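The model M = (S, A, p, r, u) defined above can be illustrated with a small numerical sketch. The following is an assumption-laden toy example, not the chapter's own method: it runs discounted value iteration (a discount factor β is added so the iteration converges, whereas the chapter also treats total- and average-reward criteria) and, at each state, compares continuing under the best action with stopping for the terminal reward u(i). All numerical data are randomly generated placeholders.

```python
# Toy sketch of optimal stopping in a finite MDP M = (S, A, p, r, u).
# Assumptions not in the text: discount factor beta, random toy data.
import numpy as np

S, A = 3, 2        # number of states and actions (arbitrary toy sizes)
beta = 0.9         # discount factor, added here for convergence

rng = np.random.default_rng(0)
p = rng.random((S, A, S))
p /= p.sum(axis=2, keepdims=True)   # rows sum to 1: Σ_j p_ij(a) = 1
r = rng.random((S, A))              # one-step rewards r(i, a)
u = rng.random(S)                   # terminal rewards u(i)

# Value iteration: V(i) = max( u(i), max_a [ r(i,a) + beta Σ_j p_ij(a) V(j) ] )
V = u.copy()
for _ in range(1000):
    Q = r + beta * (p @ V)          # Q[i, a] = r(i,a) + beta Σ_j p_ij(a) V(j)
    V_new = np.maximum(u, Q.max(axis=1))
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# Stop in state i exactly when the terminal reward beats continuing.
stop_region = u >= Q.max(axis=1)
```

Here `stop_region` marks the states in which the controller should accept u(i) rather than continue; in the undiscounted criteria studied in the chapter the analysis is more delicate, since the iteration need not contract.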

## Keywords

Decision Model, Markov Decision Process, Optimality Equation, Average Reward, Total Reward
