Markov and semi-Markov decision models and optimal stopping
Abstract
We consider a system with a finite number of states i ∈ S. Periodically we observe the current state of the system and then choose an action a from a set A of possible actions. As a result of the current state i and the chosen action a, the system moves to a new state j with probability p_ij(a). As a further consequence, an immediate reward r(i, a) is earned. If the control process is stopped in a state i, then we obtain a terminal reward u(i). Thus the underlying model is given by a tuple M = (S, A, p, r, u), whose components are listed below (a small computational sketch follows the list).
- (i) S stands for the state space and is assumed to be finite.
- (ii) A is the action space.
- (iii) p_ij(a) are the transition probabilities, where Σ_{j∈S} p_ij(a) = 1 for all i ∈ S, a ∈ A.
- (iv) r(i, a) is the real-valued one-step reward.
- (v) u(i) is the real-valued terminal reward or utility function.
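For a concrete reading of the model, assume a discounted total-reward criterion with discount factor β (the chapter also treats other criteria, such as average reward). The control problem with stopping then leads to the optimality equation v(i) = max{ u(i), max_a [ r(i, a) + β Σ_j p_ij(a) v(j) ] }: in each state we either stop and collect u(i), or continue with the best available action. The following Python sketch runs value iteration for this equation on a small hypothetical instance; the discount factor, the arrays p, r, u, and all numbers are illustrative assumptions, not data from the chapter.

```python
import numpy as np

# Minimal sketch of the model M = (S, A, p, r, u) with optimal stopping,
# assuming a discounted total-reward criterion (beta and all data below
# are illustrative assumptions).

S = 3                                  # states 0, 1, 2
A = 2                                  # actions 0, 1
beta = 0.9                             # assumed discount factor

# p[a, i, j] = probability of moving from state i to state j under action a
p = np.array([[[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.2, 0.3, 0.5]],
              [[0.4, 0.4, 0.2],
               [0.3, 0.3, 0.4],
               [0.1, 0.1, 0.8]]])

r = np.array([[1.0, 0.5, 0.0],         # r[a, i] = one-step reward
              [0.8, 1.2, 0.3]])
u = np.array([2.0, 0.0, 1.5])          # u[i] = terminal reward if we stop in i

# Value iteration for
#   v(i) = max( u(i), max_a [ r(i, a) + beta * sum_j p_ij(a) v(j) ] )
v = np.zeros(S)
for _ in range(1000):
    continue_value = (r + beta * p @ v).max(axis=0)   # best action per state
    v_new = np.maximum(u, continue_value)             # stop or continue
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new

print("value function:", v)
print("stop in state i iff u(i) >= continuation value:", u >= continue_value)
```

The same iteration scheme carries over to other criteria discussed in the chapter once the appropriate optimality equation is substituted.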