Markov and semi-Markov decision models and optimal stopping

  • Manfred Schäl

Abstract

We consider a system with a finite number of states i ∈ S. Periodically we observe the current state of the system and then choose an action a from a set A of possible actions. As a result of the current state i and the chosen action a, the system moves to a new state j with probability p_ij(a). As a further consequence, an immediate reward r(i, a) is earned. If the control process is stopped in a state i, then we obtain a terminal reward u(i). Thus, the underlying model is given by a tuple M = (S, A, p, r, u).
  (i) S stands for the state space and is assumed to be finite.
  (ii) A is the action space.
  (iii) p_ij(a) are the transition probabilities, where Σ_{j∈S} p_ij(a) = 1 for all i ∈ S, a ∈ A.
  (iv) r(i, a) is the real-valued one-step reward.
  (v) u(i) is the real-valued terminal reward or the utility function.

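The model M = (S, A, p, r, u) above can be illustrated with a short value-iteration sketch for the optimal-stopping problem: in every state the controller compares the terminal reward u(i) with the best continuation value over actions. The discount factor beta, the convergence tolerance, and the concrete two-state numbers below are illustrative assumptions, not taken from the text.

```python
# Minimal sketch of the optimal-stopping model M = (S, A, p, r, u).
# Assumed: a discount factor beta < 1 (not specified in the abstract)
# so that value iteration converges.

def optimal_stopping_values(S, A, p, r, u, beta=0.9, tol=1e-10):
    """Iterate V(i) = max( u(i), max_a [ r(i,a) + beta * sum_j p_ij(a) V(j) ] )."""
    V = {i: u[i] for i in S}  # stopping immediately is always available
    while True:
        V_new = {}
        for i in S:
            # best continuation value over all actions a in A
            cont = max(
                r[(i, a)] + beta * sum(p[(i, a)][j] * V[j] for j in S)
                for a in A
            )
            V_new[i] = max(u[i], cont)  # stop or continue
        if max(abs(V_new[i] - V[i]) for i in S) < tol:
            return V_new
        V = V_new

# Tiny hypothetical example: two states, one action, symmetric transitions.
S = [0, 1]
A = ["go"]
p = {(0, "go"): {0: 0.5, 1: 0.5}, (1, "go"): {0: 0.5, 1: 0.5}}
r = {(0, "go"): 1.0, (1, "go"): 0.0}
u = {0: 0.0, 1: 5.0}

V = optimal_stopping_values(S, A, p, r, u)
```

In this example it is optimal to stop in state 1 (V(1) = u(1) = 5) and to continue in state 0, where the continuation value exceeds u(0) = 0.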
Keywords

Decision Model, Markov Decision Process, Optimality Equation, Average Reward, Total Reward

Copyright information

© Springer Science+Business Media New York 1986

Authors and Affiliations

  • Manfred Schäl, Institut für Angewandte Mathematik, Universität Bonn, Germany
