
Reduction of Discounted Continuous-Time MDPs with Unbounded Jump and Reward Rates to Discrete-Time Total-Reward MDPs

Eugene A. Feinberg
Chapter
Part of the Systems & Control: Foundations & Applications book series (SCFA)

Abstract

This chapter discusses a reduction of discounted continuous-time Markov decision processes (CTMDPs) to discrete-time Markov decision processes (MDPs). This reduction is based on the equivalence of a randomized policy that chooses actions only at jump epochs to a nonrandomized policy that can switch actions between jumps. For discounted CTMDPs with bounded jump rates, this reduction was introduced by the author in 2004 as a reduction to discounted MDPs. Here we show that this reduction also holds for unbounded jump and reward rates, but the corresponding MDP may not be discounted. However, the analysis of the equivalent total-reward MDP leads to the description of optimal policies for the CTMDP and provides methods for their computation.
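For orientation, the bounded-rate case mentioned in the abstract admits the well-known uniformization-type reduction; the following is a sketch in generic notation (the symbols $q$, $r$, $\alpha$, $\Lambda$ are illustrative and need not match the chapter's notation). If the jump rates satisfy $q(x,a) = \sum_{y \neq x} q(y \mid x, a) \le \Lambda < \infty$, the reward rate is $r(x,a)$, and the discount rate is $\alpha > 0$, then the discounted CTMDP is equivalent to a discrete-time MDP with discount factor, transition probabilities, and one-step rewards

\[
\beta = \frac{\Lambda}{\alpha + \Lambda}, \qquad
p(y \mid x, a) =
\begin{cases}
\dfrac{q(y \mid x, a)}{\Lambda}, & y \neq x,\\[1.2ex]
1 - \dfrac{q(x, a)}{\Lambda}, & y = x,
\end{cases}
\qquad
\tilde r(x, a) = \frac{r(x, a)}{\alpha + \Lambda},
\]

whose $\beta$-discounted value function coincides with the $\alpha$-discounted value function of the CTMDP. When the jump rates are unbounded, no finite uniformization constant $\Lambda$ exists, which is why the reduction described in the abstract leads to a total-reward, rather than discounted, discrete-time MDP.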

Keywords

Optimal Policy, Reward Function, Jump Rate, Reward Rate, Total Reward

Notes

Acknowledgements

This research was partially supported by NSF grants CMMI-0900206 and CMMI-0928490.

References

  1. Bather, J.: Optimal stationary policies for denumerable Markov chains in continuous time. Adv. Appl. Prob. 8, 144–158 (1976)
  2. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton, N.J. (1957)
  3. Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York (1978)
  4. Blackwell, D.: Positive dynamic programming. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability 1, pp. 415–418. University of California Press, Berkeley (1967)
  5. Dynkin, E.B., Yushkevich, A.A.: Controlled Markov Processes. Springer, Berlin (1979)
  6. Feinberg, E.A.: Non-randomized Markov and semi-Markov strategies in dynamic programming. Theory Probab. Appl. 27, 116–126 (1982)
  7. Feinberg, E.A.: Controlled Markov processes with arbitrary numerical criteria. Theory Probab. Appl. 27, 486–503 (1982)
  8. Feinberg, E.A.: A generalization of "expectation equals reciprocal of intensity" to nonstationary distributions. J. Appl. Probab. 31, 262–267 (1994)
  9. Feinberg, E.A.: Total reward criteria. In: Feinberg, E.A., Shwartz, A. (eds.) Handbook of Markov Decision Processes, pp. 173–207. Kluwer, Boston (2002)
  10. Feinberg, E.A.: Continuous time discounted jump Markov decision processes: a discrete-event approach. Math. Oper. Res. 29, 492–524 (2004)
  11. Feinberg, E.A., Lewis, M.E.: Optimality inequalities for average cost Markov decision processes and the stochastic cash balance problem. Math. Oper. Res. 32, 769–783 (2007)
  12. Guo, X., Hernández-Lerma, O.: Continuous-Time Markov Decision Processes: Theory and Applications. Springer, Berlin (2009)
  13. Guo, X., Hernández-Lerma, O., Prieto-Rumeau, T.: A survey of recent results on continuous-time Markov decision processes. Top 14, 177–261 (2006)
  14. Guo, X., Piunovskiy, A.: Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math. Oper. Res. 36, 105–132 (2011)
  15. Hernández-Lerma, O.: Lectures on Continuous-Time Markov Control Processes. Aportaciones Matemáticas 3. Sociedad Matemática Mexicana, México (1994)
  16. Hernández-Lerma, O., Lasserre, J.B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer, New York (1996)
  17. Hordijk, A., van der Duyn Schouten, F.A.: Discretization procedures for continuous time decision processes. In: Transactions of the Eighth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, Volume C, pp. 143–154. Academia, Prague (1979)
  18. Howard, R.: Dynamic Programming and Markov Processes. Wiley, New York (1960)
  19. Jacod, J.: Multivariate point processes: predictable projections, Radon-Nikodym derivatives, representation of martingales. Z. Wahrsch. Verw. Gebiete 31, 235–253 (1975)
  20. Kakumanu, P.: Continuously discounted Markov decision models with countable state and action space. Ann. Math. Stat. 42, 919–926 (1971)
  21. Kakumanu, P.: Relation between continuous and discrete time Markovian decision problems. Naval Res. Logist. Quart. 24, 431–439 (1977)
  22. Kallianpur, G.: Stochastic Filtering Theory. Springer, New York (1980)
  23. Kitaev, Yu.M.: Semi-Markov and jump Markov controlled models: average cost criterion. Theory Probab. Appl. 30, 272–288 (1985)
  24. Kitaev, Yu.M., Rykov, V.V.: Controlled Queueing Systems. CRC Press, New York (1995)
  25. Lippman, S.: Applying a new device in the optimization of exponential queueing systems. Oper. Res. 23, 687–710 (1975)
  26. Martin-Löf, A.: Optimal control of a continuous-time Markov chain with periodic transition probabilities. Oper. Res. 15, 872–881 (1966)
  27. Miller, B.L.: Finite state continuous time Markov decision processes with a finite planning horizon. SIAM J. Control 6, 266–280 (1968)
  28. Miller, B.L.: Finite state continuous time Markov decision processes with an infinite planning horizon. J. Math. Anal. Appl. 22, 552–569 (1968)
  29. Piunovskiy, A., Zhang, Y.: The transformation method for continuous-time Markov decision processes. Preprint, University of Liverpool (2011)
  30. Rykov, V.V.: Markov decision processes with finite spaces of states and decisions. Theory Probab. Appl. 11, 302–311 (1966)
  31. Schäl, M.: Conditions for optimality and for the limit of n-stage optimal policies to be optimal. Z. Wahrsch. Verw. Gebiete 32, 179–196 (1975)
  32. Schäl, M.: On dynamic programming: compactness of the space of policies. Stoch. Processes Appl. 3, 345–364 (1975)
  33. Serfozo, R.F.: An equivalence between continuous and discrete time Markov decision processes. Oper. Res. 27, 616–620 (1979)
  34. Strauch, R.E.: Negative dynamic programming. Ann. Math. Stat. 37, 871–890 (1966)
  35. Yushkevich, A.A.: Controlled Markov models with countable state space and continuous time. Theory Probab. Appl. 22, 215–235 (1977)
  36. Zachrisson, L.E.: Markov games. In: Dresher, M., Shapley, L., Tucker, A. (eds.) Advances in Game Theory, pp. 211–253. Princeton University Press, Princeton, N.J. (1964)

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, USA
