Abstract
Markov decision processes (MDPs) are controllable discrete event systems with stochastic transitions. The performance of an MDP is evaluated by a payoff function, and the controller seeks to optimize this performance by choosing an optimal strategy.
There exist various ways of measuring performance, i.e. various classes of payoff functions. For example, average performance can be evaluated by a mean-payoff function, peak performance by a limsup payoff function, and the parity payoff function can be used to encode logical specifications.
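To make the three payoff functions concrete, here is a small illustrative sketch (the function names, the finite-prefix approximations, and the toy sequence are our own choices, not taken from the paper; the true payoffs are defined on infinite plays):

```python
# Approximating three classical payoff functions on a finite prefix of a play.
# Each infinite-play quantity is replaced by a finite-prefix stand-in.

def mean_payoff(rewards):
    """Average reward over the prefix (stand-in for the limit of averages)."""
    return sum(rewards) / len(rewards)

def limsup_payoff(rewards, tail=4):
    """Largest reward in the tail of the prefix (stand-in for limsup r_n)."""
    return max(rewards[-tail:])

def parity_payoff(priorities, tail=4):
    """Win (1) iff the highest priority in the tail is even -- a stand-in
    for 'the highest priority occurring infinitely often is even'."""
    return 1 if max(priorities[-tail:]) % 2 == 0 else 0

prefix = [3, 0, 1, 2, 1, 2, 1, 2]   # rewards / priorities along a play
print(mean_payoff(prefix))          # average performance -> 1.5
print(limsup_payoff(prefix))        # peak performance    -> 2
print(parity_payoff(prefix))        # parity condition    -> 1 (win)
```

Note that the mean payoff depends on the whole prefix, while the limsup and parity payoffs depend only on the tail; this is the intuition behind prefix-independence.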
Surprisingly, MDPs equipped with mean, limsup or parity payoff functions all share a common non-trivial property: they admit pure stationary optimal strategies.
In this paper, we introduce the class of prefix-independent and submixing payoff functions, and we prove that any MDP equipped with such a payoff function admits pure stationary optimal strategies.
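For orientation, the two defining properties can be sketched as follows. This formulation is recalled from the literature on submixing payoff functions and is our paraphrase, not a quotation from the abstract:

```latex
% Prefix-independence: finite prefixes do not affect the payoff,
% for every finite word u and infinite word w over the rewards:
f(u\,w) = f(w)

% Submixing: a shuffle of two plays cannot beat the better of the two,
% for every factorisation into finite words u_0, v_0, u_1, v_1, \dots:
f(u_0 v_0 u_1 v_1 \cdots) \;\le\; \max\bigl( f(u_0 u_1 \cdots),\; f(v_0 v_1 \cdots) \bigr)
```

Mean, limsup and parity payoff functions all satisfy both properties, which is how the main theorem covers the examples above.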
This result unifies and simplifies several existing proofs. Moreover, it is a key tool for generating new examples of MDPs with pure stationary optimal strategies.
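Existence of pure stationary optimal strategies has a practical consequence: the optimum is attained among the finitely many memoryless state-to-action maps, so in principle one can optimise by enumeration. The following sketch shows this for a hypothetical 2-state mean-payoff MDP (the example, its probabilities and rewards are invented for illustration; this brute-force approach is not the paper's algorithm):

```python
# Since pure stationary optimal strategies exist for mean-payoff MDPs, we may
# enumerate the |ACTIONS|^|STATES| pure stationary strategies and evaluate
# each on the Markov chain it induces.
from itertools import product

STATES = [0, 1]
ACTIONS = ["a", "b"]

# P[(state, action)] = (probability of moving to state 1, immediate reward)
P = {
    (0, "a"): (0.9, 0.0),   # drift towards state 1, no reward
    (0, "b"): (0.5, 1.0),   # slower drift, reward 1
    (1, "a"): (0.9, 2.0),   # mostly stay in 1, reward 2
    (1, "b"): (0.1, 3.0),   # high reward but falls back to state 0
}

def mean_payoff(policy):
    """Long-run average reward of a pure stationary strategy, via the
    closed-form stationary distribution of the induced 2-state chain."""
    p01, r0 = P[(0, policy[0])]     # transition prob and reward in state 0
    p11, r1 = P[(1, policy[1])]
    p10 = 1.0 - p11                  # probability of moving 1 -> 0
    pi0 = p10 / (p01 + p10)          # stationary probability of state 0
    return pi0 * r0 + (1.0 - pi0) * r1

best = max(product(ACTIONS, repeat=len(STATES)), key=mean_payoff)
print(best, mean_payoff(best))       # ('b', 'a') achieves 11/6
```

Enumeration is of course exponential in the number of states; its point here is only that the search space is finite at all, which is exactly what the existence of pure stationary optimal strategies guarantees.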
Copyright information
© 2007 Springer Berlin Heidelberg
Cite this paper
Gimbert, H. (2007). Pure Stationary Optimal Strategies in Markov Decision Processes. In: Thomas, W., Weil, P. (eds) STACS 2007. STACS 2007. Lecture Notes in Computer Science, vol 4393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70918-3_18
Print ISBN: 978-3-540-70917-6
Online ISBN: 978-3-540-70918-3