Abstract
Markov decision processes (MDPs) are controllable discrete event systems with stochastic transitions. The performance of an MDP is evaluated by a payoff function, and the controller seeks to optimize this performance by choosing an optimal strategy.
There exist various ways of measuring performance, i.e. various classes of payoff functions. For example, average performance can be evaluated by a mean-payoff function, peak performance by a limsup payoff function, and the parity payoff function can be used to encode logical specifications.
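To make the three payoff functions concrete, here is a small illustrative sketch (the function names, the finite-prefix approximations, and the toy sequence are our own choices, not taken from the paper; the true payoffs are defined on infinite plays):

```python
# Approximating three classical payoff functions on a finite prefix of a play.
# Each infinite-play quantity is replaced by a finite-prefix stand-in.

def mean_payoff(rewards):
    """Average reward over the prefix (stand-in for the limit of averages)."""
    return sum(rewards) / len(rewards)

def limsup_payoff(rewards, tail=4):
    """Largest reward in the tail of the prefix (stand-in for limsup r_n)."""
    return max(rewards[-tail:])

def parity_payoff(priorities, tail=4):
    """Win (1) iff the highest priority in the tail is even -- a stand-in
    for 'the highest priority occurring infinitely often is even'."""
    return 1 if max(priorities[-tail:]) % 2 == 0 else 0

prefix = [3, 0, 1, 2, 1, 2, 1, 2]   # rewards / priorities along a play
print(mean_payoff(prefix))          # average performance -> 1.5
print(limsup_payoff(prefix))        # peak performance    -> 2
print(parity_payoff(prefix))        # parity condition    -> 1 (win)
```

Note that the mean payoff depends on the whole prefix, while the limsup and parity payoffs depend only on the tail; this is the intuition behind prefix-independence.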
Surprisingly, MDPs equipped with mean, limsup or parity payoff functions all share a common non-trivial property: they admit pure stationary optimal strategies.
In this paper, we introduce the class of prefix-independent and submixing payoff functions, and we prove that any MDP equipped with such a payoff function admits pure stationary optimal strategies.
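For orientation, the two defining properties can be sketched as follows. This formulation is recalled from the literature on submixing payoff functions and is our paraphrase, not a quotation from the abstract:

```latex
% Prefix-independence: finite prefixes do not affect the payoff,
% for every finite word u and infinite word w over the rewards:
f(u\,w) = f(w)

% Submixing: a shuffle of two plays cannot beat the better of the two,
% for every factorisation into finite words u_0, v_0, u_1, v_1, \dots:
f(u_0 v_0 u_1 v_1 \cdots) \;\le\; \max\bigl( f(u_0 u_1 \cdots),\; f(v_0 v_1 \cdots) \bigr)
```

Mean, limsup and parity payoff functions all satisfy both properties, which is how the main theorem covers the examples above.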
This result unifies and simplifies several existing proofs. Moreover, it is a key tool for generating new examples of MDPs with pure stationary optimal strategies.
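Existence of pure stationary optimal strategies has a practical consequence: the optimum is attained among the finitely many memoryless state-to-action maps, so in principle one can optimise by enumeration. The following sketch shows this for a hypothetical 2-state mean-payoff MDP (the example, its probabilities and rewards are invented for illustration; this brute-force approach is not the paper's algorithm):

```python
# Since pure stationary optimal strategies exist for mean-payoff MDPs, we may
# enumerate the |ACTIONS|^|STATES| pure stationary strategies and evaluate
# each on the Markov chain it induces.
from itertools import product

STATES = [0, 1]
ACTIONS = ["a", "b"]

# P[(state, action)] = (probability of moving to state 1, immediate reward)
P = {
    (0, "a"): (0.9, 0.0),   # drift towards state 1, no reward
    (0, "b"): (0.5, 1.0),   # slower drift, reward 1
    (1, "a"): (0.9, 2.0),   # mostly stay in 1, reward 2
    (1, "b"): (0.1, 3.0),   # high reward but falls back to state 0
}

def mean_payoff(policy):
    """Long-run average reward of a pure stationary strategy, via the
    closed-form stationary distribution of the induced 2-state chain."""
    p01, r0 = P[(0, policy[0])]     # transition prob and reward in state 0
    p11, r1 = P[(1, policy[1])]
    p10 = 1.0 - p11                  # probability of moving 1 -> 0
    pi0 = p10 / (p01 + p10)          # stationary probability of state 0
    return pi0 * r0 + (1.0 - pi0) * r1

best = max(product(ACTIONS, repeat=len(STATES)), key=mean_payoff)
print(best, mean_payoff(best))       # ('b', 'a') achieves 11/6
```

Enumeration is of course exponential in the number of states; its point here is only that the search space is finite at all, which is exactly what the existence of pure stationary optimal strategies guarantees.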
Copyright information
© 2007 Springer Berlin Heidelberg
Cite this paper
Gimbert, H. (2007). Pure Stationary Optimal Strategies in Markov Decision Processes. In: Thomas, W., Weil, P. (eds) STACS 2007. STACS 2007. Lecture Notes in Computer Science, vol 4393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70918-3_18
Print ISBN: 978-3-540-70917-6
Online ISBN: 978-3-540-70918-3