
Pure Stationary Optimal Strategies in Markov Decision Processes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4393)

Abstract

Markov decision processes (MDPs) are controllable discrete event systems with stochastic transitions. The performance of an MDP is evaluated by a payoff function, and the controller of the MDP seeks to optimize this performance by using optimal strategies.
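
For intuition, the sketch below (Python, not taken from the paper; the MDP, its rewards and the chosen strategy are made-up illustrations) shows what a pure stationary strategy is: a deterministic choice of one action per state that ignores the history of the play, here evaluated by simulating its mean payoff. The paper's result says that, for the payoff classes it covers, strategies of exactly this simple shape suffice to play optimally.

    import random

    # A toy MDP: transitions[(state, action)] = list of (probability, next_state, reward).
    # All states, actions and numbers are illustrative, not from the paper.
    transitions = {
        ("s0", "a"): [(0.9, "s0", 1.0), (0.1, "s1", 0.0)],
        ("s0", "b"): [(1.0, "s1", 2.0)],
        ("s1", "a"): [(1.0, "s0", 0.0)],
    }

    # A pure stationary strategy: one fixed action per state,
    # independent of how the state was reached.
    strategy = {"s0": "a", "s1": "a"}

    def simulate_mean_payoff(strategy, start="s0", steps=100_000, seed=0):
        """Estimate the mean payoff of a pure stationary strategy by simulation."""
        rng = random.Random(seed)
        state, total = start, 0.0
        for _ in range(steps):
            outcomes = transitions[(state, strategy[state])]
            r, acc = rng.random(), 0.0
            for p, nxt, reward in outcomes:
                acc += p
                if r <= acc:
                    total += reward
                    state = nxt
                    break
        return total / steps

    print(simulate_mean_payoff(strategy))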

There exist various ways of measuring performance, i.e., various classes of payoff functions. For example, average performance can be evaluated by a mean-payoff function, peak performance by a limsup payoff function, and the parity payoff function can be used to encode logical specifications.
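
As a point of reference (standard definitions, sketched here rather than quoted from the paper; conventions vary between presentations, e.g. limsup versus liminf for the mean payoff and which parity is rewarded), these payoff functions can be written, for an infinite sequence of rewards $r_0 r_1 r_2 \cdots$ or colours $c_0 c_1 c_2 \cdots$, as:

\[
  \phi_{\mathrm{mean}}(r_0 r_1 r_2 \cdots) \;=\; \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} r_i,
  \qquad
  \phi_{\mathrm{lsup}}(r_0 r_1 r_2 \cdots) \;=\; \limsup_{n \to \infty} r_n,
\]
\[
  \phi_{\mathrm{par}}(c_0 c_1 c_2 \cdots) \;=\;
  \begin{cases}
    1 & \text{if } \limsup_{n \to \infty} c_n \text{ is even},\\
    0 & \text{otherwise,}
  \end{cases}
\]
where the colours $c_n$ range over a finite set of priorities.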

Surprisingly, all MDPs equipped with mean-payoff, limsup or parity payoff functions share a common non-trivial property: they admit pure stationary optimal strategies.

In this paper, we introduce the class of prefix-independent and submixing payoff functions, and we prove that any MDP equipped with such a payoff function admits pure stationary optimal strategies.
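
Informally, and as a sketch rather than the paper's exact formal statement, a payoff function $f$ mapping infinite sequences of colours to real numbers is:

\[
  \text{prefix-independent:}\quad f(p\,u) \;=\; f(u) \quad\text{for every finite word } p \text{ and infinite word } u,
\]
\[
  \text{submixing:}\quad f(w) \;\le\; \max\bigl(f(u), f(v)\bigr) \quad\text{whenever } w \text{ is an interleaving (shuffle) of } u \text{ and } v.
\]

The three examples above (mean-payoff, limsup and parity) fall into this class, which is how the result unifies the existing proofs.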

This result unifies and simplifies several existing proofs. Moreover, it is a key tool for generating new examples of MDPs with pure stationary optimal strategies.




Author information

H. Gimbert

Editor information

Wolfgang Thomas, Pascal Weil


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Gimbert, H. (2007). Pure Stationary Optimal Strategies in Markov Decision Processes. In: Thomas, W., Weil, P. (eds.) STACS 2007. Lecture Notes in Computer Science, vol. 4393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70918-3_18


  • DOI: https://doi.org/10.1007/978-3-540-70918-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70917-6

  • Online ISBN: 978-3-540-70918-3

