
The Steady-State Control Problem for Markov Decision Processes

  • S. Akshay
  • Nathalie Bertrand
  • Serge Haddad
  • Loïc Hélouët
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8054)

Abstract

This paper addresses a control problem for probabilistic models in the setting of Markov decision processes (MDP). We are interested in the steady-state control problem, which asks, given an ergodic MDP \(\mathcal{M}\) and a distribution \(\delta_{goal}\), whether there exists a (history-dependent randomized) policy π ensuring that the steady-state distribution of \(\mathcal{M}\) under π is exactly \(\delta_{goal}\). We first show that stationary randomized policies suffice to achieve a given steady-state distribution. From this we infer that the steady-state control problem is decidable for MDP: it can be expressed as a linear program and is therefore solvable in PTIME. This decidability result extends to labeled MDP (LMDP), where the objective is a steady-state distribution on the labels carried by the states, and we provide a PSPACE algorithm. We also show that a related steady-state language inclusion problem is decidable in EXPTIME for LMDP. Finally, we prove that for MDP under partial observation (POMDP), the steady-state control problem becomes undecidable.
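
A minimal sketch, not from the paper, of the kind of linear-programming test the abstract alludes to: feasibility of an occupation-measure LP is checked with scipy.optimize.linprog, and a stationary randomized policy is read off from a feasible point. The MDP encoding (a dict of transition probabilities) and the helper name steady_state_controllable are illustrative assumptions, not the authors' construction.

import numpy as np
from scipy.optimize import linprog

def steady_state_controllable(states, actions, P, delta_goal):
    """Check whether some stationary randomized policy achieves delta_goal.

    P[(s, a)] maps each successor state s' to Prob(s' | s, a).
    Feasibility of the LP over occupation variables x(s, a) >= 0 with
      sum_a x(s, a) = delta_goal(s)                           (target marginal)
      sum_{s', a'} x(s', a') * P(s | s', a') = delta_goal(s)  (stationarity)
    witnesses the existence of such a policy; this is only a sketch of the
    reduction suggested by the abstract, the paper gives the full argument.
    """
    pairs = [(s, a) for s in states for a in actions[s]]
    idx = {p: i for i, p in enumerate(pairs)}
    n = len(pairs)

    A_eq, b_eq = [], []
    for s in states:
        # Marginal constraint: the mass leaving s equals delta_goal(s).
        row = np.zeros(n)
        for a in actions[s]:
            row[idx[(s, a)]] = 1.0
        A_eq.append(row); b_eq.append(delta_goal[s])
        # Stationarity: the mass flowing into s also equals delta_goal(s).
        row = np.zeros(n)
        for (sp, ap) in pairs:
            row[idx[(sp, ap)]] = P[(sp, ap)].get(s, 0.0)
        A_eq.append(row); b_eq.append(delta_goal[s])

    res = linprog(c=np.zeros(n), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, None)] * n, method="highs")
    if not res.success:
        return None
    # Recover pi(a | s) = x(s, a) / sum_a x(s, a) where the denominator is positive.
    policy = {}
    for s in states:
        total = sum(res.x[idx[(s, a)]] for a in actions[s])
        if total > 1e-12:
            policy[s] = {a: res.x[idx[(s, a)]] / total for a in actions[s]}
    return policy

# Hypothetical two-state example: ask for steady-state distribution (0.75, 0.25).
states = ["s0", "s1"]
actions = {"s0": ["stay", "go"], "s1": ["back"]}
P = {("s0", "stay"): {"s0": 1.0},
     ("s0", "go"):   {"s1": 1.0},
     ("s1", "back"): {"s0": 1.0}}
print(steady_state_controllable(states, actions, P, {"s0": 0.75, "s1": 0.25}))

On this example the LP is feasible, and the recovered policy stays in s0 with probability 2/3 and moves to s1 with probability 1/3, which induces the requested steady-state distribution (0.75, 0.25).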

Keywords

Markov Chain, Decision Rule, Markov Decision Process, Discrete Time Markov Chain, Partially Observable Markov Decision Process



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • S. Akshay (1, 2)
  • Nathalie Bertrand (1)
  • Serge Haddad (3)
  • Loïc Hélouët (1)
  1. Inria Rennes, France
  2. IIT Bombay, India
  3. LSV, ENS Cachan & CNRS & INRIA, France
