A Non-stationary Infinite Partially-Observable Markov Decision Process

  • Sotirios P. Chatzis
  • Dimitrios Kosmopoulos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8681)


Partially Observable Markov Decision Processes (POMDPs) have been met with great success in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Recently, nonparametric Bayesian methods have been successfully applied to POMDPs to obviate the need for a priori knowledge of the size of the state space, allowing one to assume that the number of visited states may grow as the agent explores its environment. These approaches rely on the assumption that the agent’s environment remains stationary; however, in real-world scenarios the environment may change over time. In this work, we aim to address this inadequacy by introducing a dynamic nonparametric Bayesian POMDP model that allows both for automatic inference of the (distributional) representations of POMDP states and for capturing non-stationarity in the modeled environments. The formulation of our method is based on imposing a suitable dynamic hierarchical Dirichlet process (dHDP) prior over state transitions. We derive efficient algorithms for model inference and action planning and evaluate them on several benchmark tasks.
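
To make the role of the prior more concrete, the following is a minimal sketch, not the authors' implementation, of two ingredients that dynamic Dirichlet process priors of this kind typically build on: truncated stick-breaking weights over states, and a convex mixing step that lets the transition distribution drift between epochs. The function names, the truncation level, the concentration value, and the mixing weight `w` are all assumptions chosen for illustration. For simplicity, both weight vectors are assumed to index the same set of candidate states.

```python
import random


def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    alpha is the concentration parameter; truncation (>= 1) caps the
    number of states kept. Both values are illustrative assumptions.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a state
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last state absorbs the leftover mass
    return weights


def mix(previous, innovation, w):
    """Convex combination (1 - w) * previous + w * innovation.

    Hypothetical helper: w near 0 keeps the old distribution (a nearly
    stationary environment), w near 1 replaces it with the innovation.
    """
    return [(1.0 - w) * p + w * h for p, h in zip(previous, innovation)]


rng = random.Random(0)
g_prev = stick_breaking(alpha=2.0, truncation=10, rng=rng)  # weights at epoch t-1
h_new = stick_breaking(alpha=2.0, truncation=10, rng=rng)   # innovation at epoch t
g_next = mix(g_prev, h_new, w=0.3)                          # weights at epoch t
```

Because both inputs to `mix` sum to one, `g_next` is again a valid distribution over the same truncated set of states. In a full model the mixing weight would itself be given a prior and inferred from data rather than fixed by hand.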


Keywords: Importance Sampling; Markov Decision Process; Dirichlet Process; Hierarchical Dirichlet Process; Observable Markov Decision Process
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Sotirios P. Chatzis (1)
  • Dimitrios Kosmopoulos (2)
  1. Department of Electrical Eng., Computer Eng., and Informatics, Cyprus University of Technology, Cyprus
  2. Department of Informatics Engineering, TEI Crete, Greece
