Abstract
Partially Observable Markov Decision Processes (POMDPs) have met with great success in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Recently, nonparametric Bayesian methods have been successfully applied to POMDPs to obviate the need for a priori knowledge of the size of the state space, allowing the number of visited states to grow as the agent explores its environment. These approaches rely on the assumption that the agent's environment remains stationary; in real-world scenarios, however, the environment may change over time. In this work, we address this inadequacy by introducing a dynamic nonparametric Bayesian POMDP model that allows both for automatic inference of the (distributional) representations of POMDP states and for capturing non-stationarity in the modeled environments. Our method is formulated by imposing a suitable dynamic hierarchical Dirichlet process (dHDP) prior over state transitions. We derive efficient algorithms for model inference and action planning, and evaluate our approach on several benchmark tasks.
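To make the prior concrete, the following is a minimal schematic of the dHDP construction of Ren et al. (2008), on which our formulation builds. The notation here (global base measure H, concentration parameters γ and α, innovation weights w̃_t, and epochs t = 1, …, T) is illustrative; the exact construction and indexing used in our model may differ.

% A sketch of the dynamic HDP (Ren et al., 2008): each epoch t mixes
% the previous epoch's random measure with a fresh DP draw that shares
% the same global base measure G_0, so atoms (states) are shared over time.
\begin{align*}
  G_0 &\sim \mathrm{DP}(\gamma, H), \\
  H_t &\sim \mathrm{DP}(\alpha, G_0), \qquad t = 1, \dots, T, \\
  \tilde{w}_t &\sim \mathrm{Beta}(a, b), \\
  G_1 &= H_1, \qquad G_t = (1 - \tilde{w}_t)\, G_{t-1} + \tilde{w}_t\, H_t, \quad t \geq 2.
\end{align*}

Intuitively, innovation weights $\tilde{w}_t$ near zero recover a (nearly) stationary HDP-based POMDP, whereas weights near one permit abrupt changes in the transition dynamics while still sharing the underlying state representations across epochs.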
Copyright information
© 2014 IFIP International Federation for Information Processing
Cite this paper
Chatzis, S.P., Kosmopoulos, D. (2014). A Partially-Observable Markov Decision Process for Dealing with Dynamically Changing Environments. In: Iliadis, L., Maglogiannis, I., Papadopoulos, H. (eds) Artificial Intelligence Applications and Innovations. AIAI 2014. IFIP Advances in Information and Communication Technology, vol 436. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44654-6_11
DOI: https://doi.org/10.1007/978-3-662-44654-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44653-9
Online ISBN: 978-3-662-44654-6