Abstract
Partially Observable Markov Decision Processes (POMDPs) have met with great success in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Recently, nonparametric Bayesian methods have been successfully applied to POMDPs to obviate the need for a priori knowledge of the size of the state space, allowing the number of visited states to grow as the agent explores its environment. These approaches rely on the assumption that the agent's environment remains stationary; in real-world scenarios, however, the environment may change over time. In this work, we address this inadequacy by introducing a dynamic nonparametric Bayesian POMDP model that allows both for automatic inference of the (distributional) representations of POMDP states and for capturing non-stationarity in the modeled environments. Our method is formulated by imposing a suitable dynamic hierarchical Dirichlet process (dHDP) prior over state transitions. We derive efficient algorithms for model inference and action planning, and evaluate our approach on several benchmark tasks.
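To make the prior concrete, the following is a minimal schematic of the dHDP construction of Ren et al. (2008), on which our formulation builds. The notation here (global base measure H, concentration parameters γ and α, innovation weights w̃_t, and epochs t = 1, …, T) is illustrative; the exact construction and indexing used in our model may differ.

% A sketch of the dynamic HDP (Ren et al., 2008): each epoch t mixes
% the previous epoch's random measure with a fresh DP draw that shares
% the same global base measure G_0, so atoms (states) are shared over time.
\begin{align*}
  G_0 &\sim \mathrm{DP}(\gamma, H), \\
  H_t &\sim \mathrm{DP}(\alpha, G_0), \qquad t = 1, \dots, T, \\
  \tilde{w}_t &\sim \mathrm{Beta}(a, b), \\
  G_1 &= H_1, \qquad G_t = (1 - \tilde{w}_t)\, G_{t-1} + \tilde{w}_t\, H_t, \quad t \geq 2.
\end{align*}

Intuitively, innovation weights $\tilde{w}_t$ near zero recover a (nearly) stationary HDP-based POMDP, whereas weights near one permit abrupt changes in the transition dynamics while still sharing the underlying state representations across epochs.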
Copyright information
© 2014 IFIP International Federation for Information Processing
Cite this paper
Chatzis, S.P., Kosmopoulos, D. (2014). A Partially-Observable Markov Decision Process for Dealing with Dynamically Changing Environments. In: Iliadis, L., Maglogiannis, I., Papadopoulos, H. (eds) Artificial Intelligence Applications and Innovations. AIAI 2014. IFIP Advances in Information and Communication Technology, vol 436. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44654-6_11
DOI: https://doi.org/10.1007/978-3-662-44654-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44653-9
Online ISBN: 978-3-662-44654-6