A Non-stationary Infinite Partially-Observable Markov Decision Process

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2014 (ICANN 2014)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8681)

Abstract

Partially Observable Markov Decision Processes (POMDPs) have met with great success in planning domains where agents must balance actions that gather information against actions that yield reward. Recently, nonparametric Bayesian methods have been successfully applied to POMDPs to obviate the need for a priori knowledge of the size of the state space, allowing the number of visited states to grow as the agent explores its environment. These approaches rely on the assumption that the agent’s environment remains stationary; in real-world scenarios, however, the environment may change over time. In this work, we address this limitation by introducing a dynamic nonparametric Bayesian POMDP model that both allows for automatic inference of the (distributional) representations of POMDP states and captures non-stationarity in the modeled environments. Our method is formulated by imposing a suitable dynamic hierarchical Dirichlet process (dHDP) prior over state transitions. We derive efficient algorithms for model inference and action planning, and evaluate our approach on several benchmark tasks.
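The dHDP prior mentioned above is due to Ren, Carin, and Dunson (ICML 2008). As a rough sketch of the kind of construction this entails, with notation assumed here for illustration rather than taken from the paper: a global stick-breaking measure G_0 supplies a shared, potentially infinite pool of states, and each epoch t blends the previous epoch's transition measure with a fresh Dirichlet process (DP) innovation:

  G_0 = \sum_{k=1}^{\infty} \beta_k \, \delta_{\theta_k}, \qquad \beta \sim \mathrm{GEM}(\gamma)
  H_t \sim \mathrm{DP}(\alpha, G_0), \qquad \tilde{w}_t \sim \mathrm{Beta}(a_0, b_0)
  G_t = (1 - \tilde{w}_t) \, G_{t-1} + \tilde{w}_t \, H_t

Because every innovation H_t shares the atoms of G_0, states are reused across epochs, while the random mixing weights \tilde{w}_t let the transition dynamics drift over time; this is what permits capturing non-stationarity without fixing the number of states a priori.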

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Chatzis, S.P., Kosmopoulos, D. (2014). A Non-stationary Infinite Partially-Observable Markov Decision Process. In: Wermter, S., et al. Artificial Neural Networks and Machine Learning – ICANN 2014. ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_45

  • DOI: https://doi.org/10.1007/978-3-319-11179-7_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11178-0

  • Online ISBN: 978-3-319-11179-7

  • eBook Packages: Computer Science (R0)
