Abstract
Planning as inference has recently emerged as a versatile approach to decision-theoretic planning and reinforcement learning for single- and multi-agent systems in fully and partially observable domains with discrete and continuous variables. Since planning as inference essentially tackles a non-convex optimization problem when the states are partially observable, there is a need for techniques that can robustly escape local optima. We investigate the local optima of finite state controllers in single-agent partially observable Markov decision processes (POMDPs) that are optimized by expectation maximization (EM). We show that EM converges to controllers that are optimal with respect to a one-step lookahead. To escape local optima, we propose two algorithms: the first adds nodes to the controller to ensure optimality with respect to a multi-step lookahead, while the second splits nodes in a greedy fashion to improve reward likelihood. The approaches are demonstrated empirically on benchmark problems.
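The one-step lookahead condition mentioned in the abstract can be illustrated concretely: evaluate a stochastic finite state controller by solving the linear system for its value V(n, s), then compare the value at a node against the best deterministic one-step backup (a choice of action and observation-to-successor mapping). A minimal sketch follows; the two-state POMDP, controller parameters, and initial belief are hypothetical numbers chosen purely for illustration, not the paper's benchmarks or implementation.

```python
import itertools
import numpy as np

# Hypothetical two-state POMDP (illustrative numbers only).
gamma = 0.95
S = A = O = N = 2                              # states, actions, observations, controller nodes
T = np.array([[[0.9, 0.1], [0.2, 0.8]],        # T[a, s, s'] = P(s' | s, a)
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],        # Z[a, s', o] = P(o | s', a)
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 0.5]])                     # R[s, a]

# A stochastic finite state controller (also hypothetical).
pi = np.array([[0.9, 0.1],
               [0.2, 0.8]])                    # pi[n, a] = P(a | n)
eta = np.zeros((N, O, N))                      # eta[n, o, n'] = P(n' | n, o)
eta[:, 0, 0] = 1.0
eta[:, 1, 1] = 1.0

def evaluate(pi, eta):
    """Solve V(n,s) = sum_a pi(a|n) [R(s,a) + gamma * E[V(n',s')]] as a linear system."""
    M = np.zeros((N * S, N * S))
    b = np.zeros(N * S)
    for n in range(N):
        for s in range(S):
            i = n * S + s
            for a in range(A):
                b[i] += pi[n, a] * R[s, a]
                for s2 in range(S):
                    for o in range(O):
                        for n2 in range(N):
                            M[i, n2 * S + s2] += (pi[n, a] * T[a, s, s2]
                                                  * Z[a, s2, o] * eta[n, o, n2])
    return np.linalg.solve(np.eye(N * S) - gamma * M, b).reshape(N, S)

def backup_vector(a, succ, V):
    """Value vector over states for the deterministic choice: action a, o -> succ[o]."""
    v = np.zeros(S)
    for s in range(S):
        v[s] = R[s, a] + gamma * sum(T[a, s, s2] * Z[a, s2, o] * V[succ[o], s2]
                                     for s2 in range(S) for o in range(O))
    return v

V = evaluate(pi, eta)
b0 = np.array([0.5, 0.5])                      # initial belief (assumed)
current = b0 @ V[0]                            # value of starting the controller in node 0
# Best deterministic one-step backup over all (action, observation strategy) pairs.
best = max(b0 @ backup_vector(a, succ, V)
           for a in range(A)
           for succ in itertools.product(range(N), repeat=O))
print(f"current value {current:.4f}, one-step lookahead value {best:.4f}")
```

Because V is the fixed point of the stochastic controller's own backup, the best deterministic backup is always at least the current value; a strict gap at some node signals a one-step improvement of the kind the paper's node-adding algorithm exploits at convergence of EM.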
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Poupart, P., Lang, T., Toussaint, M. (2011). Analyzing and Escaping Local Optima in Planning as Inference for Partially Observable Domains. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science(), vol 6912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23783-6_39
Print ISBN: 978-3-642-23782-9
Online ISBN: 978-3-642-23783-6