Abstract
Recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods such as PBVI, Perseus, and HSVI, which quickly converge to an approximate solution for medium-sized problems. These algorithms improve the value function through backup operations, each performed on a single belief point. In the simpler domain of MDP solvers, prioritizing the order of equivalent backup operations on states is well known to speed up convergence.
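To make "a backup operation on a single belief point" concrete, the following is a minimal Python sketch of the standard point-based backup used by PBVI-style solvers. The model layout is an assumption chosen for illustration: `T[a][s][s2]` is P(s2 | s, a), `Z[a][s2][o]` is P(o | s2, a), `R[a][s]` is the immediate reward, and the value function `V` is a non-empty list of alpha vectors. The helper names `dot` and `backup` are hypothetical and are not taken from any of the cited implementations.

```python
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    # Inner product of a belief (or any state-indexed vector) with another vector
    return sum(x * y for x, y in zip(u, v))


def backup(b: Sequence[float], V: List[Vector], T, Z, R, gamma: float) -> Vector:
    """Point-based backup: return the alpha vector that is best at belief b.

    Assumed layout (hypothetical): T[a][s][s2] = P(s2 | s, a),
    Z[a][s2][o] = P(o | s2, a), R[a][s] = reward. V must be non-empty.
    """
    n_states, n_actions, n_obs = len(b), len(R), len(Z[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(n_actions):
        g_a = list(R[a])  # start from the immediate reward vector R(., a)
        for o in range(n_obs):
            # Project every alpha vector back through (a, o); keep the one best for b
            best_proj, best_proj_val = None, float("-inf")
            for alpha in V:
                proj = [
                    sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in range(n_states))
                    for s in range(n_states)
                ]
                val = dot(b, proj)
                if val > best_proj_val:
                    best_proj, best_proj_val = proj, val
            g_a = [g + gamma * p for g, p in zip(g_a, best_proj)]
        # Keep the action whose backed-up vector has the highest value at b
        val = dot(b, g_a)
        if val > best_val:
            best_vec, best_val = g_a, val
    return best_vec
```

Each call yields one new alpha vector whose value at b equals the one-step lookahead value of V at b. In this naive form a call costs O(|A| |O| |V| |S|^2), so the number of backups a solver performs, and the order in which it chooses belief points, largely determines its running time.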
We generalize the notion of prioritized backups to the POMDP framework and show that the ordering of backup operations on belief points is important. We also present a new algorithm, Prioritized Value Iteration (PVI), and show empirically that it outperforms current point-based algorithms. Finally, we propose a new empirical evaluation measure, based on the number of backups and the number of belief points, to provide more accurate benchmark comparisons.
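Ordering backups by priority can be sketched as follows. This is an illustrative assumption, not necessarily the exact PVI procedure: here the priority of a belief point b is its Bellman error e(b) = (HV)(b) - V(b), where (HV)(b) is the value at b of the freshly backed-up alpha vector, and the point with the largest error is backed up next. The model layout is assumed to be `T[a][s][s2]` = P(s2 | s, a), `Z[a][s2][o]` = P(o | s2, a), `R[a][s]` = reward, and the initial `V` is assumed to be a lower bound on the optimal value. For clarity the sketch recomputes every error on each iteration, which costs one full backup per belief point; a practical solver would need a cheaper way to maintain priorities. The names `prioritized_point_based_vi`, `epsilon`, and `max_backups` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def value(b: Sequence[float], V: List[Vector]) -> float:
    # V(b) = max over alpha vectors of b . alpha
    return max(dot(b, alpha) for alpha in V)


def backup(b: Sequence[float], V: List[Vector], T, Z, R, gamma: float) -> Vector:
    # Standard point-based backup at belief b (hypothetical model layout, see lead-in)
    n_states, n_obs = len(b), len(Z[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(len(R)):
        g_a = list(R[a])
        for o in range(n_obs):
            projs = [
                [sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in range(n_states))
                 for s in range(n_states)]
                for alpha in V
            ]
            best_proj = max(projs, key=lambda p: dot(b, p))
            g_a = [g + gamma * p for g, p in zip(g_a, best_proj)]
        if dot(b, g_a) > best_val:
            best_vec, best_val = g_a, dot(b, g_a)
    return best_vec


def prioritized_point_based_vi(
    B: List[Vector],
    V: List[Vector],
    T, Z, R,
    gamma: float,
    epsilon: float = 1e-6,
    max_backups: int = 1000,
) -> Tuple[List[Vector], int]:
    """Repeatedly back up the belief point with the largest Bellman error."""
    V = list(V)  # copy so the caller's initial lower bound is not mutated
    n_backups = 0
    while n_backups < max_backups:
        best_err, best_alpha = float("-inf"), None
        for b in B:
            alpha = backup(b, V, T, Z, R, gamma)
            # Bellman error: b . alpha equals (HV)(b), the one-step lookahead value
            err = dot(b, alpha) - value(b, V)
            if err > best_err:
                best_err, best_alpha = err, alpha
        if best_err < epsilon:
            break  # every belief point is within epsilon of its backed-up value
        V.append(best_alpha)
        n_backups += 1
    return V, n_backups
```

Returning the number of backups alongside `V` makes it possible to compare different orderings by the backups they perform rather than by wall-clock time, in the spirit of the evaluation measure proposed in the abstract.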
Partially supported by the Lynn and William Frankel Center for Computer Sciences, and by the Paul Ivanier Center for Robotics and Production Management at BGU. Ronen Brafman is partially supported by NSF grants SES-0527650 and IIS-0534662.
References
Brafman, R.I.: A heuristic variable grid solution method for POMDPs. In: AAAI 1997 (1997)
Cassandra, A.R., Littman, M.L., Zhang, N.L.: Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes. In: UAI 1997, pp. 54–61 (1997)
Izadi, M., Precup, D., Azar, D.: Belief selection in point-based planning algorithms for POMDPs. In: AI 2006 (2006)
Littman, M.L., Cassandra, A.R., Kaelbling, L.P.: Learning policies for partially observable environments: Scaling up. In: ICML 1995 (1995)
Lovejoy, W.S.: Computationally feasible bounds for partially observable Markov decision processes. Operations Research 39, 175–192 (1991)
Paquet, S., Tobin, L., Chaib-draa, B.: Real-time decision making for large POMDPs. In: AI 2005 (2005)
Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: An anytime algorithm for POMDPs. In: IJCAI 2003 (August 2003)
Poupart, P., Boutilier, C.: VDCBPI: an approximate scalable algorithm for large POMDPs. In: NIPS, vol. 17. MIT Press, Cambridge (2004)
Smallwood, R., Sondik, E.: The optimal control of partially observable processes over a finite horizon. Operations Research 21 (1973)
Smith, T., Simmons, R.: Heuristic search value iteration for POMDPs. In: UAI 2004 (2004)
Smith, T., Simmons, R.: Point-based POMDP algorithms: Improved analysis and implementation. In: UAI 2005 (2005)
Spaan, M.T.J., Vlassis, N.: Perseus: Randomized point-based value iteration for POMDPs. JAIR 24, 195–220 (2005)
Wingate, D., Seppi, K.D.: Prioritization methods for accelerating MDP solvers. JMLR 6, 851–881 (2005)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shani, G., Brafman, R.I., Shimony, S.E. (2006). Prioritizing Point-Based POMDP Solvers. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_38
DOI: https://doi.org/10.1007/11871842_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science (R0)