Journal of Systems Science and Complexity, Volume 31, Issue 6, pp 1423–1436

A Fast Approximation Method for Partially Observable Markov Decision Processes

  • Bingbing Liu
  • Yu Kang
  • Xiaofeng Jiang
  • Jiahu Qin

Abstract

This paper develops a new lower bound method for POMDPs that approximates the update of a belief by the update of its non-zero states. It uses the underlying MDP to explore the optimal reachable state space from the initial belief and to select actions during value iteration, which significantly accelerates convergence. An algorithm that collects and prunes belief points based on the upper and lower bounds is also presented, and experimental results show that it outperforms some state-of-the-art point-based algorithms.
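
The three ingredients named in the abstract (belief updates restricted to non-zero states, MDP-guided action selection, and pruning of belief points by their bound gap) can be illustrated concretely. Below is a minimal Python sketch, not the authors' implementation: the model arrays `T[a][s, s']` (transitions) and `Z[a][s', o]` (observations), the precomputed MDP Q-table `Q_mdp`, and the `upper`/`lower` bound callables are all assumed, illustrative names.

```python
import numpy as np

def sparse_belief_update(b, a, o, T, Z):
    """Update a belief by touching only its non-zero states.

    b is a dict {state: probability} holding only non-zero entries;
    T[a][s, s'] and Z[a][s', o] are assumed model arrays (hypothetical names).
    """
    b_next = {}
    for s, p in b.items():                      # only non-zero belief states
        for s2 in np.nonzero(T[a][s])[0]:       # only reachable successors
            w = p * T[a][s, s2] * Z[a][s2, o]
            if w > 0.0:
                b_next[s2] = b_next.get(s2, 0.0) + w
    norm = sum(b_next.values())                 # renormalize to a distribution
    return {s: w / norm for s, w in b_next.items()} if norm > 0 else b_next

def mdp_greedy_action(b, Q_mdp):
    """Choose the action favored by the underlying MDP (QMDP-style heuristic)."""
    n_actions = Q_mdp.shape[1]
    return max(range(n_actions),
               key=lambda a: sum(p * Q_mdp[s, a] for s, p in b.items()))

def prune_belief_points(points, upper, lower, eps):
    """Keep only belief points whose upper/lower bound gap still exceeds eps."""
    return [b for b in points if upper(b) - lower(b) > eps]
```

Keeping the update on the belief's support bounds its cost by the number of non-zero states rather than by |S|, which is consistent with the approximation the abstract describes for sparse problems.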

Keywords

Lower bound; point-based; POMDP

Copyright information

© Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Bingbing Liu (1)
  • Yu Kang (1)
  • Xiaofeng Jiang (1)
  • Jiahu Qin (1)

  1. Department of Automation, University of Science and Technology of China, Hefei, China
