Reinforcement Based U-Tree: A Novel Approach for Solving POMDP

Chapter in: Handbook on Decision Making

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 4)

Abstract

Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic, partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process over belief states. However, because the belief-state space is continuous, the resulting problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require complete prior knowledge of the environment. This article presents a memory-based reinforcement learning algorithm, Reinforcement Based U-Tree, which not only learns state transitions from experience but also builds its state model from raw sensor inputs. The article describes an enhancement of the original U-Tree's state generation process that makes the generated model more compact, and demonstrates its performance on a car-driving task with 31,224 world states. The article also presents a modification to the statistical test for reward estimation, which allows the algorithm to be benchmarked against model-based algorithms on a set of well-known POMDP problems.
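
To make the belief-state transformation mentioned in the abstract concrete, the following is a minimal Python sketch (not taken from the chapter) of the standard Bayesian belief update that converts a POMDP into a belief-state MDP. The function name belief_update and the dense-array representation of the transition model T and observation model Z are assumptions made purely for illustration.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayesian belief update for a discrete POMDP (illustrative sketch).

    b : current belief over states, shape (|S|,)
    a : action index
    o : observation index
    T : transition model, T[a, s, s'] = P(s' | s, a), shape (|A|, |S|, |S|)
    Z : observation model, Z[a, s', o] = P(o | s', a), shape (|A|, |S|, |O|)

    Returns the posterior belief b', where
        b'(s') is proportional to P(o | s', a) * sum_s P(s' | s, a) * b(s).
    """
    predicted = T[a].T @ b            # predictive distribution over next states
    unnormalized = Z[a, :, o] * predicted
    norm = unnormalized.sum()
    if norm == 0.0:                   # observation impossible under the model
        raise ValueError("Observation has zero probability under current belief.")
    return unnormalized / norm

# Tiny two-state, one-action, two-observation example (hypothetical numbers)
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
Z = np.array([[[0.7, 0.3],
               [0.1, 0.9]]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=1, T=T, Z=Z)   # approx. [0.29, 0.71]
```

Because this update must be carried out over a continuous simplex of beliefs, exact planning over belief states is intractable in general, which is the motivation the abstract gives for learning a compact state model directly from experience instead.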


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Zheng, L., Cho, SY., Quek, C. (2010). Reinforcement Based U-Tree: A Novel Approach for Solving POMDP. In: Jain, L.C., Lim, C.P. (eds) Handbook on Decision Making. Intelligent Systems Reference Library, vol 4. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13639-9_9

  • DOI: https://doi.org/10.1007/978-3-642-13639-9_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13638-2

  • Online ISBN: 978-3-642-13639-9

  • eBook Packages: Engineering (R0)
