Abstract
Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic, partially observable environments. The classical Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) over belief states. However, because the belief-state space is continuous, solving this transformed problem exactly is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require complete prior knowledge of the environment. This article presents a memory-based reinforcement learning algorithm, Reinforcement Based U-Tree, which not only learns state transitions from experience but also builds its own state model from raw sensory input. The article describes an enhancement of the original U-Tree's state-generation process that makes the generated model more compact, and demonstrates its performance on a car-driving task with 31,224 world states. It also presents a modification to the statistical test used for reward estimation, which allows the algorithm to be benchmarked against several model-based algorithms on a set of well-known POMDP problems.
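As background for the belief-state transformation mentioned in the abstract: given a transition model P(s' | s, a) and an observation model P(o | s', a), the belief is maintained by the standard Bayes filter b'(s') ∝ P(o | s', a) Σ_s P(s' | s, a) b(s), which is what turns a POMDP into an MDP over beliefs. The sketch below is a minimal illustration of this update only, not the U-Tree algorithm itself; all array names and the toy probabilities are hypothetical.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter update of a discrete POMDP belief state.

    b -- current belief over states, shape (S,)
    a -- action index
    o -- observation index
    T -- transition model, shape (A, S, S); T[a, s, s2] = P(s2 | s, a)
    O -- observation model, shape (A, S, Z); O[a, s2, o] = P(o | s2, a)
    """
    predicted = b @ T[a]                 # predictive distribution over next states
    posterior = predicted * O[a, :, o]   # weight by likelihood of the observation
    return posterior / posterior.sum()   # normalize (assumes P(o | b, a) > 0)

# Toy 2-state, 1-action, 2-observation model with made-up probabilities.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.7, 0.3],
               [0.4, 0.6]]])
b0 = np.array([0.5, 0.5])
print(belief_update(b0, a=0, o=1, T=T, O=O))   # -> [0.379..., 0.620...]
```

Because this belief lives in a continuous simplex, exact planning over it is intractable, which is the motivation for the memory-based approach the chapter develops.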
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Zheng, L., Cho, S.-Y., Quek, C. (2010). Reinforcement Based U-Tree: A Novel Approach for Solving POMDP. In: Jain, L.C., Lim, C.P. (eds.) Handbook on Decision Making. Intelligent Systems Reference Library, vol. 4. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13639-9_9
DOI: https://doi.org/10.1007/978-3-642-13639-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13638-2
Online ISBN: 978-3-642-13639-9