Could Active Perception Aid Navigation of Partially Observable Grid Worlds?
Because a robot's sensors are inevitably limited in some way, it may find itself unable to distinguish between differing states of the world: the world is, in effect, partially observable. If reinforcement learning is used to train the robot, this confounding of states can seriously impair its ability to learn optimal and stable policies. Good results have been achieved by augmenting reinforcement learning algorithms with memory or internal models. In our work we take a different approach and consider whether active perception could be used instead. We test this using omniscient oracles, which play the role of a robot's active perceptual system, in a simple grid-world navigation problem. Our results indicate that simple reinforcement learning algorithms can learn when to consult these oracles and, as a result, learn optimal policies.
Keywords: Physical Action · Active Perception · Reinforcement Learning Algorithm · Global Impairment · Total Step
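The idea described above can be illustrated with a minimal sketch: tabular Q-learning in a tiny corridor world where two cells produce identical observations (perceptual aliasing), plus an extra "consult the oracle" action that, at a small cost, reveals the agent's true cell. The environment, reward values, and state encoding below are illustrative assumptions, not the paper's actual experimental setup.

```python
import random

# Corridor of cells 0..4 with the goal at cell 2. Cells 1 and 3 are
# perceptually aliased: both yield the observation ("corridor",), yet
# the optimal action differs (RIGHT from 1, LEFT from 3). A third
# action, CONSULT, queries a hypothetical omniscient oracle that
# returns the true cell index at a small cost; the agent must learn
# when this is worth paying. (All rewards/costs here are assumptions.)

LEFT, RIGHT, CONSULT = 0, 1, 2
GOAL = 2

def observe(cell, oracle_reply):
    if oracle_reply is not None:
        return ("cell", oracle_reply)   # disambiguated by the oracle
    if cell in (1, 3):
        return ("corridor",)            # the aliased observation
    return ("distinct", cell)           # remaining cells look distinct

def step(cell, action):
    """Return (next_cell, reward, oracle_reply)."""
    if action == CONSULT:
        return cell, -0.05, cell        # small cost, reveals true cell
    cell = max(0, min(4, cell + (1 if action == RIGHT else -1)))
    return cell, (1.0 if cell == GOAL else -0.2), None

def train(episodes=5000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        cell = rng.choice([0, 1, 3, 4])
        obs = observe(cell, None)
        for _ in range(20):
            Q.setdefault(obs, [0.0, 0.0, 0.0])
            a = (rng.randrange(3) if rng.random() < eps
                 else max(range(3), key=lambda i: Q[obs][i]))
            cell, r, reply = step(cell, a)
            nxt = observe(cell, reply)
            Q.setdefault(nxt, [0.0, 0.0, 0.0])
            Q[obs][a] += alpha * (r + gamma * max(Q[nxt]) - Q[obs][a])
            obs = nxt
            if cell == GOAL:
                break
    return Q

Q = train()
best = max(range(3), key=lambda a: Q[("corridor",)][a])
print(best == CONSULT)  # the aliased observation triggers a consult
```

Acting blindly from the aliased observation succeeds only half the time, so the expected return of guessing falls below that of paying the oracle's cost and then moving straight to the goal; Q-learning over observations picks this up without any added memory.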