Generalising Experience in Reinforcement Learning: Performance in Partially Observable Processes
Reinforcement learning algorithms have been used as reasonably efficient model-free techniques for solving small, perfectly observable Markov decision processes. When perfect state determination is impossible, performance is expected to degrade as a result of incorrect updates carried out in the wrong regions of the state space. It is shown here that, in this case, a modified spreading version of Q-learning that takes into account its own uncertainty about the visited states is advantageous, provided the spreading mechanism fits a measure of similarity on the action-state space. In particular, an agent with an active perception capacity can justify this spreading mechanism by the expectation that similar past histories lead to similar results.
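The spreading idea described above can be sketched as a Q-learning update that is shared among states weighted by a similarity measure. The following is a minimal illustration, not the paper's exact algorithm: the function name, the tabular representation, and the user-supplied `similarity` matrix are all assumptions made for the sketch.

```python
import numpy as np

def spread_q_update(Q, s, a, r, s_next, similarity, alpha=0.1, gamma=0.9):
    """One Q-learning step whose update is spread over states similar to s.

    Q          : (n_states, n_actions) table of action values.
    similarity : assumed (n_states, n_states) matrix with entries in [0, 1]
                 measuring how alike two states are (1.0 on the diagonal).
    """
    # Standard one-step Q-learning target for the visited transition.
    td_target = r + gamma * np.max(Q[s_next])
    # Spread the temporal-difference update to every state z, scaled by
    # its similarity to the actually visited state s.
    for z in range(Q.shape[0]):
        w = similarity[z, s]
        if w > 0.0:
            Q[z, a] += alpha * w * (td_target - Q[z, a])
    return Q

# Toy usage: three states, two actions; state 1 is half-similar to state 0.
Q = np.zeros((3, 2))
similarity = np.array([[1.0, 0.5, 0.0],
                       [0.5, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
spread_q_update(Q, s=0, a=1, r=1.0, s_next=2, similarity=similarity)
```

With `w = 0` the loop reduces to the ordinary Q-learning rule on the visited state alone, so the similarity measure alone controls how far experience generalises.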
Keywords: Reinforcement Learning · Attentional Setting · Memory Window · Spreading Mechanism · Information Vector