Generalising Experience in Reinforcement Learning: Performance in Partially Observable Processes

  • C. H. C. Ribeiro
Conference paper

Abstract

Reinforcement learning algorithms have been used as reasonably efficient model-free techniques for solving small, perfectly observable Markov decision processes. When perfect state determination is impossible, performance is expected to degrade as a result of incorrect updates carried out in the wrong regions of the state space. It is shown here that in this case a modified, spreading version of Q-learning that takes into account its own uncertainty about the visited states is advantageous, provided the spreading mechanism fits a measure of similarity on the action-state space. In particular, an agent with an active perception capacity can justify this spreading mechanism by the expectation that similar past histories lead to similar results.
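As a concrete illustration, the sketch below shows what such a spreading update might look like in practice: after each transition, the temporal-difference correction is applied not only to the visited state but also, weighted by a similarity measure, to every state deemed similar to it. The grid-world environment, the Gaussian similarity function `sigma`, and all parameter values are assumptions made for this sketch and are not taken from the paper itself.

```python
import numpy as np

# Minimal sketch of Q-learning with a spreading update on a toy 5x5 grid world.
# All quantities below (environment, similarity function, parameters) are
# illustrative assumptions, not the paper's exact formulation.

N_STATES, N_ACTIONS = 25, 4        # 5x5 grid; actions: up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def sigma(x, visited):
    """Assumed similarity between state x and the visited state:
    a Gaussian in grid distance, weighting how strongly the update spreads."""
    r1, c1 = divmod(x, 5)
    r2, c2 = divmod(visited, 5)
    d2 = (r1 - r2) ** 2 + (c1 - c2) ** 2
    return np.exp(-d2 / 2.0)

def step(s, a):
    """Toy deterministic transitions; reward 1 on reaching the far corner."""
    r, c = divmod(s, 5)
    if a == 0:
        r = max(r - 1, 0)
    elif a == 1:
        r = min(r + 1, 4)
    elif a == 2:
        c = max(c - 1, 0)
    else:
        c = min(c + 1, 4)
    s_next = r * 5 + c
    return s_next, float(s_next == N_STATES - 1)

Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

for episode in range(200):
    s = 0
    for t in range(50):
        # Epsilon-greedy action selection.
        a = rng.integers(N_ACTIONS) if rng.random() < EPSILON else int(Q[s].argmax())
        s_next, reward = step(s, a)
        target = reward + GAMMA * Q[s_next].max()
        # Spreading update: every state receives a correction toward the same
        # target, scaled by its similarity to the actually visited state.
        for x in range(N_STATES):
            Q[x, a] += ALPHA * sigma(x, s) * (target - Q[x, a])
        s = s_next
        if reward > 0:
            break
```

Under a similarity measure that is well matched to the task, states that are indistinguishable to the agent share their experience, which is the intuition behind applying the mechanism when perfect state determination is impossible.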

Keywords

Reinforcement Learning · Attentional Setting · Memory Window · Spreading Mechanism · Information Vector

Copyright information

© Springer-Verlag Wien 1998

Authors and Affiliations

  • C. H. C. Ribeiro
  1. Neural Systems Engineering Group, Imperial College, London, England
