Composing functions to speed up reinforcement learning in a changing world

  • Chris Drummond
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1398)


This paper presents a system that transfers the results of prior learning to speed up reinforcement learning in a changing world. Often, even when the change to the world is relatively small, an extensive relearning effort is required. The new system exploits strong features in the multi-dimensional function produced by reinforcement learning. These features induce a partitioning of the state space, and the partition is represented as a graph. The graph is used to index and compose functions stored in a case base, forming a close approximation to the solution of the new task. The experimental results investigate one important example of a changing world: a new goal position. In this situation, there is close to a two orders of magnitude increase in learning rate over a basic reinforcement learning algorithm.
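For orientation, the following is a minimal sketch of the kind of basic reinforcement learning baseline the abstract refers to: tabular Q-learning (Watkins and Dayan) on a small grid world with a single goal cell. The grid size, reward scheme, and hyperparameters are illustrative assumptions, not the paper's experimental setup.

    import random

    # A minimal sketch of tabular Q-learning on a small grid world with one
    # goal cell. Grid, rewards, and hyperparameters are illustrative
    # assumptions, not taken from the paper.

    SIZE = 10                                     # grid is SIZE x SIZE
    GOAL = (9, 9)                                 # moving this is the "changed world"
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1        # step size, discount, exploration

    # Q-values for every (state, action) pair, initialised to zero.
    Q = {((x, y), a): 0.0
         for x in range(SIZE) for y in range(SIZE)
         for a in range(len(ACTIONS))}

    def step(state, a):
        """Apply action a, staying inside the grid; reward 1 only at the goal."""
        dx, dy = ACTIONS[a]
        nxt = (min(max(state[0] + dx, 0), SIZE - 1),
               min(max(state[1] + dy, 0), SIZE - 1))
        return nxt, (1.0 if nxt == GOAL else 0.0)

    def run_episode(start=(0, 0)):
        state = start
        while state != GOAL:
            # Epsilon-greedy action selection.
            if random.random() < EPSILON:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[(state, i)])
            nxt, reward = step(state, a)
            best_next = max(Q[(nxt, i)] for i in range(len(ACTIONS)))
            # One-step Q-learning update:
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[(state, a)] += ALPHA * (reward + GAMMA * best_next - Q[(state, a)])
            state = nxt

    for _ in range(500):
        run_episode()

Under such a baseline, relocating GOAL largely invalidates the learned Q-function and forces extensive relearning; the system described above instead composes functions stored in a case base to approximate the solution for the new goal position.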


Keywords: State Space, Case Base, Plane Graph, Reinforcement Learning, Goal Position



Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Chris Drummond (1)
  1. Department of Computer Science, University of Ottawa, Ottawa, Canada
