
Hierarchical Traces for Reduced NSM Memory Requirements

  • Torbjørn S. Dahl
Conference paper

Abstract

This paper presents work on using a hierarchical long-term memory to reduce the memory requirements of nearest sequence memory (NSM) learning, a previously published, instance-based reinforcement learning algorithm. A hierarchical memory representation reduces the memory requirements by allowing traces to share common sub-sequences. We present modified mechanisms for estimating discounted future rewards and for dealing with hidden state using hierarchical memory. We also present an experimental analysis of how the sub-sequence length affects the memory compression achieved and show that the reduced memory requirements do not affect the speed of learning. Finally, we analyse and discuss the persistence of the sub-sequences independently of specific trace instances.
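To make the idea in the abstract concrete, the sketch below (plain Python) illustrates one way traces could share interned sub-sequences so that common history fragments are stored only once, together with a simple discounted future-reward estimate read from a trace. This is a minimal illustration under assumed representations, not the paper's actual NSM mechanism; all names (HierarchicalTraceMemory, SubSequence, discounted_return) and the fixed chunk length are hypothetical.

from dataclasses import dataclass
from typing import Dict, List, Tuple

# An experience step: (observation, action, reward). Assumed representation for
# illustration only; the paper's trace format may differ.
Step = Tuple[str, str, float]


@dataclass(frozen=True)
class SubSequence:
    """A fixed-length run of steps, shared between traces."""
    steps: Tuple[Step, ...]


class HierarchicalTraceMemory:
    """Stores traces as lists of references to interned sub-sequences.

    Identical sub-sequences are kept in the pool once and shared by all traces
    that contain them, which is the memory-compression idea the abstract describes.
    """

    def __init__(self, chunk_length: int = 4):
        self.chunk_length = chunk_length
        self._pool: Dict[Tuple[Step, ...], SubSequence] = {}  # interned chunks
        self.traces: List[List[SubSequence]] = []

    def add_trace(self, steps: List[Step]) -> None:
        trace: List[SubSequence] = []
        for i in range(0, len(steps), self.chunk_length):
            chunk = tuple(steps[i:i + self.chunk_length])
            # Intern the chunk: reuse the stored sub-sequence if it already exists.
            node = self._pool.setdefault(chunk, SubSequence(chunk))
            trace.append(node)
        self.traces.append(trace)

    def stored_chunks(self) -> int:
        return len(self._pool)


def discounted_return(steps: List[Step], start: int, gamma: float = 0.9) -> float:
    """Estimate of discounted future reward from position `start` in a trace."""
    return sum(gamma ** k * r for k, (_, _, r) in enumerate(steps[start:]))


if __name__ == "__main__":
    memory = HierarchicalTraceMemory(chunk_length=2)
    trace_a = [("o1", "a1", 0.0), ("o2", "a2", 0.0), ("o3", "a3", 1.0), ("o4", "a4", 0.0)]
    trace_b = [("o1", "a1", 0.0), ("o2", "a2", 0.0), ("o5", "a1", 0.5), ("o6", "a2", 0.0)]
    memory.add_trace(trace_a)
    memory.add_trace(trace_b)
    # The first chunk of both traces is identical and is stored only once.
    print(memory.stored_chunks())          # 3 rather than 4
    print(discounted_return(trace_a, 0))   # 0 + 0.9*0 + 0.81*1 + 0.729*0 = 0.81

In this toy example the shared opening sub-sequence is stored once and referenced by both traces; the compression achieved in practice depends on the chunk (sub-sequence) length, which is what the paper's experimental analysis examines.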

Keywords

Long-Term Memory, Reinforcement Learning, Memory Requirements, Memory Usage, Hidden State



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. Cognitive Robotics Research Centre, University of Wales, Newport, Newport, UK
