On Reinforcement Memory for Non-Markovian Control

  • Hassab Elgawi Osman
Conference paper


This paper contributes to the design of a robotic memory controller for solving non-Markovian reinforcement tasks, which correspond to a great many real-life stochastic prediction and control problems. Instead of holistically searching the entire memory contents, the controller adopts associated feature analysis to produce the most likely relevant action from previous experiences. Actor-Critic (AC) learning is used to adaptively tune the control parameters, while an on-line variant of a decision-tree ensemble learner serves as a memory-capable approximator for the policy of the Actor and the value function of the Critic. Learning capability is experimentally examined through a non-Markovian cart-pole balancing task. The results show that the proposed controller acquires complex behaviors, such as balancing two poles simultaneously.
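The Actor-Critic scheme described above can be sketched as follows. This is a minimal illustrative sketch only: the paper uses an on-line decision-tree ensemble as the function approximator, whereas a linear approximator is substituted here for brevity, and all function names, features, and hyperparameters are assumptions rather than the paper's implementation.

```python
import numpy as np

def critic_update(w, phi_s, phi_next, reward, gamma=0.9, beta=0.1):
    """TD(0) update of the Critic's value-function weights.

    Returns the updated weights and the TD error, which is passed
    to the Actor as the reinforcement signal.
    """
    td_error = reward + gamma * (phi_next @ w) - (phi_s @ w)
    return w + beta * td_error * phi_s, td_error

def actor_update(theta, phi_s, action, td_error, alpha=0.05):
    """Policy-gradient update of the Actor's softmax action preferences."""
    prefs = theta @ phi_s                 # one preference per action
    pi = np.exp(prefs - prefs.max())
    pi /= pi.sum()                        # softmax policy pi(a|s)
    grad = -np.outer(pi, phi_s)          # d log pi(action|s) / d theta
    grad[action] += phi_s
    return theta + alpha * td_error * grad
```

In this arrangement the Critic's TD error serves as the adaptive tuning signal for both components: it corrects the Critic's value estimate and simultaneously reinforces or suppresses the Actor's preference for the action just taken.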


Keywords: Reinforcement Learning · Memory Controller · Memory Content · Adaptive Critic · Reinforcement Learning Agent




Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Hassab Elgawi Osman
  1. The University of Tokyo, Tokyo, Japan
