Multiscale Anticipatory Behavior by Hierarchical Reinforcement Learning

  • Matthias Rungger
  • Hao Ding
  • Olaf Stursberg
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5499)


To establish autonomous behavior in technical systems, the well-known trade-off between reactive control and deliberative planning has to be considered. In this paper, we combine both principles by proposing a two-level hierarchical reinforcement learning scheme that enables the system to autonomously determine suitable solutions to new tasks. The approach is based on a behavior representation specified by hybrid automata, which combine continuous and discrete dynamics, to predict (anticipate) the outcome of a sequence of actions. On the higher layer of the hierarchical scheme, the behavior is abstracted in the form of finite state automata, on which value function iteration is performed to obtain a goal-leading sequence of subtasks. This sequence is realized on the lower layer by applying policy-gradient-based reinforcement learning to the hybrid automaton model. The iteration between both layers leads to consistent, goal-attaining behavior, as shown for a simple robot grasping task.
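The higher layer of the scheme described above can be illustrated with a minimal sketch: value iteration over a small finite-state abstraction of a grasping task, yielding a goal-leading sequence of subtasks that the lower layer would then realize. The state names, transitions, and rewards here are purely illustrative assumptions, not taken from the paper.

```python
GAMMA = 0.9  # discount factor (assumed for illustration)

# Hypothetical finite-state abstraction: each abstract state maps the
# subtasks (actions) available in it to the resulting abstract state.
transitions = {
    "start":       {"approach": "near_object"},
    "near_object": {"grasp": "holding", "retreat": "start"},
    "holding":     {"lift": "goal"},
    "goal":        {},
}

def value_iteration(transitions, goal, gamma=GAMMA, tol=1e-6):
    """Compute state values; reward 1 is given on entering the goal state."""
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, acts in transitions.items():
            if s == goal or not acts:
                continue
            best = max(
                (1.0 if s_next == goal else 0.0) + gamma * V[s_next]
                for s_next in acts.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def greedy_subtask_sequence(transitions, V, start, goal, max_steps=10):
    """Read off the goal-leading subtask sequence greedily from V."""
    seq, s = [], start
    for _ in range(max_steps):
        if s == goal:
            return seq
        action, s = max(
            transitions[s].items(),
            key=lambda kv: (1.0 if kv[1] == goal else 0.0) + V[kv[1]],
        )
        seq.append(action)
    raise RuntimeError("goal not reached within max_steps")

V = value_iteration(transitions, goal="goal")
print(greedy_subtask_sequence(transitions, V, "start", "goal"))
# -> ['approach', 'grasp', 'lift']
```

In the full approach, each subtask in this sequence is handed to the lower layer, which learns continuous control inputs for the hybrid automaton model via policy-gradient reinforcement learning; abstraction and refinement iterate until both layers agree.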


Keywords: reinforcement learning, hierarchical model, hybrid automaton, behavioral programming, artificial intelligence planning





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Matthias Rungger
  • Hao Ding
  • Olaf Stursberg

Institute of Automatic Control Engineering, Technische Universität München, Munich, Germany
