Homeokinetic Reinforcement Learning

  • Simón C. Smith
  • J. Michael Herrmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7081)


To find a control policy for an autonomous robot by reinforcement learning, the utility of a behaviour can be revealed locally by modulating the motor command with probing actions. For robots with many degrees of freedom this type of exploration becomes inefficient, so it is attractive to use an auxiliary controller to select promising probing actions. Here we propose optimising the exploratory modulation with a self-organising controller. The approach is illustrated on two control tasks: swing-up of a pendulum and walking in a simulated hexapod. The results suggest that the homeokinetic approach is beneficial for problems of high complexity.
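
The baseline exploration scheme described above, where probing actions modulate the motor command and the resulting reward reveals the local utility, can be illustrated with a minimal sketch. It assumes Gaussian probing noise, a made-up quadratic reward and a REINFORCE-style update with a running reward baseline (a standard perturbation-based estimator, close in spirit to Gullapalli's stochastic real-valued units). It is not the authors' homeokinetic method, which would replace the random probe with one chosen by a self-organising controller. All names (`reward`, `learn_by_probing`, `sigma`, `alpha`, `beta`) are hypothetical.

```python
import random


def reward(command, target):
    """Hypothetical utility: highest (zero) when the command vector equals `target`."""
    return -sum((c - t) ** 2 for c, t in zip(command, target))


def learn_by_probing(target, sigma=0.1, alpha=0.01, beta=0.1,
                     steps=20000, seed=0):
    """Tune a nominal motor command from the rewards of randomly probed commands.

    sigma: std. dev. of the Gaussian probing action (an assumption; the paper
           proposes a homeokinetic controller for this role instead)
    alpha: step size for the nominal command
    beta:  update rate of the running reward baseline
    """
    rng = random.Random(seed)
    n = len(target)
    mean = [0.0] * n                      # nominal (unmodulated) motor command
    baseline = reward(mean, target)       # running estimate of the expected reward
    for _ in range(steps):
        probe = [rng.gauss(0.0, sigma) for _ in range(n)]   # probing action
        command = [m + p for m, p in zip(mean, probe)]      # modulated command
        r = reward(command, target)
        advantage = r - baseline
        # Shift the nominal command toward probes that beat the baseline;
        # in expectation this step follows the local reward gradient.
        mean = [m + alpha * advantage * p / sigma ** 2
                for m, p in zip(mean, probe)]
        baseline += beta * (r - baseline)
    return mean


if __name__ == "__main__":
    print(learn_by_probing([0.7, -0.3, 0.2]))
```

In this sketch each probe perturbs every component at once, so the reward difference mixes contributions from all degrees of freedom and the variance of each update grows with the number of components. This is one way to see why undirected probing becomes inefficient for robots with many degrees of freedom.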


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Simón C. Smith (1)
  • J. Michael Herrmann (1)

  1. Institute of Perception, Action and Behaviour, School of Informatics, The University of Edinburgh, Edinburgh, U.K.