A Model of Neuronal Specialization Using Hebbian Policy-Gradient with “Slow” Noise

  • Emmanuel Daucé
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5768)


We study a model of neuronal specialization using a policy-gradient reinforcement approach. (1) The neurons fire stochastically according to their synaptic input plus a noise term; (2) the environment is a closed-loop system composed of a rotating eye and a point-like visual target; (3) the network is composed of a foveated retina, a primary layer and a motoneuron layer; (4) the reward depends on the distance between the subjective target position and the fovea; and (5) the weight update depends on a Hebbian trace defined according to a policy-gradient principle. To take into account the mismatch between neuronal and environmental integration times, we distort the firing probability with a "pink noise" term whose autocorrelation is on the order of 100 ms, so that the firing probability is overestimated (or underestimated) over periods of about 100 ms. The rewards occurring in the meantime assess the "value" of those elementary shifts, and the firing probability is modified accordingly. Since each motoneuron is associated with a particular angular direction, we test the preferred output of the visual cells at the end of the learning process. Consistently with the observed final behavior, we find that the visual cells preferentially excite the motoneurons heading in the opposite angular direction.
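The mechanism described above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the network sizes, time constants, and learning rate are assumptions, the "pink noise" with ~100 ms autocorrelation is approximated by an Ornstein-Uhlenbeck process, and the policy-gradient Hebbian trace follows the standard REINFORCE-style form (spike minus firing probability, times presynaptic input).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: n_in retinal inputs, n_out motoneurons (assumed values).
n_in, n_out = 8, 4
dt = 0.001        # 1 ms neuronal time step
tau_noise = 0.1   # ~100 ms autocorrelation of the "slow" noise
sigma = 0.5       # noise amplitude (assumed)
lr = 0.01         # learning rate (assumed)
tau_trace = 0.1   # Hebbian trace decay, matched to the noise timescale

W = rng.normal(0.0, 0.1, (n_out, n_in))  # synaptic weights
noise = np.zeros(n_out)                  # slow (Ornstein-Uhlenbeck) noise state
trace = np.zeros((n_out, n_in))          # Hebbian eligibility trace

def step(x, reward):
    """One time step: stochastic firing, trace update, reward-modulated learning."""
    global noise, trace, W
    # Ornstein-Uhlenbeck update: the noise decorrelates over ~tau_noise seconds,
    # so the firing probability is over- or under-estimated for ~100 ms stretches.
    noise += (-noise / tau_noise) * dt \
             + sigma * np.sqrt(2.0 * dt / tau_noise) * rng.normal(size=n_out)
    p = 1.0 / (1.0 + np.exp(-(W @ x + noise)))       # distorted firing probability
    spikes = (rng.random(n_out) < p).astype(float)   # stochastic firing
    # Policy-gradient eligibility: (spike - p) * presynaptic input, with slow decay.
    trace += (-trace / tau_trace) * dt + np.outer(spikes - p, x)
    W += lr * reward * trace                         # reward-modulated Hebbian update
    return spikes
```

In this sketch the reward would be supplied by the closed loop (a decreasing function of the target-to-fovea distance); because the trace and the noise share the ~100 ms timescale, rewards arriving during a noise excursion credit the firing-probability shift that produced them.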


Keywords: Neuronal Specialization · Reinforcement Learning · Synaptic Input · Motor Command · Firing Probability





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Emmanuel Daucé 1,2
  1. INRIA Lille Nord-Europe, Villeneuve d'Ascq, France
  2. Institute of Movement Sciences, University of the Mediterranean, Marseille, France
