Reward responses of dopamine neurons: A biological reinforcement signal

  • Wolfram Schultz
Part I: Coding and Learning in Biology
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1327)

Abstract

A class of reinforcement models termed Temporal Difference (TD) models has been developed on theoretical grounds as effective algorithms for various learning situations. Based on the observation that learning depends on the unpredictability of primary motivating events, these models use errors in the prediction of reinforcing events as teaching signals. Independently of the theoretical work, neurophysiological experiments have revealed that neurons in the mammalian midbrain that use the neurotransmitter dopamine process information about rewards and reward-predicting stimuli in a manner very similar to the teaching signal of TD models.
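To make the idea of a prediction error used as a teaching signal concrete, here is a minimal sketch of a tabular TD(0) learner on a single stimulus-reward trial. It is an illustration only and not the model described in the chapter: the trial layout (10 time steps, a stimulus at step 2, a reward at step 6), the learning rate, the discount factor, and the function name `run_trial` are all assumptions chosen for readability. The error at step t is computed as r(t) + gamma * V(t) - V(t-1), where V is the learned prediction of upcoming reward.

```python
def run_trial(values, cs_step, reward_step, reward=1.0,
              alpha=0.1, gamma=1.0, learn=True):
    """Run one trial and return the TD prediction error at each time step.

    values[t] is the predicted future reward at step t (hypothetical layout).
    """
    errors = [0.0] * len(values)
    for t in range(1, len(values)):
        r = reward if t == reward_step else 0.0
        # Assumption: steps before the stimulus carry no prediction,
        # because stimulus onset itself is unpredictable.
        v_prev = values[t - 1] if t - 1 >= cs_step else 0.0
        # TD error: reward received plus new prediction minus old prediction.
        delta = r + gamma * values[t] - v_prev
        errors[t] = delta
        if learn and t - 1 >= cs_step:
            # The error acts as the teaching signal for the earlier prediction.
            values[t - 1] += alpha * delta
    return errors


N_STEPS, CS_STEP, REWARD_STEP = 10, 2, 6  # illustrative trial layout
values = [0.0] * N_STEPS

first = run_trial(values, CS_STEP, REWARD_STEP)           # naive learner
for _ in range(500):                                      # repeated pairing
    run_trial(values, CS_STEP, REWARD_STEP)
trained = run_trial(values, CS_STEP, REWARD_STEP, learn=False)
omitted = run_trial(values, CS_STEP, REWARD_STEP, reward=0.0, learn=False)

print("first trial, error at reward:   ", round(first[REWARD_STEP], 3))
print("trained, error at stimulus:     ", round(trained[CS_STEP], 3))
print("trained, error at reward:       ", round(trained[REWARD_STEP], 3))
print("reward omitted, error at reward:", round(omitted[REWARD_STEP], 3))
```

In this toy setting the error is large at the time of an unpredicted reward, moves to the onset of the predictive stimulus once the reward is fully predicted, and becomes negative when a predicted reward is withheld. The code shows only how a TD teaching signal behaves under these assumptions; it makes no claim about how closely any recorded neuron matches it.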

Keywords

Conditioned Stimulus, Dendritic Spine, Dopamine Neuron, Synaptic Weight, Striatal Neuron
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Wolfram Schultz (1)
  1. Institute of Physiology, University of Fribourg, Fribourg, Switzerland
