Closing the Sensory-Motor Loop on Dopamine Signalled Reinforcement Learning

  • Conference paper
From Animals to Animats 10 (SAB 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5040)

Abstract

It has recently been shown that dopamine-signalled modulation of spike timing-dependent synaptic plasticity (DA-STDP) can enable reinforcement learning of delayed stimulus-reward associations when both stimulus and reward are delivered at precisely timed intervals. Here, we test whether a similar model can support learning in an embodied context, in which the timing of both sensory input and reward delivery depends on the agent's behaviour. We show that effective reinforcement learning is indeed possible, but only when stimuli are gated so as to occur as near-synchronous patterns of neural activity and when neuroanatomical constraints are imposed that predispose agents to exploratory behaviours. Extinction of learned responses in this model is subsequently shown to result from agent-environment interactions and not directly from any specific neural mechanism.
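As a rough illustration of the DA-STDP idea mentioned in the abstract, the sketch below follows the general scheme of dopamine-gated STDP: a pre/post spike pairing writes to a slowly decaying eligibility trace, and the synaptic weight changes only while dopamine is present. It is a minimal sketch and not the model used in this paper. The function names (`stdp`, `weight_change`), the single spike pair, and all parameter values (time constants, amplitudes, learning rate) are assumptions chosen for illustration.

```python
import math


def stdp(dt_ms, a_plus=1.0, a_minus=1.5, tau_ms=20.0):
    """Illustrative STDP window; dt_ms = t_post - t_pre.

    Pre-before-post (dt_ms > 0) potentiates, otherwise depresses.
    Amplitudes and time constant are assumed values.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_ms)
    return -a_minus * math.exp(dt_ms / tau_ms)


def weight_change(pre_ms, post_ms, reward_ms=None, t_end_ms=3000,
                  tau_c_ms=1000.0, tau_d_ms=200.0,
                  da_pulse=0.5, rate=0.001):
    """Integrate one synapse with 1 ms Euler steps (hypothetical setup).

    c: eligibility trace, set by the spike pairing, decays slowly.
    d: dopamine level, raised by a reward pulse, decays faster.
    The weight moves in proportion to c * d, so a pairing only
    changes the synapse if dopamine arrives before c has decayed.
    """
    c = 0.0
    d = 0.0
    w = 0.0
    pairing_time = max(pre_ms, post_ms)  # tag when the pair completes
    for t in range(t_end_ms):
        c -= c / tau_c_ms
        d -= d / tau_d_ms
        if t == pairing_time:
            c += stdp(post_ms - pre_ms)
        if reward_ms is not None and t == reward_ms:
            d += da_pulse
        w += rate * c * d
    return w


if __name__ == "__main__":
    print("causal pair, reward at 500 ms: ", weight_change(10, 20, 500))
    print("causal pair, reward at 2000 ms:", weight_change(10, 20, 2000))
    print("causal pair, no reward:        ", weight_change(10, 20, None))
    print("anti-causal pair, reward:      ", weight_change(20, 10, 500))
```

In this toy setting, a reward that arrives sooner after the pairing produces a larger weight change, because the eligibility trace has decayed less. That dependence on timing is why behaviour-dependent stimulus and reward timing, as in the embodied setting the abstract describes, matters for learning.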

Editor information

Minoru Asada, John C. T. Hallam, Jean-Arcady Meyer, Jun Tani

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chorley, P., Seth, A.K. (2008). Closing the Sensory-Motor Loop on Dopamine Signalled Reinforcement Learning. In: Asada, M., Hallam, J.C.T., Meyer, J.-A., Tani, J. (eds) From Animals to Animats 10. SAB 2008. Lecture Notes in Computer Science, vol 5040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69134-1_28

  • DOI: https://doi.org/10.1007/978-3-540-69134-1_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69133-4

  • Online ISBN: 978-3-540-69134-1

  • eBook Packages: Computer Science (R0)
