
Reinforcement Learning in Cortical Networks

Living reference work entry in: Encyclopedia of Computational Neuroscience

Synonyms

Policy gradient methods; Reward-based learning; Temporal difference (TD) learning; Trial-and-error learning

Definition

Reinforcement learning is a basic paradigm of learning in artificial intelligence and biology. The paradigm considers an agent (robot, human, animal) that acts in a typically stochastic environment and receives rewards upon reaching certain states. The agent’s goal is to maximize the expected reward by choosing the optimal action in any given state. In a cortical implementation, the states are defined by sensory stimuli feeding into a neuronal network, and once the network activity has settled, an action is read out. Learning consists of adapting the synaptic connection strengths into and within the neuronal network based on (typically binary) feedback about the appropriateness of the chosen action. Policy gradient and temporal difference learning are two methods for deriving synaptic plasticity rules that maximize the expected reward in response...
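The policy gradient idea described above can be sketched in a few lines: a stochastic (softmax) policy selects actions, and after each trial every weight is nudged along reward times the score function d log π / dw, in the spirit of Williams' REINFORCE rule. The two-state environment, the hidden rewarded-action mapping, and the learning rate below are illustrative assumptions, not taken from the entry.

```python
import math
import random

# Minimal REINFORCE-style sketch: a stochastic softmax policy over two
# actions per state, trained from binary reward feedback alone.
# Environment and hyperparameters are illustrative assumptions.

N_STATES, N_ACTIONS = 2, 2
REWARDED = {0: 1, 1: 0}  # hidden "correct" action per state (assumed)
w = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # "synaptic" weights

def policy(state):
    """Softmax readout: action probabilities for one state."""
    z = [math.exp(v) for v in w[state]]
    s = sum(z)
    return [v / s for v in z]

def step(state, lr=0.5):
    """One trial: act, receive binary reward, apply the REINFORCE update."""
    p = policy(state)
    action = 0 if random.random() < p[0] else 1
    reward = 1.0 if action == REWARDED[state] else 0.0  # binary feedback
    # Weight change = lr * reward * d log pi(action|state) / d w[state][a]
    for a in range(N_ACTIONS):
        grad = (1.0 if a == action else 0.0) - p[a]
        w[state][a] += lr * reward * grad
    return reward

random.seed(0)
for _ in range(2000):
    step(random.randrange(N_STATES))

print([policy(s) for s in range(N_STATES)])
```

Because the feedback is binary, only rewarded trials move the weights, and over many trials the action probabilities concentrate on the rewarded action in each state.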



Acknowledgments

This work was supported by the Swiss National Science Foundation with grants 31003A_133094 and CRSII2_147636 to W.S and Grants PZ00P3_137200 and PP00P3_150637 to J.-P.P. We thank Robert Urbanczik and Johannes Friedrich for valuable comments on the manuscript.

Author information


Corresponding author

Correspondence to Walter Senn.


Copyright information

© 2014 Springer Science+Business Media New York

About this entry

Cite this entry

Senn, W., Pfister, JP. (2014). Reinforcement Learning in Cortical Networks. In: Jaeger, D., Jung, R. (eds) Encyclopedia of Computational Neuroscience. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7320-6_580-2


  • Publisher Name: Springer, New York, NY

  • Online ISBN: 978-1-4614-7320-6


Chapter history

  1. Latest

    Reinforcement Learning in Cortical Networks
    Published:
    17 September 2014

    DOI: https://doi.org/10.1007/978-1-4614-7320-6_580-2

  2. Original

    Reinforcement Learning in Cortical Networks
    Published:
    25 March 2014

    DOI: https://doi.org/10.1007/978-1-4614-7320-6_580-1