
Reinforcement Learning in Cortical Networks

Living reference work entry in: Encyclopedia of Computational Neuroscience

Synonyms

Policy gradient methods; Reward-based learning; Temporal difference (TD) learning; Trial-and-error learning

Definition

Reinforcement learning is a basic paradigm of learning in artificial intelligence and biology. The paradigm considers an agent (robot, human, animal) that acts in a typically stochastic environment and receives rewards upon reaching certain states. The agent’s goal is to maximize the expected reward by choosing the optimal action in any given state. In a cortical implementation, the states are defined by sensory stimuli feeding into a neuronal network, and once the network activity has settled, an action is read out. Learning consists of adapting the synaptic connection strengths into and within the neuronal network based on (typically binary) feedback about the appropriateness of the chosen action. Policy gradient and temporal difference learning are two methods for deriving synaptic plasticity rules that maximize the expected reward in response...
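The policy gradient idea described above can be sketched in a few lines: a stochastic (softmax) policy selects actions, and after each trial every weight is nudged along reward times the score function d log π / dw, in the spirit of Williams' REINFORCE rule. The two-state environment, the hidden rewarded-action mapping, and the learning rate below are illustrative assumptions, not taken from the entry.

```python
import math
import random

# Minimal REINFORCE-style sketch: a stochastic softmax policy over two
# actions per state, trained from binary reward feedback alone.
# Environment and hyperparameters are illustrative assumptions.

N_STATES, N_ACTIONS = 2, 2
REWARDED = {0: 1, 1: 0}  # hidden "correct" action per state (assumed)
w = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # "synaptic" weights

def policy(state):
    """Softmax readout: action probabilities for one state."""
    z = [math.exp(v) for v in w[state]]
    s = sum(z)
    return [v / s for v in z]

def step(state, lr=0.5):
    """One trial: act, receive binary reward, apply the REINFORCE update."""
    p = policy(state)
    action = 0 if random.random() < p[0] else 1
    reward = 1.0 if action == REWARDED[state] else 0.0  # binary feedback
    # Weight change = lr * reward * d log pi(action|state) / d w[state][a]
    for a in range(N_ACTIONS):
        grad = (1.0 if a == action else 0.0) - p[a]
        w[state][a] += lr * reward * grad
    return reward

random.seed(0)
for _ in range(2000):
    step(random.randrange(N_STATES))

print([policy(s) for s in range(N_STATES)])
```

Because the feedback is binary, only rewarded trials move the weights, and over many trials the action probabilities concentrate on the rewarded action in each state.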



Acknowledgments

This work was supported by the Swiss National Science Foundation with grants 31003A_133094 and CRSII2_147636 to W.S and Grants PZ00P3_137200 and PP00P3_150637 to J.-P.P. We thank Robert Urbanczik and Johannes Friedrich for valuable comments on the manuscript.

Author information


Corresponding author

Correspondence to Walter Senn.


Copyright information

© 2014 Springer Science+Business Media New York

About this entry

Cite this entry

Senn, W., Pfister, JP. (2014). Reinforcement Learning in Cortical Networks. In: Jaeger, D., Jung, R. (eds) Encyclopedia of Computational Neuroscience. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7320-6_580-2


  • Publisher Name: Springer, New York, NY

  • Online ISBN: 978-1-4614-7320-6


Chapter history

  1. Latest

    Reinforcement Learning in Cortical Networks
    Published:
    17 September 2014

    DOI: https://doi.org/10.1007/978-1-4614-7320-6_580-2

  2. Original

    Reinforcement Learning in Cortical Networks
    Published:
    25 March 2014

    DOI: https://doi.org/10.1007/978-1-4614-7320-6_580-1