Functional role of opponent, dopamine modulated D1/D2 plasticity in reinforcement learning

Jitsev, Jenia; Abraham, Nobi; Morrison, Abigail; Tittgemeyer, Marc

doi:10.1186/1471-2202-14-S1-P199


Poster presentation
Open access
Published: 08 July 2013

Volume 14, article number P199, (2013)

BMC Neuroscience


Jenia Jitsev¹, Nobi Abraham¹, Abigail Morrison²,³ & Marc Tittgemeyer¹


The basal ganglia network is thought to be involved in adapting an organism's behavior to its positive and negative consequences, that is, in reinforcement learning. It has been hypothesized that dopamine (DA) modulated plasticity of synapses projecting from different cortical areas to the input nuclei of the basal ganglia, the striatum, plays a central role in this form of learning, being responsible for updating future outcome expectations and action preferences. In this scheme, DA transmission is considered to convey a prediction error signal that is generated when internal expectations do not match the outcomes observed after action execution. So far, there has been no satisfying model of what the neural circuits computing this signal within the basal ganglia may look like, how this computation is performed, and what the mechanistic role of DA release is in adapting the system towards optimal behavior in a given task.
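For orientation, the prediction error described above is commonly formalized in temporal-difference (TD) learning as shown below. This is the standard textbook form rather than an equation given in this abstract; the symbols $r_t$ (observed outcome), $V$ (expected outcome), $\gamma$ (discount factor) and $\alpha$ (learning rate) are notation assumed here for illustration.

```latex
\delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

A positive $\delta_t$ means the outcome was better than expected, a negative $\delta_t$ that it was worse.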

Aiming towards a model of a canonical circuit for learning task-appropriate behavior from both reward and punishment, we extended a previously introduced spiking actor-critic network model of the basal ganglia [1] to include the segregation of both the dorsal (actor) and ventral (critic) striatum into populations of D1 and D2 medium spiny neurons (MSNs). This segregation allows explicit, separate representation of positive and negative expected outcomes by distinct populations in the ventral striatum. The positive and negative components of the expected outcome were fed to DA neurons in the SNc/VTA region, which compute the reward prediction error and signal it by DA release. Based on recent experimental work [2], the DA level was assumed to modulate plasticity of D1 and D2 synapses in opposing ways, inducing LTP on D1 and LTD on D2 synapses when high and vice versa when low. Crucially, this form of opponent plasticity implements a temporal-difference (TD)-like update of positive and negative outcome expectations separately and adapts action selection appropriately.
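The opponent rule can be illustrated with a minimal, non-spiking sketch. It is an abstraction written for this text, not the authors' network: the class `OpponentCritic`, the weight lists `d1` and `d2`, and the parameters `alpha` and `gamma` are hypothetical names, and the signed prediction error `delta` stands in for the DA level relative to baseline. The sketch assumes that the net expected outcome is the D1 weight minus the D2 weight and that weights cannot become negative.

```python
class OpponentCritic:
    """Toy critic with separate positive (D1) and negative (D2) expectations.

    Assumption: delta > 0 plays the role of a high DA level, delta < 0 of a
    low DA level. This is an illustrative abstraction, not a spiking model.
    """

    def __init__(self, n_states, alpha=0.1, gamma=0.9):
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount factor (assumed value)
        self.d1 = [0.0] * n_states    # expected positive outcome per state
        self.d2 = [0.0] * n_states    # expected negative outcome per state

    def value(self, s):
        # Net expectation: positive component minus negative component.
        return self.d1[s] - self.d2[s]

    def update(self, s, outcome, s_next=None):
        # TD-like prediction error; s_next=None marks a terminal transition.
        v_next = 0.0 if s_next is None else self.value(s_next)
        delta = outcome + self.gamma * v_next - self.value(s)
        # Opponent plasticity: high DA potentiates D1 and depresses D2,
        # low DA does the reverse. Weights are clipped at zero.
        self.d1[s] = max(0.0, self.d1[s] + self.alpha * delta)
        self.d2[s] = max(0.0, self.d2[s] - self.alpha * delta)
        return delta


critic = OpponentCritic(n_states=2)
for _ in range(200):
    critic.update(0, outcome=+1.0)   # state 0 is always rewarded
    critic.update(1, outcome=-1.0)   # state 1 is always punished
print(critic.d1, critic.d2)
```

In this toy setting the rewarded state ends up represented only by its D1 weight and the punished state only by its D2 weight. When both weights of a state are non-zero, the net value moves by twice `alpha * delta` per update, which is one of several possible design choices in such a sketch.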

We implemented the network in the NEST simulator [3] using leaky integrate-and-fire spiking neurons and designed a battery of experiments involving the application of reward and punishment in various grid world tasks. In each task, an agent had to explore the states and learn to maximize the total reward obtained. The number of states and the magnitudes and delays of reward and punishment were manipulated across tasks. We demonstrate that across the tasks the network can learn to approach delayed rewards while consistently avoiding punishments, the latter posing severe difficulties for the previous model without D1/D2 segregation [1]. Thus, the spiking neural network model highlights the functional role of D1/D2 MSN segregation within the striatum in implementing appropriate TD-like learning from both reward and punishment, and explains the necessity for the opposing direction of DA-dependent plasticity found at synapses converging on distinct striatal MSN types. This modeling approach can be extended in future work to study how abnormal D1/D2 plasticity may lead to a reorganization of the basal ganglia network towards pathological, dysfunctional states, such as those observed in Parkinson's disease under conditions of progressive dopamine depletion.
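To make the task setting concrete, here is a small sketch of a reward-and-punishment task of this general kind, solved by a plain tabular actor-critic. It is a hypothetical stand-in and not the NEST implementation: it uses no spiking neurons and no D1/D2 segregation, the layout (a one-dimensional corridor with punishment at one end and reward at the other) is assumed for illustration, and the function `run_corridor` and all parameter values are invented here.

```python
import math
import random

LEFT, RIGHT = 0, 1


def run_corridor(episodes=1000, n_states=5, alpha=0.1, beta=0.2,
                 gamma=0.9, max_steps=100, seed=0):
    """Tabular actor-critic on a 1-D corridor (illustrative assumption).

    Entering state 0 ends the episode with punishment (-1); entering the
    last state ends it with reward (+1). The agent starts in the middle.
    """
    rng = random.Random(seed)
    value = [0.0] * n_states                      # critic: expected outcome
    pref = [[0.0, 0.0] for _ in range(n_states)]  # actor: action preferences
    outcomes = []
    for _ in range(episodes):
        s, outcome = n_states // 2, 0.0
        for _ in range(max_steps):
            # Softmax over two actions, written in its logistic form.
            p_right = 1.0 / (1.0 + math.exp(pref[s][LEFT] - pref[s][RIGHT]))
            a = RIGHT if rng.random() < p_right else LEFT
            s_next = s + 1 if a == RIGHT else s - 1
            terminal = s_next in (0, n_states - 1)
            if s_next == 0:
                r = -1.0
            elif s_next == n_states - 1:
                r = 1.0
            else:
                r = 0.0
            v_next = 0.0 if terminal else value[s_next]
            delta = r + gamma * v_next - value[s]   # prediction error
            value[s] += alpha * delta               # critic update
            pref[s][a] += beta * delta              # actor update
            if terminal:
                outcome = r
                break
            s = s_next
        outcomes.append(outcome)
    return outcomes, value, pref


outcomes, value, pref = run_corridor()
print("mean outcome, first 100 episodes:", sum(outcomes[:100]) / 100)
print("mean outcome, last 100 episodes:", sum(outcomes[-100:]) / 100)
```

Varying `n_states` changes the delay between the start and the outcome, loosely mirroring the manipulation of state count and delay described above.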

References

1. Potjans W, Diesmann M, Morrison A: An imperfect dopaminergic error signal can drive temporal-difference learning. PLoS Comput Biol. 2011, 7-
2. Shen W, Flajolet M, Greengard P, Surmeier DJ: Dichotomous dopaminergic control of striatal synaptic plasticity. Science. 2008, 321: 848-851. 10.1126/science.1160575.
3. Gewaltig M-O, Diesmann M: NEST. Scholarpedia. 2007, 2 (4): 1430. 10.4249/scholarpedia.1430.


Acknowledgements

The work was supported by the German Research Foundation (DFG, clinical research unit KFO 219); partial funding was provided by the Helmholtz Alliance on Systems Biology (Germany), Neurex and the Junior Professor Program of Baden-Württemberg.

Author information

Authors and Affiliations

Cortical Networks Group, Max-Planck-Institute for Neurological Research, Gleueler Str. 50, 50931, Cologne, Germany
Jenia Jitsev, Nobi Abraham & Marc Tittgemeyer
Functional Neural Circuits Group, Institute of Neuroscience and Medicine (INM-6), Computational and Systems Neuroscience, Research Center Jülich, 52425, Jülich, Germany
Abigail Morrison
Simulation Lab Neuroscience, Bernstein Facility Simulation and Database Technology (BFSD), Research Center Jülich, 52425, Jülich, Germany
Abigail Morrison

Corresponding author

Correspondence to Jenia Jitsev.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Jitsev, J., Abraham, N., Morrison, A. et al. Functional role of opponent, dopamine modulated D1/D2 plasticity in reinforcement learning. BMC Neurosci 14 (Suppl 1), P199 (2013). https://doi.org/10.1186/1471-2202-14-S1-P199


