Abstract
Evaluation of both immediate and future outcomes of an action is a critical requirement for intelligent behavior. We investigated brain mechanisms for reward prediction at different time scales in an fMRI experiment using a Markov decision task. When subjects learned actions from immediate rewards, significant activity was found in the lateral orbitofrontal cortex and the striatum. When subjects learned to acquire large future rewards despite small immediate losses, the dorsolateral prefrontal cortex, inferior parietal cortex, dorsal raphe nucleus, and cerebellum were also activated. Computational model-based regression analysis using the predicted future rewards and prediction errors estimated from subjects’ performance data revealed graded maps of time scale within the insula and the striatum, where ventroanterior parts were responsible for predicting immediate rewards and dorsoposterior parts for future rewards. These results suggest differential involvement of the cortico-basal ganglia loops in reward prediction at different time scales.
The original article first appeared in Nature Neuroscience 7(8):887–893, 2004. A newly written addendum has been added to this book chapter.
References
Baker SC, Rogers RD, Owen AM, Frith CD, Dolan RJ, Frackowiak RS, Robbins TW (1996) Neural systems engaged by planning: a PET study of the Tower of London task. Neuropsychologia 34(6):515–526
Balleine BW, Dickinson A (2000) The effect of lesions of the insular cortex on instrumental conditioning: evidence for a role in incentive memory. J Neurosci 20(23):8954–8964
Bechara A, Damasio H, Damasio AR (2000) Emotion, decision making and the orbitofrontal cortex. Cereb Cortex 10(3):295–307
Berns GS, McClure SM, Pagnoni G, Montague PR (2001) Predictability modulates human brain response to reward. J Neurosci 21(8):2793–2798
Breiter HC, Aharon I, Kahneman D, Dale A, Shizgal P (2001) Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron 30(2):619–639
Cardinal RN, Pennicott DR, Sugathapala CL, Robbins TW, Everitt BJ (2001) Impulsive choice induced in rats by lesions of the nucleus accumbens core. Science 292(5526):2499–2501
Cavada C, Company T, Tejedor J, Cruz-Rizzolo RJ, Reinoso-Suarez F (2000) The anatomical connections of the macaque monkey orbitofrontal cortex. A review. Cereb Cortex 10(3):220–242
Celada P, Puig MV, Casanovas JM, Guillazo G, Artigas F (2001) Control of dorsal raphe serotonergic neurons by the medial prefrontal cortex: involvement of serotonin-1A, GABA(A), and glutamate receptors. J Neurosci 21(24):9917–9929
Chikama M, McFarland NR, Amaral DG, Haber SN (1997) Insular cortical projections to functional regions of the striatum correlate with cortical cytoarchitectonic organization in the primate. J Neurosci 17(24):9686–9705
Compan V, Segu L, Buhot MC, Daszuta A (1998) Selective increases in serotonin 5-HT1B/1D and 5-HT2A/2C binding sites in adult rat basal ganglia following lesions of serotonergic neurons. Brain Res 793(1–2):103–111
Critchley HD, Mathias CJ, Dolan RJ (2001) Neural activity in the human brain relating to uncertainty and arousal during anticipation. Neuron 29(2):537–545
Doya K (2000) Complementary roles of basal ganglia and cerebellum in learning and motor control. Curr Opin Neurobiol 10(6):732–739
Doya K (2002) Metalearning and neuromodulation. Neural Netw 15(4–6):495–506
Eagle DM, Humby T, Dunnett SB, Robbins TW (1999) Effects of regional striatal lesions on motor, motivational, and executive aspects of progressive-ratio performance in rats. Behav Neurosci 113(4):718–731
Elliott R, Friston KJ, Dolan RJ (2000) Dissociable neural responses in human reward systems. J Neurosci 20(16):6159–6165
Elliott R, Newman JL, Longe OA, Deakin JF (2003) Differential response patterns in the striatum and orbitofrontal cortex to financial reward in humans: a parametric functional magnetic resonance imaging study. J Neurosci 23(1):303–307
Evenden JL, Ryan CN (1996) The pharmacology of impulsive behaviour in rats: the effects of drugs on response choice with varying delays of reinforcement. Psychopharmacology (Berl) 128(2):161–170
Friston KJ, Holmes AP, Worsley KJ, Poline JP, Frith CD, Frackowiak RSJ (1994) Statistical parametric maps in functional imaging: a general linear approach. Hum Brain Mapp 2:189–210
Haber SN, Kunishio K, Mizobuchi M, Lynd-Balta E (1995) The orbital and medial prefrontal circuit through the primate basal ganglia. J Neurosci 15(7 Pt 1):4851–4867
Hanakawa T, Honda M, Sawamoto N, Okada T, Yonekura Y, Fukuyama H, Shibasaki H (2002) The role of rostral Brodmann area 6 in mental-operation tasks: an integrative neuroimaging approach. Cereb Cortex 12(11):1157–1170
Haruno M, Kuroda T, Doya K, Toyama K, Kimura M, Samejima K, Imamizu H, Kawato M (2004) A neural correlate of reward-based behavioral learning in caudate nucleus: a functional magnetic resonance imaging study of a stochastic decision task. J Neurosci 24(7):1660–1665
Hikosaka K, Watanabe M (2000) Delay activity of orbital and lateral prefrontal neurons of the monkey varying with different rewards. Cereb Cortex 10(3):263–271
Hikosaka O, Nakahara H, Rand MK, Sakai K, Lu X, Nakamura K, Miyachi S, Doya K (1999) Parallel neural networks for learning sequential procedures. Trends Neurosci 22(10):464–471
Houk JC, Adams JL, Barto AG (1995) A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia, Computational neuroscience. MIT Press, Cambridge, MA, pp 249–270
Knutson B, Adams CM, Fong GW, Hommer D (2001) Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J Neurosci 21(16):RC159
Knutson B, Fong GW, Bennett SM, Adams CM, Hommer D (2003) A region of mesial prefrontal cortex tracks monetarily rewarding outcomes: characterization with rapid event-related fMRI. Neuroimage 18(2):263–272
Koepp MJ, Gunn RN, Lawrence AD, Cunningham VJ, Dagher A, Jones T, Brooks DJ, Bench CJ, Grasby PM (1998) Evidence for striatal dopamine release during a video game. Nature 393(6682):266–268
Martin-Ruiz R, Puig MV, Celada P, Shapiro DA, Roth BL, Mengod G, Artigas F (2001) Control of serotonergic function in medial prefrontal cortex by serotonin-2A receptors through a glutamate-dependent mechanism. J Neurosci 21(24):9856–9866
Matsumoto K, Suzuki W, Tanaka K (2003) Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science 301(5630):229–232
McClure SM, Berns GS, Montague PR (2003) Temporal prediction errors in a passive learning task activate human striatum. Neuron 38(2):339–346
Mesulam MM, Mufson EJ (1982) Insula of the old world monkey. III: efferent cortical output and comments on function. J Comp Neurol 212(1):38–52
Middleton FA, Strick PL (2000) Basal ganglia and cerebellar loops: motor and cognitive circuits. Brain Res Brain Res Rev 31(2–3):236–250
Mijnster MJ, Raimundo AG, Koskuba K, Klop H, Docter GJ, Groenewegen HJ, Voorn P (1997) Regional and cellular distribution of serotonin 5-hydroxytryptamine2a receptor mRNA in the nucleus accumbens, olfactory tubercle, and caudate putamen of the rat. J Comp Neurol 389(1):1–11
Mobini S, Chiang TJ, Ho MY, Bradshaw CM, Szabadi E (2000) Effects of central 5-hydroxytryptamine depletion on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology (Berl) 152(4):390–397
Mobini S, Body S, Ho MY, Bradshaw CM, Szabadi E, Deakin JF, Anderson IM (2002) Effects of lesions of the orbitofrontal cortex on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology (Berl) 160(3):290–298
O’Doherty JP, Deichmann R, Critchley HD, Dolan RJ (2002) Neural responses during anticipation of a primary taste reward. Neuron 33(5):815–826
O’Doherty J, Critchley H, Deichmann R, Dolan RJ (2003a) Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. J Neurosci 23(21):7931–7939
O’Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ (2003b) Temporal difference models and reward-related learning in the human brain. Neuron 38(2):329–337
Owen AM, Doyon J, Petrides M, Evans AC (1996) Planning and spatial working memory: a positron emission tomography study in humans. Eur J Neurosci 8(2):353–364
Pagnoni G, Zink CF, Montague PR, Berns GS (2002) Activity in human ventral striatum locked to errors of reward prediction. Nat Neurosci 5(2):97–98
Pears A, Parkinson JA, Hopewell L, Everitt BJ, Roberts AC (2003) Lesions of the orbitofrontal but not medial prefrontal cortex disrupt conditioned reinforcement in primates. J Neurosci 23(35):11189–11201
Reynolds JN, Wickens JR (2002) Dopamine-dependent plasticity of corticostriatal synapses. Neural Netw 15(4–6):507–521
Rogers RD, Everitt BJ, Baldacchino A, Blackshaw AJ, Swainson R, Wynne K, Baker NB, Hunter J, Carthy T, Booker E, London M, Deakin JF, Sahakian BJ, Robbins TW (1999a) Dissociable deficits in the decision-making cognition of chronic amphetamine abusers, opiate abusers, patients with focal damage to prefrontal cortex, and tryptophan-depleted normal volunteers: evidence for monoaminergic mechanisms. Neuropsychopharmacology 20(4):322–339
Rogers RD, Owen AM, Middleton HC, Williams EJ, Pickard JD, Sahakian BJ, Robbins TW (1999b) Choosing between small, likely rewards and large, unlikely rewards activates inferior and orbital prefrontal cortex. J Neurosci 19(20):9029–9038
Rolls ET (2000) The orbitofrontal cortex and reward. Cereb Cortex 10(3):284–294
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275(5306):1593–1599
Shidara M, Richmond BJ (2002) Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science 296(5573):1709–1711
Sutton RS, Barto AG (1998) Reinforcement learning. MIT Press, Cambridge, MA
Tanaka SC, Doya K, Okada G, Ueda K, Okamoto Y, Yamawaki S (2004) Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nat Neurosci 7(8):887–893
Tanaka SC, Schweighofer N, Asahi S, Shishida K, Okamoto Y, Yamawaki S, Doya K (2007) Serotonin differentially regulates short- and long-term prediction of rewards in the ventral and dorsal striatum. PLoS One 2(12):e1333
Tremblay L, Schultz W (2000) Reward-related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. J Neurophysiol 83(4):1864–1876
Ullsperger M, von Cramon DY (2003) Error monitoring using external feedback: specific roles of the habenular complex, the reward system, and the cingulate motor area revealed by functional magnetic resonance imaging. J Neurosci 23(10):4308–4314
Acknowledgments
We thank K. Samejima, N. Schweighofer, M. Haruno, H. Imamizu, S. Higuchi, T. Yoshioka, T. Chaminade, and M. Kawato for helpful discussions and technical advice. This research was funded by “Creating the Brain”, Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency.
Addendum: Recent Developments
This addendum has been newly written by Saori C. Tanaka for this book chapter (partly taken from the doctoral dissertation “Functional model of serotonin in human reward system based on reinforcement learning theory” by Saori C. Tanaka, 2006).
We hypothesized that different cortico-basal ganglia loops are simultaneously involved in reward prediction at different time scales, and that serotonergic modulation of these parallel loops selects which time scale is used in actual action selection. To elucidate the effects of serotonin on the parallel cortico-striatal loop mechanisms, we manipulated subjects’ serotonin levels through dietary tryptophan (the precursor of serotonin) and measured brain activity at different serotonin levels during choice tasks involving both immediate small rewards and delayed large rewards (Experiment 2) (Tanaka et al. 2007). Using regression analysis of reward prediction signals, we found that activity in the ventral part of the striatum correlated strongly with short-term reward prediction at low serotonin levels, whereas activity in the dorsal part correlated strongly with long-term reward prediction at high serotonin levels. This result supports the possibility that serotonin controls the time scale of reward prediction by differentially regulating activity within the striatum.
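The reward prediction signals used as regressors follow the standard temporal-difference (TD) learning scheme, in which a state value V(s) is updated by the prediction error δ = r + γV(s′) − V(s) and the discount factor γ sets the time scale of prediction. The following is a minimal toy sketch of this scheme, not the authors’ analysis code; the two-state task, reward values, and learning parameters are illustrative assumptions only.

```python
# Toy temporal-difference (TD) learning sketch: V(s) is the predicted
# future reward and delta the prediction error. The discount factor
# gamma controls how far into the future the prediction looks.
# Illustrative example only; task and parameters are assumptions.
import random

def td_learn(transitions, rewards, gamma, alpha=0.1, episodes=500, seed=0):
    """transitions[s][a] -> next state; rewards[s][a] -> immediate reward."""
    rng = random.Random(seed)
    n_states = len(transitions)
    V = [0.0] * n_states
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(20):  # bounded episode length
            a = rng.randrange(len(transitions[s]))  # random action policy
            s_next = transitions[s][a]
            r = rewards[s][a]
            delta = r + gamma * V[s_next] - V[s]  # TD prediction error
            V[s] += alpha * delta
            s = s_next
    return V

# Two-state example: in state 0, action 1 costs a small immediate loss
# but leads to state 1, where a large reward is available.
transitions = [[0, 1], [1, 0]]
rewards = [[0.1, -0.2], [0.1, 1.0]]
V_short = td_learn(transitions, rewards, gamma=0.3)  # short time scale
V_long = td_learn(transitions, rewards, gamma=0.9)   # long time scale
```

With a larger γ the learned values accumulate more of the distant reward, which is the sense in which a voxel correlating with V computed at large γ reflects long-term reward prediction.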
We found similar graded time-scale maps for reward prediction in the striatum in our earlier experiment (Experiment 1) (Tanaka et al. 2004) and the later experiment (Experiment 2) (Tanaka et al. 2007). In both maps, the ventral parts correlate with reward prediction at shorter time scales, indicated by smaller γ values, whereas the dorsal parts correlate with reward prediction at longer time scales (larger γ values). Are both maps graded on the same time scale? That is, is a particular part of the graded map involved in reward prediction at a particular time scale? If so, a further question arises: is this map graded in theoretical time or in real time? To answer these questions, we compared the graded maps in the striatum found in Experiments 1 and 2.
Fig. 22.8
The number of voxels at each z-level that were significantly correlated with reward prediction at each time scale in Experiments 1 and 2, in (a) γ-grading and (b) τ-grading. Colored lines show the median z-coordinate of the voxel distribution at each time scale. Although there are gradients of time scale from ventral (low z-level) to dorsal (high z-level) in both Experiments 1 and 2, good consistency of time scales between the two experiments is seen only in the γ-grading. (Note that different color scales are used in the γ-grading and τ-grading.)
Fig. 22.9
The median z-coordinate of the voxel distribution at each time scale. (a) In the γ-grading, the data from both Experiment 1 (*) and Experiment 2 (○) are well fitted by the same function. (b) In the τ-grading, by contrast, the data are difficult to explain with a single function.
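The contrast between γ-grading and τ-grading can be made concrete with the standard exponential-discounting relation γ = exp(−Δt/τ), i.e. τ = −Δt/ln γ: the same per-step discount factor γ maps onto different real-time constants τ when the step duration Δt differs between tasks. A small sketch (the Δt values below are illustrative placeholders, not taken from the experiments):

```python
# Convert a per-step discount factor gamma into the equivalent
# real-time constant tau, from gamma = exp(-dt / tau).
# dt values are illustrative only.
import math

def tau_from_gamma(gamma, dt):
    """Time constant tau (same units as dt) for per-step discount gamma."""
    return -dt / math.log(gamma)

for gamma in (0.6, 0.8, 0.9, 0.99):
    print(gamma,
          round(tau_from_gamma(gamma, dt=2.0), 2),
          round(tau_from_gamma(gamma, dt=4.0), 2))
```

If two tasks with different step durations recruit the same striatal locus at the same γ but at different τ, the map is graded in theoretical (per-step) time rather than real time, which is what the γ-grading consistency across Experiments 1 and 2 suggests.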
These results indicate that particular parts of the striatum are involved in reward prediction not at absolute time scales but at relative time scales that depend on the task. In the real world, we must solve problems with highly variable time scales: sometimes we choose an action that yields a reward within seconds or minutes, and at other times we make decisions that reap rewards years later. Relative grading of the time scale may therefore be advantageous, because the full extent of the striatum can be engaged in computing reward prediction despite a limited number of striatal neurons.
Copyright information
© 2016 Springer Japan
About this chapter
Cite this chapter
Tanaka, S.C., Doya, K., Okada, G., Ueda, K., Okamoto, Y., Yamawaki, S. (2016). Prediction of Immediate and Future Rewards Differentially Recruits Cortico-Basal Ganglia Loops. In: Ikeda, S., Kato, H., Ohtake, F., Tsutsui, Y. (eds) Behavioral Economics of Preferences, Choices, and Happiness. Springer, Tokyo. https://doi.org/10.1007/978-4-431-55402-8_22
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-55401-1
Online ISBN: 978-4-431-55402-8