Abstract
Evaluation of both immediate and future outcomes of an action is a critical requirement for intelligent behavior. We investigated brain mechanisms for reward prediction at different time scales in an fMRI experiment using a Markov decision task. When subjects learned actions from immediate rewards, significant activity was found in the lateral orbitofrontal cortex and the striatum. When subjects learned to acquire large future rewards despite small immediate losses, the dorsolateral prefrontal cortex, inferior parietal cortex, dorsal raphe nucleus, and cerebellum were also activated. Computational model-based regression analysis using the predicted future rewards and prediction errors estimated from subjects’ performance data revealed graded maps of time scale within the insula and the striatum, where ventroanterior parts were responsible for predicting immediate rewards and dorsoposterior parts for future rewards. These results suggest differential involvement of the cortico-basal ganglia loops in reward prediction at different time scales.
The original article first appeared in Nature Neuroscience 7(8):887–893, 2004. A newly written addendum has been added to this book chapter.
References
Baker SC, Rogers RD, Owen AM, Frith CD, Dolan RJ, Frackowiak RS, Robbins TW (1996) Neural systems engaged by planning: a PET study of the Tower of London task. Neuropsychologia 34(6):515–526
Balleine BW, Dickinson A (2000) The effect of lesions of the insular cortex on instrumental conditioning: evidence for a role in incentive memory. J Neurosci 20(23):8954–8964
Bechara A, Damasio H, Damasio AR (2000) Emotion, decision making and the orbitofrontal cortex. Cereb Cortex 10(3):295–307
Berns GS, McClure SM, Pagnoni G, Montague PR (2001) Predictability modulates human brain response to reward. J Neurosci 21(8):2793–2798
Breiter HC, Aharon I, Kahneman D, Dale A, Shizgal P (2001) Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron 30(2):619–639
Cardinal RN, Pennicott DR, Sugathapala CL, Robbins TW, Everitt BJ (2001) Impulsive choice induced in rats by lesions of the nucleus accumbens core. Science 292(5526):2499–2501
Cavada C, Company T, Tejedor J, Cruz-Rizzolo RJ, Reinoso-Suarez F (2000) The anatomical connections of the macaque monkey orbitofrontal cortex. A review. Cereb Cortex 10(3):220–242
Celada P, Puig MV, Casanovas JM, Guillazo G, Artigas F (2001) Control of dorsal raphe serotonergic neurons by the medial prefrontal cortex: involvement of serotonin-1A, GABA(A), and glutamate receptors. J Neurosci 21(24):9917–9929
Chikama M, McFarland NR, Amaral DG, Haber SN (1997) Insular cortical projections to functional regions of the striatum correlate with cortical cytoarchitectonic organization in the primate. J Neurosci 17(24):9686–9705
Compan V, Segu L, Buhot MC, Daszuta A (1998) Selective increases in serotonin 5-HT1B/1D and 5-HT2A/2C binding sites in adult rat basal ganglia following lesions of serotonergic neurons. Brain Res 793(1–2):103–111
Critchley HD, Mathias CJ, Dolan RJ (2001) Neural activity in the human brain relating to uncertainty and arousal during anticipation. Neuron 29(2):537–545
Doya K (2000) Complementary roles of basal ganglia and cerebellum in learning and motor control. Curr Opin Neurobiol 10(6):732–739
Doya K (2002) Metalearning and neuromodulation. Neural Netw 15(4–6):495–506
Eagle DM, Humby T, Dunnett SB, Robbins TW (1999) Effects of regional striatal lesions on motor, motivational, and executive aspects of progressive-ratio performance in rats. Behav Neurosci 113(4):718–731
Elliott R, Friston KJ, Dolan RJ (2000) Dissociable neural responses in human reward systems. J Neurosci 20(16):6159–6165
Elliott R, Newman JL, Longe OA, Deakin JF (2003) Differential response patterns in the striatum and orbitofrontal cortex to financial reward in humans: a parametric functional magnetic resonance imaging study. J Neurosci 23(1):303–307
Evenden JL, Ryan CN (1996) The pharmacology of impulsive behaviour in rats: the effects of drugs on response choice with varying delays of reinforcement. Psychopharmacology (Berl) 128(2):161–170
Friston KJ, Holmes AP, Worsley KJ, Poline JP, Frith CD, Frackowiak RSJ (1994) Statistical parametric maps in functional imaging: a general linear approach. Hum Brain Mapp 2:189–210
Haber SN, Kunishio K, Mizobuchi M, Lynd-Balta E (1995) The orbital and medial prefrontal circuit through the primate basal ganglia. J Neurosci 15(7 Pt 1):4851–4867
Hanakawa T, Honda M, Sawamoto N, Okada T, Yonekura Y, Fukuyama H, Shibasaki H (2002) The role of rostral Brodmann area 6 in mental-operation tasks: an integrative neuroimaging approach. Cereb Cortex 12(11):1157–1170
Haruno M, Kuroda T, Doya K, Toyama K, Kimura M, Samejima K, Imamizu H, Kawato M (2004) A neural correlate of reward-based behavioral learning in caudate nucleus: a functional magnetic resonance imaging study of a stochastic decision task. J Neurosci 24(7):1660–1665
Hikosaka K, Watanabe M (2000) Delay activity of orbital and lateral prefrontal neurons of the monkey varying with different rewards. Cereb Cortex 10(3):263–271
Hikosaka O, Nakahara H, Rand MK, Sakai K, Lu X, Nakamura K, Miyachi S, Doya K (1999) Parallel neural networks for learning sequential procedures. Trends Neurosci 22(10):464–471
Houk JC, Adams JL, Barto AG (1995) A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia, Computational neuroscience. MIT Press, Cambridge, MA, pp 249–270
Knutson B, Adams CM, Fong GW, Hommer D (2001) Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J Neurosci 21(16):RC159
Knutson B, Fong GW, Bennett SM, Adams CM, Hommer D (2003) A region of mesial prefrontal cortex tracks monetarily rewarding outcomes: characterization with rapid event-related fMRI. Neuroimage 18(2):263–272
Koepp MJ, Gunn RN, Lawrence AD, Cunningham VJ, Dagher A, Jones T, Brooks DJ, Bench CJ, Grasby PM (1998) Evidence for striatal dopamine release during a video game. Nature 393(6682):266–268
Martin-Ruiz R, Puig MV, Celada P, Shapiro DA, Roth BL, Mengod G, Artigas F (2001) Control of serotonergic function in medial prefrontal cortex by serotonin-2A receptors through a glutamate-dependent mechanism. J Neurosci 21(24):9856–9866
Matsumoto K, Suzuki W, Tanaka K (2003) Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science 301(5630):229–232
McClure SM, Berns GS, Montague PR (2003) Temporal prediction errors in a passive learning task activate human striatum. Neuron 38(2):339–346
Mesulam MM, Mufson EJ (1982) Insula of the old world monkey. III: efferent cortical output and comments on function. J Comp Neurol 212(1):38–52
Middleton FA, Strick PL (2000) Basal ganglia and cerebellar loops: motor and cognitive circuits. Brain Res Brain Res Rev 31(2–3):236–250
Mijnster MJ, Raimundo AG, Koskuba K, Klop H, Docter GJ, Groenewegen HJ, Voorn P (1997) Regional and cellular distribution of serotonin 5-hydroxytryptamine2a receptor mRNA in the nucleus accumbens, olfactory tubercle, and caudate putamen of the rat. J Comp Neurol 389(1):1–11
Mobini S, Chiang TJ, Ho MY, Bradshaw CM, Szabadi E (2000) Effects of central 5-hydroxytryptamine depletion on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology (Berl) 152(4):390–397
Mobini S, Body S, Ho MY, Bradshaw CM, Szabadi E, Deakin JF, Anderson IM (2002) Effects of lesions of the orbitofrontal cortex on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology (Berl) 160(3):290–298
O’Doherty JP, Deichmann R, Critchley HD, Dolan RJ (2002) Neural responses during anticipation of a primary taste reward. Neuron 33(5):815–826
O’Doherty J, Critchley H, Deichmann R, Dolan RJ (2003a) Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. J Neurosci 23(21):7931–7939
O’Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ (2003b) Temporal difference models and reward-related learning in the human brain. Neuron 38(2):329–337
Owen AM, Doyon J, Petrides M, Evans AC (1996) Planning and spatial working memory: a positron emission tomography study in humans. Eur J Neurosci 8(2):353–364
Pagnoni G, Zink CF, Montague PR, Berns GS (2002) Activity in human ventral striatum locked to errors of reward prediction. Nat Neurosci 5(2):97–98
Pears A, Parkinson JA, Hopewell L, Everitt BJ, Roberts AC (2003) Lesions of the orbitofrontal but not medial prefrontal cortex disrupt conditioned reinforcement in primates. J Neurosci 23(35):11189–11201
Reynolds JN, Wickens JR (2002) Dopamine-dependent plasticity of corticostriatal synapses. Neural Netw 15(4–6):507–521
Rogers RD, Everitt BJ, Baldacchino A, Blackshaw AJ, Swainson R, Wynne K, Baker NB, Hunter J, Carthy T, Booker E, London M, Deakin JF, Sahakian BJ, Robbins TW (1999a) Dissociable deficits in the decision-making cognition of chronic amphetamine abusers, opiate abusers, patients with focal damage to prefrontal cortex, and tryptophan-depleted normal volunteers: evidence for monoaminergic mechanisms. Neuropsychopharmacology 20(4):322–339
Rogers RD, Owen AM, Middleton HC, Williams EJ, Pickard JD, Sahakian BJ, Robbins TW (1999b) Choosing between small, likely rewards and large, unlikely rewards activates inferior and orbital prefrontal cortex. J Neurosci 19(20):9029–9038
Rolls ET (2000) The orbitofrontal cortex and reward. Cereb Cortex 10(3):284–294
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275(5306):1593–1599
Shidara M, Richmond BJ (2002) Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science 296(5573):1709–1711
Sutton RS, Barto AG (1998) Reinforcement learning. MIT Press, Cambridge, MA
Tanaka SC, Doya K, Okada G, Ueda K, Okamoto Y, Yamawaki S (2004) Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nat Neurosci 7(8):887–893
Tanaka SC, Schweighofer N, Asahi S, Shishida K, Okamoto Y, Yamawaki S, Doya K (2007) Serotonin differentially regulates short- and long-term prediction of rewards in the ventral and dorsal striatum. PLoS One 2(12):e1333
Tremblay L, Schultz W (2000) Reward-related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. J Neurophysiol 83(4):1864–1876
Ullsperger M, von Cramon DY (2003) Error monitoring using external feedback: specific roles of the habenular complex, the reward system, and the cingulate motor area revealed by functional magnetic resonance imaging. J Neurosci 23(10):4308–4314
Acknowledgments
We thank K. Samejima, N. Schweighofer, M. Haruno, H. Imamizu, S. Higuchi, T. Yoshioka, T. Chaminade, and M. Kawato for helpful discussions and technical advice. This research was funded by “Creating the Brain”, Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency.
Addendum: Recent Developments
This addendum has been newly written by Saori C. Tanaka for this book chapter (partly taken from the doctoral dissertation “Functional model of serotonin in human reward system based on reinforcement learning theory” by Saori C. Tanaka, 2006).
We hypothesized that different cortico-basal ganglia loops are simultaneously involved in reward prediction at different time scales, and that serotonergic modulation of these parallel loops selects which time scale is used in actual action selection. To elucidate the effects of serotonin on the parallel cortico-striatal loop mechanisms, we manipulated subjects’ serotonin levels through dietary tryptophan (the precursor of serotonin) and measured brain activity at different serotonin levels during choice tasks involving both immediate small rewards and delayed large rewards (Experiment 2) (Tanaka et al. 2007). Using regression analysis of reward prediction signals, we found that activity in the ventral part of the striatum correlated strongly with short-term reward prediction at low serotonin levels, whereas activity in the dorsal part correlated strongly with long-term reward prediction at high serotonin levels. This result supports the possibility that serotonin controls the time scale of reward prediction by differentially regulating activity within the striatum.
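The reward prediction signals used as regressors follow the standard temporal-difference (TD) learning scheme, in which a state value V(s) is updated by the prediction error δ = r + γV(s′) − V(s) and the discount factor γ sets the time scale of prediction. The following is a minimal toy sketch of this scheme, not the authors’ analysis code; the two-state task, reward values, and learning parameters are illustrative assumptions only.

```python
# Toy temporal-difference (TD) learning sketch: V(s) is the predicted
# future reward and delta the prediction error. The discount factor
# gamma controls how far into the future the prediction looks.
# Illustrative example only; task and parameters are assumptions.
import random

def td_learn(transitions, rewards, gamma, alpha=0.1, episodes=500, seed=0):
    """transitions[s][a] -> next state; rewards[s][a] -> immediate reward."""
    rng = random.Random(seed)
    n_states = len(transitions)
    V = [0.0] * n_states
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(20):  # bounded episode length
            a = rng.randrange(len(transitions[s]))  # random action policy
            s_next = transitions[s][a]
            r = rewards[s][a]
            delta = r + gamma * V[s_next] - V[s]  # TD prediction error
            V[s] += alpha * delta
            s = s_next
    return V

# Two-state example: in state 0, action 1 costs a small immediate loss
# but leads to state 1, where a large reward is available.
transitions = [[0, 1], [1, 0]]
rewards = [[0.1, -0.2], [0.1, 1.0]]
V_short = td_learn(transitions, rewards, gamma=0.3)  # short time scale
V_long = td_learn(transitions, rewards, gamma=0.9)   # long time scale
```

With a larger γ the learned values accumulate more of the distant reward, which is the sense in which a voxel correlating with V computed at large γ reflects long-term reward prediction.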
We found similar graded time-scale maps for reward prediction in the striatum in our earlier experiment (Experiment 1) (Tanaka et al. 2004) and the later experiment (Experiment 2) (Tanaka et al. 2007). In both maps, the ventral parts correlate with reward prediction at shorter time scales, indicated by smaller γ values, whereas the dorsal parts correlate with reward prediction at longer time scales (larger γ values). Are both maps graded on the same time scale? That is, is a particular part of the graded map involved in reward prediction at a particular time scale? If so, a further question arises: is this map graded in theoretical time or in real time? To answer these questions, we compared the graded maps in the striatum found in Experiments 1 and 2.
Fig. 22.8
The number of voxels at each z-level that were significantly correlated with reward prediction at each time scale in Experiments 1 and 2, in (a) γ-grading and (b) τ-grading. Colored lines show the median z-coordinate of the voxel distribution at each time scale. Although there are gradients of time scale from ventral (low z-level) to dorsal (high z-level) in both Experiments 1 and 2, good consistency of time scales between the two experiments is seen only in the γ-grading. (Note that different color scales are used in the γ-grading and τ-grading.)
Fig. 22.9
The median z-coordinate of the voxel distribution at each time scale. (a) In the γ-grading, the data from both Experiment 1 (*) and Experiment 2 (○) are well fitted by the same function. (b) In the τ-grading, by contrast, the data are difficult to explain with a single function.
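The contrast between γ-grading and τ-grading can be made concrete with the standard exponential-discounting relation γ = exp(−Δt/τ), i.e. τ = −Δt/ln γ: the same per-step discount factor γ maps onto different real-time constants τ when the step duration Δt differs between tasks. A small sketch (the Δt values below are illustrative placeholders, not taken from the experiments):

```python
# Convert a per-step discount factor gamma into the equivalent
# real-time constant tau, from gamma = exp(-dt / tau).
# dt values are illustrative only.
import math

def tau_from_gamma(gamma, dt):
    """Time constant tau (same units as dt) for per-step discount gamma."""
    return -dt / math.log(gamma)

for gamma in (0.6, 0.8, 0.9, 0.99):
    print(gamma,
          round(tau_from_gamma(gamma, dt=2.0), 2),
          round(tau_from_gamma(gamma, dt=4.0), 2))
```

If two tasks with different step durations recruit the same striatal locus at the same γ but at different τ, the map is graded in theoretical (per-step) time rather than real time, which is what the γ-grading consistency across Experiments 1 and 2 suggests.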
These results indicate that particular parts of the striatum are involved in reward prediction not at absolute time scales but at relative time scales that depend on the task. In the real world, we must solve problems with highly variable time scales: sometimes we choose an action that yields a reward within seconds or minutes, and at other times we make decisions that reap rewards years later. Relative grading of the time scale may therefore be advantageous, because the full extent of the striatum can be engaged in computing reward prediction despite a limited number of striatal neurons.
Copyright information
© 2016 Springer Japan
About this chapter
Cite this chapter
Tanaka, S.C., Doya, K., Okada, G., Ueda, K., Okamoto, Y., Yamawaki, S. (2016). Prediction of Immediate and Future Rewards Differentially Recruits Cortico-Basal Ganglia Loops. In: Ikeda, S., Kato, H., Ohtake, F., Tsutsui, Y. (eds) Behavioral Economics of Preferences, Choices, and Happiness. Springer, Tokyo. https://doi.org/10.1007/978-4-431-55402-8_22
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-55401-1
Online ISBN: 978-4-431-55402-8